2025-09-14 13:14:53,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1108 [DEBUG]: logdir: _logs/noise-eval-v2/halfcheetah/bpql-noise_0.050-delay_18
2025-09-14 13:14:53,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1109 [DEBUG]: trainer_prefix: noise-eval-v2/halfcheetah/bpql-noise_0.050-delay_18
2025-09-14 13:14:53,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'18': <latency_env.delayed_mdp.ConstantDelay object at 0x7f868f803b90>}
2025-09-14 13:14:53,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1111 [DEBUG]: using device: cpu
2025-09-14 13:14:53,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-14 13:14:53,697 baseline-bpql-noisepromille50-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=125, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-14 13:14:53,697 baseline-bpql-noisepromille50-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-14 13:14:55,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-14 13:14:55,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-14 13:17:27,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 13:17:35,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: -491.60516 ± 100.400
2025-09-14 13:17:35,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-497.50403), np.float32(-520.3326), np.float32(-534.9102), np.float32(-556.6897), np.float32(-577.3249), np.float32(-217.30193), np.float32(-558.7236), np.float32(-533.6832), np.float32(-498.13538), np.float32(-421.44632)]
2025-09-14 13:17:35,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:17:35,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (-491.61) for latency 18
2025-09-14 13:17:35,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 23 minutes, 4 seconds)
2025-09-14 13:20:08,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 13:20:16,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: -226.39291 ± 34.522
2025-09-14 13:20:16,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-252.37898), np.float32(-153.0662), np.float32(-250.7244), np.float32(-212.82497), np.float32(-237.0333), np.float32(-262.04355), np.float32(-257.8296), np.float32(-199.74936), np.float32(-188.29921), np.float32(-249.97969)]
2025-09-14 13:20:16,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:20:16,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (-226.39) for latency 18
2025-09-14 13:20:16,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 22 minutes, 19 seconds)
2025-09-14 13:22:50,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 13:22:57,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: -192.04477 ± 53.570
2025-09-14 13:22:57,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-189.63918), np.float32(-270.31482), np.float32(-205.45796), np.float32(-228.8884), np.float32(-190.22664), np.float32(-268.39417), np.float32(-128.46126), np.float32(-120.15635), np.float32(-203.39062), np.float32(-115.518265)]
2025-09-14 13:22:57,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:22:57,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (-192.04) for latency 18
2025-09-14 13:22:57,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 19 minutes, 54 seconds)
2025-09-14 13:25:32,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 13:25:40,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: -65.24731 ± 48.932
2025-09-14 13:25:40,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-58.425934), np.float32(-82.89797), np.float32(-15.420616), np.float32(-106.36892), np.float32(-97.57687), np.float32(-80.784065), np.float32(-112.72045), np.float32(-107.18521), np.float32(-42.429485), np.float32(51.33644)]
2025-09-14 13:25:40,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:25:40,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (-65.25) for latency 18
2025-09-14 13:25:40,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 17 minutes, 55 seconds)
2025-09-14 13:28:18,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 13:28:26,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 37.72731 ± 108.425
2025-09-14 13:28:26,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(147.2732), np.float32(-80.53658), np.float32(93.27758), np.float32(265.53003), np.float32(-56.58075), np.float32(19.769524), np.float32(121.79636), np.float32(-50.795444), np.float32(-42.768627), np.float32(-39.692215)]
2025-09-14 13:28:26,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:28:26,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (37.73) for latency 18
2025-09-14 13:28:26,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 16 minutes, 48 seconds)
2025-09-14 13:31:33,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 13:31:43,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 186.77231 ± 96.464
2025-09-14 13:31:43,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(102.97178), np.float32(170.90065), np.float32(138.93631), np.float32(102.13775), np.float32(418.35684), np.float32(145.08437), np.float32(179.20308), np.float32(87.69572), np.float32(260.79254), np.float32(261.64398)]
2025-09-14 13:31:43,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:31:43,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (186.77) for latency 18
2025-09-14 13:31:43,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 25 minutes, 44 seconds)
2025-09-14 13:34:51,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 13:35:02,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 412.11084 ± 100.990
2025-09-14 13:35:02,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(452.49356), np.float32(329.47318), np.float32(657.0507), np.float32(411.324), np.float32(258.14035), np.float32(424.33426), np.float32(460.66196), np.float32(335.52432), np.float32(374.37454), np.float32(417.73148)]
2025-09-14 13:35:02,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:35:02,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (412.11) for latency 18
2025-09-14 13:35:02,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 34 minutes, 39 seconds)
2025-09-14 13:38:03,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 13:38:13,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 825.13672 ± 175.116
2025-09-14 13:38:13,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(566.7764), np.float32(919.7741), np.float32(1254.38), np.float32(708.3269), np.float32(881.6672), np.float32(856.55524), np.float32(830.895), np.float32(717.273), np.float32(821.6995), np.float32(694.0198)]
2025-09-14 13:38:13,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:38:13,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (825.14) for latency 18
2025-09-14 13:38:13,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 40 minutes, 46 seconds)
2025-09-14 13:41:08,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 13:41:18,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1051.36743 ± 162.508
2025-09-14 13:41:18,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1031.2682), np.float32(938.90594), np.float32(1188.959), np.float32(1103.4371), np.float32(1242.1676), np.float32(1224.0511), np.float32(807.95123), np.float32(1239.997), np.float32(841.68585), np.float32(895.25226)]
2025-09-14 13:41:18,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:41:18,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (1051.37) for latency 18
2025-09-14 13:41:18,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 44 minutes, 27 seconds)
2025-09-14 13:44:12,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 13:44:22,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1031.06067 ± 280.179
2025-09-14 13:44:22,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(858.482), np.float32(1131.4387), np.float32(790.2839), np.float32(1646.6421), np.float32(1127.8759), np.float32(930.54156), np.float32(997.24133), np.float32(924.0716), np.float32(588.9269), np.float32(1315.1025)]
2025-09-14 13:44:22,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:44:22,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 46 minutes, 50 seconds)
2025-09-14 13:47:16,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 13:47:25,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1396.72339 ± 351.226
2025-09-14 13:47:25,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1094.4523), np.float32(1092.3083), np.float32(1729.9381), np.float32(1572.7351), np.float32(1379.3557), np.float32(2161.9854), np.float32(1104.8737), np.float32(1632.6294), np.float32(1054.2899), np.float32(1144.6672)]
2025-09-14 13:47:25,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:47:25,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (1396.72) for latency 18
2025-09-14 13:47:25,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 39 minutes, 32 seconds)
2025-09-14 13:50:18,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 13:50:28,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1634.72534 ± 544.676
2025-09-14 13:50:28,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1229.9565), np.float32(1457.0771), np.float32(2131.3364), np.float32(1092.7457), np.float32(1252.6527), np.float32(2734.9353), np.float32(2271.2397), np.float32(1059.1069), np.float32(1307.51), np.float32(1810.6926)]
2025-09-14 13:50:28,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:50:28,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (1634.73) for latency 18
2025-09-14 13:50:28,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 31 minutes, 35 seconds)
2025-09-14 13:53:29,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 13:53:40,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1467.00879 ± 492.451
2025-09-14 13:53:40,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1194.5793), np.float32(1269.7167), np.float32(2210.028), np.float32(1198.5651), np.float32(2643.5422), np.float32(1251.7533), np.float32(1111.618), np.float32(1214.5681), np.float32(1249.0106), np.float32(1326.706)]
2025-09-14 13:53:40,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:53:40,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 28 minutes, 47 seconds)
2025-09-14 13:56:57,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 13:57:08,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1275.40161 ± 320.492
2025-09-14 13:57:08,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1699.6129), np.float32(1194.9254), np.float32(1362.8661), np.float32(1357.7362), np.float32(386.12302), np.float32(1356.1743), np.float32(1285.8707), np.float32(1358.4783), np.float32(1404.8804), np.float32(1347.3472)]
2025-09-14 13:57:08,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:57:08,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 32 minutes, 32 seconds)
2025-09-14 14:00:24,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 14:00:33,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1616.81812 ± 308.575
2025-09-14 14:00:33,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2157.2878), np.float32(1485.9426), np.float32(1918.3389), np.float32(1737.2859), np.float32(1885.4712), np.float32(1725.9126), np.float32(1453.2125), np.float32(1153.3274), np.float32(1471.2269), np.float32(1180.176)]
2025-09-14 14:00:33,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:00:33,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 35 minutes, 11 seconds)
2025-09-14 14:03:34,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 14:03:44,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1735.11230 ± 480.834
2025-09-14 14:03:44,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1417.1422), np.float32(1261.3036), np.float32(1897.3982), np.float32(2065.978), np.float32(1268.1377), np.float32(1686.8699), np.float32(1745.5691), np.float32(1738.9181), np.float32(2944.3547), np.float32(1325.452)]
2025-09-14 14:03:44,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:03:44,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (1735.11) for latency 18
2025-09-14 14:03:44,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 34 minutes, 1 second)
2025-09-14 14:06:38,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 14:06:48,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2067.03296 ± 611.935
2025-09-14 14:06:48,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2897.2693), np.float32(2391.581), np.float32(2531.258), np.float32(1731.6864), np.float32(2284.2043), np.float32(1402.2858), np.float32(1268.5623), np.float32(1309.9722), np.float32(1850.7765), np.float32(3002.7344)]
2025-09-14 14:06:48,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:06:48,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (2067.03) for latency 18
2025-09-14 14:06:48,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 31 minutes, 3 seconds)
2025-09-14 14:09:40,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 14:09:49,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1811.56177 ± 704.853
2025-09-14 14:09:49,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1237.7186), np.float32(2121.7373), np.float32(2365.9644), np.float32(1767.872), np.float32(492.38412), np.float32(1248.4651), np.float32(1217.1774), np.float32(2324.862), np.float32(2877.6287), np.float32(2461.8076)]
2025-09-14 14:09:49,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:09:49,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 24 minutes, 52 seconds)
2025-09-14 14:12:44,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 14:12:53,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1768.82690 ± 436.449
2025-09-14 14:12:53,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1376.9027), np.float32(2100.8425), np.float32(1618.1061), np.float32(1393.8934), np.float32(1815.8035), np.float32(1059.866), np.float32(2225.5698), np.float32(1580.0558), np.float32(1905.8575), np.float32(2611.374)]
2025-09-14 14:12:53,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:12:53,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 15 minutes, 9 seconds)
2025-09-14 14:15:48,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 14:15:58,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1469.78210 ± 190.668
2025-09-14 14:15:58,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1468.7218), np.float32(1218.1014), np.float32(1317.3865), np.float32(1700.6005), np.float32(1681.257), np.float32(1620.5771), np.float32(1638.7314), np.float32(1400.8254), np.float32(1527.6514), np.float32(1123.9686)]
2025-09-14 14:15:58,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:15:58,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 4 hours, 6 minutes, 32 seconds)
2025-09-14 14:19:05,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 14:19:15,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2141.16406 ± 739.725
2025-09-14 14:19:15,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1713.0994), np.float32(1810.4515), np.float32(2544.066), np.float32(3488.1155), np.float32(1505.9589), np.float32(3226.2283), np.float32(1424.5189), np.float32(1625.301), np.float32(1396.4656), np.float32(2677.4368)]
2025-09-14 14:19:15,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:19:15,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (2141.16) for latency 18
2025-09-14 14:19:15,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 4 hours, 5 minutes, 24 seconds)
2025-09-14 14:22:33,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 14:22:44,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2340.27417 ± 973.436
2025-09-14 14:22:44,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3227.0896), np.float32(2031.1398), np.float32(3817.8582), np.float32(1243.3989), np.float32(2948.563), np.float32(1867.2722), np.float32(1598.3406), np.float32(1157.9299), np.float32(3832.674), np.float32(1678.4766)]
2025-09-14 14:22:44,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:22:44,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (2340.27) for latency 18
2025-09-14 14:22:44,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 4 hours, 8 minutes, 40 seconds)
2025-09-14 14:25:54,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 14:26:04,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2448.30981 ± 708.626
2025-09-14 14:26:04,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2183.192), np.float32(3063.357), np.float32(3354.4312), np.float32(1526.683), np.float32(2463.6777), np.float32(2587.7417), np.float32(3230.1885), np.float32(1391.6794), np.float32(3093.406), np.float32(1588.7429)]
2025-09-14 14:26:04,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:26:04,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (2448.31) for latency 18
2025-09-14 14:26:04,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 4 hours, 10 minutes, 17 seconds)
2025-09-14 14:29:04,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 14:29:14,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1909.73608 ± 522.847
2025-09-14 14:29:14,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2524.671), np.float32(1350.5924), np.float32(1826.0334), np.float32(1419.154), np.float32(2709.9153), np.float32(1462.91), np.float32(1509.4352), np.float32(2476.503), np.float32(2395.807), np.float32(1422.3397)]
2025-09-14 14:29:14,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:29:14,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 4 hours, 8 minutes, 17 seconds)
2025-09-14 14:32:06,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 14:32:15,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2159.10278 ± 682.557
2025-09-14 14:32:15,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1633.3859), np.float32(1616.7461), np.float32(1292.0082), np.float32(3332.845), np.float32(2086.4668), np.float32(3284.6753), np.float32(1728.2135), np.float32(2768.048), np.float32(2023.9939), np.float32(1824.6439)]
2025-09-14 14:32:15,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:32:15,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 4 minutes, 16 seconds)
2025-09-14 14:35:10,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 14:35:20,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1692.09741 ± 455.171
2025-09-14 14:35:20,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1280.1525), np.float32(1449.2229), np.float32(1995.5355), np.float32(2707.1301), np.float32(1326.5156), np.float32(1114.6155), np.float32(1490.1619), np.float32(1903.137), np.float32(1570.5754), np.float32(2083.928)]
2025-09-14 14:35:20,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:35:20,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 57 minutes, 52 seconds)
2025-09-14 14:38:16,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 14:38:25,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2082.41235 ± 640.868
2025-09-14 14:38:25,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2185.7825), np.float32(1541.9054), np.float32(2207.5767), np.float32(1620.172), np.float32(3275.4155), np.float32(1414.0582), np.float32(3079.6467), np.float32(1429.7764), np.float32(2409.114), np.float32(1660.6783)]
2025-09-14 14:38:25,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:38:25,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 49 minutes, 1 second)
2025-09-14 14:41:21,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 14:41:31,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2113.62036 ± 689.688
2025-09-14 14:41:31,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1631.6342), np.float32(2716.5996), np.float32(1935.3466), np.float32(1714.4419), np.float32(2298.666), np.float32(1321.3342), np.float32(3730.751), np.float32(2502.6794), np.float32(1422.0464), np.float32(1862.7029)]
2025-09-14 14:41:31,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:41:31,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 42 minutes, 24 seconds)
2025-09-14 14:44:26,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 14:44:35,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1953.55115 ± 667.315
2025-09-14 14:44:35,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3502.7473), np.float32(1837.2539), np.float32(2518.0037), np.float32(1337.0228), np.float32(1383.177), np.float32(2535.7358), np.float32(1275.9779), np.float32(1538.1266), np.float32(1803.9152), np.float32(1803.5503)]
2025-09-14 14:44:35,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:44:35,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 38 minutes, 1 second)
2025-09-14 14:47:49,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 14:48:00,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2001.60474 ± 586.500
2025-09-14 14:48:00,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2125.2036), np.float32(1339.4576), np.float32(1383.0525), np.float32(3313.64), np.float32(2619.4949), np.float32(1357.4235), np.float32(1905.7585), np.float32(2147.2148), np.float32(1772.757), np.float32(2052.0452)]
2025-09-14 14:48:00,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:48:00,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 40 minutes, 21 seconds)
2025-09-14 14:51:17,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 14:51:28,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1820.85510 ± 283.197
2025-09-14 14:51:28,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2060.698), np.float32(2156.204), np.float32(1558.8624), np.float32(2018.401), np.float32(1334.6329), np.float32(1873.2749), np.float32(1979.1025), np.float32(2025.9639), np.float32(1354.5844), np.float32(1846.8265)]
2025-09-14 14:51:28,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:51:28,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 42 minutes, 36 seconds)
2025-09-14 14:54:32,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 14:54:42,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2074.19849 ± 455.383
2025-09-14 14:54:42,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2472.7363), np.float32(2439.1865), np.float32(1533.1772), np.float32(2632.4053), np.float32(1214.3011), np.float32(1866.775), np.float32(2152.5303), np.float32(2082.739), np.float32(2603.478), np.float32(1744.6536)]
2025-09-14 14:54:42,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:54:42,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 41 minutes, 18 seconds)
2025-09-14 14:57:39,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 14:57:49,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1720.80408 ± 586.254
2025-09-14 14:57:49,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1281.0852), np.float32(1314.0364), np.float32(2492.2893), np.float32(1249.7986), np.float32(2078.2444), np.float32(1538.714), np.float32(1357.9319), np.float32(3056.804), np.float32(1452.0262), np.float32(1387.1102)]
2025-09-14 14:57:49,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:57:49,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 38 minutes, 24 seconds)
2025-09-14 15:00:44,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 15:00:54,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1903.70898 ± 823.969
2025-09-14 15:00:54,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2999.1091), np.float32(1731.4847), np.float32(2868.858), np.float32(1344.3806), np.float32(2106.2834), np.float32(1505.7491), np.float32(1471.6656), np.float32(3178.732), np.float32(497.0106), np.float32(1333.8192)]
2025-09-14 15:00:54,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:00:54,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 35 minutes, 19 seconds)
2025-09-14 15:03:49,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 15:03:59,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1970.72070 ± 648.080
2025-09-14 15:03:59,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1535.8296), np.float32(1605.8622), np.float32(2529.6694), np.float32(1424.0822), np.float32(3520.296), np.float32(1991.7318), np.float32(1298.5795), np.float32(1978.5687), np.float32(2361.8806), np.float32(1460.706)]
2025-09-14 15:03:59,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:03:59,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 27 minutes, 47 seconds)
2025-09-14 15:06:54,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 15:07:04,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1683.94690 ± 449.503
2025-09-14 15:07:04,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1480.5573), np.float32(1364.9086), np.float32(1781.2004), np.float32(1325.5886), np.float32(1180.3241), np.float32(1581.5791), np.float32(1443.8085), np.float32(1838.7957), np.float32(2812.6326), np.float32(2030.0732)]
2025-09-14 15:07:04,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:07:04,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 19 minutes, 42 seconds)
2025-09-14 15:09:57,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 15:10:06,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2699.34424 ± 1189.217
2025-09-14 15:10:06,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3816.1462), np.float32(4093.103), np.float32(1964.82), np.float32(2234.9414), np.float32(4108.231), np.float32(1374.1384), np.float32(1298.1874), np.float32(4304.1353), np.float32(1334.4844), np.float32(2465.2566)]
2025-09-14 15:10:06,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:10:06,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (2699.34) for latency 18
2025-09-14 15:10:06,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 14 minutes, 4 seconds)
2025-09-14 15:13:05,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 15:13:16,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1763.59631 ± 246.093
2025-09-14 15:13:16,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1312.859), np.float32(2124.3289), np.float32(1665.922), np.float32(1795.0574), np.float32(1650.5178), np.float32(1949.5436), np.float32(1622.3456), np.float32(2176.0261), np.float32(1611.6099), np.float32(1727.7523)]
2025-09-14 15:13:16,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:13:16,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 11 minutes, 38 seconds)
2025-09-14 15:16:34,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 15:16:45,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2025.40759 ± 752.295
2025-09-14 15:16:45,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2319.2463), np.float32(1804.138), np.float32(2002.458), np.float32(2119.1504), np.float32(3489.829), np.float32(1355.7427), np.float32(3206.8176), np.float32(1230.5984), np.float32(1455.2178), np.float32(1270.8776)]
2025-09-14 15:16:45,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:16:45,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 13 minutes, 26 seconds)
2025-09-14 15:20:02,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 15:20:11,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2040.80505 ± 762.332
2025-09-14 15:20:11,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1284.4225), np.float32(1225.1948), np.float32(1789.4746), np.float32(2781.2344), np.float32(3111.1497), np.float32(2522.5596), np.float32(1602.2401), np.float32(1429.1353), np.float32(1369.6117), np.float32(3293.0293)]
2025-09-14 15:20:11,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:20:11,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 14 minutes, 32 seconds)
2025-09-14 15:23:11,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 15:23:21,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2305.49194 ± 1166.140
2025-09-14 15:23:21,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3772.47), np.float32(2437.7617), np.float32(1396.9708), np.float32(2436.422), np.float32(4058.0894), np.float32(3810.1301), np.float32(1769.6687), np.float32(417.22626), np.float32(1599.2845), np.float32(1356.8953)]
2025-09-14 15:23:21,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:23:21,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 12 minutes, 9 seconds)
2025-09-14 15:26:17,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 15:26:26,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2322.72852 ± 1329.791
2025-09-14 15:26:26,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1417.7887), np.float32(-233.03525), np.float32(1637.0366), np.float32(3128.4927), np.float32(1271.4723), np.float32(1678.85), np.float32(4087.8054), np.float32(3382.2092), np.float32(2717.4424), np.float32(4139.222)]
2025-09-14 15:26:26,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:26:26,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 9 minutes, 31 seconds)
2025-09-14 15:29:21,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 15:29:31,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3370.85083 ± 1014.408
2025-09-14 15:29:31,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1353.6241), np.float32(4061.6538), np.float32(1708.8131), np.float32(4014.4062), np.float32(3913.298), np.float32(4320.1416), np.float32(3640.406), np.float32(3826.7117), np.float32(2698.7727), np.float32(4170.68)]
2025-09-14 15:29:31,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:29:31,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (3370.85) for latency 18
2025-09-14 15:29:31,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 5 minutes, 13 seconds)
2025-09-14 15:32:26,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 15:32:36,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2756.85767 ± 1103.472
2025-09-14 15:32:36,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3977.2734), np.float32(1254.5327), np.float32(1449.2905), np.float32(1517.8014), np.float32(2348.709), np.float32(4056.2957), np.float32(2289.322), np.float32(4011.5889), np.float32(2670.1643), np.float32(3993.5977)]
2025-09-14 15:32:36,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:32:36,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 57 minutes, 27 seconds)
2025-09-14 15:35:31,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 15:35:40,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3146.99854 ± 943.404
2025-09-14 15:35:40,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1466.4208), np.float32(1968.9813), np.float32(1984.3422), np.float32(3481.3262), np.float32(3069.4944), np.float32(4006.8906), np.float32(3705.3435), np.float32(4229.714), np.float32(3464.996), np.float32(4092.4785)]
2025-09-14 15:35:40,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:35:40,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 50 minutes, 14 seconds)
2025-09-14 15:38:35,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 15:38:44,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2237.82861 ± 1148.821
2025-09-14 15:38:44,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3510.2993), np.float32(2045.6367), np.float32(2332.1553), np.float32(4380.3823), np.float32(2002.4816), np.float32(165.72722), np.float32(1399.4907), np.float32(3266.8718), np.float32(1873.326), np.float32(1401.9141)]
2025-09-14 15:38:44,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:38:44,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 46 minutes, 13 seconds)
2025-09-14 15:41:44,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 15:41:54,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3115.06299 ± 1066.004
2025-09-14 15:41:54,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3911.5747), np.float32(2121.9236), np.float32(3477.2725), np.float32(2281.5269), np.float32(4551.0366), np.float32(4604.0356), np.float32(1666.1183), np.float32(2008.6735), np.float32(4061.04), np.float32(2467.429)]
2025-09-14 15:41:54,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:41:54,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 43 minutes, 58 seconds)
2025-09-14 15:45:11,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 15:45:22,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1993.00037 ± 548.137
2025-09-14 15:45:22,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1435.3972), np.float32(1604.8), np.float32(1971.5989), np.float32(2504.8503), np.float32(2123.7617), np.float32(1932.0201), np.float32(3246.806), np.float32(1675.3687), np.float32(1251.2471), np.float32(2184.1533)]
2025-09-14 15:45:22,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:45:22,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 44 minutes, 56 seconds)
2025-09-14 15:48:38,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 15:48:48,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3616.14307 ± 1266.570
2025-09-14 15:48:48,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4493.1816), np.float32(4794.1846), np.float32(4581.5356), np.float32(2174.2107), np.float32(4353.3975), np.float32(2670.577), np.float32(1280.2795), np.float32(2375.1606), np.float32(4667.6943), np.float32(4771.2085)]
2025-09-14 15:48:48,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:48:48,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (3616.14) for latency 18
2025-09-14 15:48:48,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 45 minutes, 17 seconds)
2025-09-14 15:51:47,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 15:51:57,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3043.94678 ± 1149.690
2025-09-14 15:51:57,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4132.241), np.float32(3993.445), np.float32(3489.139), np.float32(1793.5863), np.float32(4435.783), np.float32(2869.9705), np.float32(1393.2655), np.float32(1522.0104), np.float32(2380.0547), np.float32(4429.9717)]
2025-09-14 15:51:57,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:51:57,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 42 minutes, 49 seconds)
2025-09-14 15:54:52,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 15:55:01,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2071.48901 ± 951.703
2025-09-14 15:55:01,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1270.0408), np.float32(3682.587), np.float32(1537.0867), np.float32(1768.0969), np.float32(1863.2649), np.float32(1919.5112), np.float32(4134.743), np.float32(1857.9402), np.float32(1291.121), np.float32(1390.4977)]
2025-09-14 15:55:01,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:55:01,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 39 minutes, 35 seconds)
2025-09-14 15:57:55,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 15:58:05,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2753.35010 ± 932.996
2025-09-14 15:58:05,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2757.7676), np.float32(3677.6313), np.float32(1883.5175), np.float32(3234.977), np.float32(1365.6276), np.float32(1959.292), np.float32(4630.736), np.float32(2218.625), np.float32(2428.0461), np.float32(3377.282)]
2025-09-14 15:58:05,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:58:05,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 35 minutes, 20 seconds)
2025-09-14 16:01:00,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 16:01:10,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1709.67834 ± 910.826
2025-09-14 16:01:10,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2013.0676), np.float32(309.1194), np.float32(3077.6333), np.float32(1113.6779), np.float32(1883.856), np.float32(1781.2349), np.float32(1328.0194), np.float32(279.513), np.float32(2676.4446), np.float32(2634.2175)]
2025-09-14 16:01:10,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:01:10,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 28 minutes, 27 seconds)
2025-09-14 16:04:06,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 16:04:15,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2296.41406 ± 925.921
2025-09-14 16:04:15,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1516.806), np.float32(1316.8414), np.float32(2239.4343), np.float32(3679.7078), np.float32(2667.2642), np.float32(1667.409), np.float32(1786.4692), np.float32(1816.3447), np.float32(1985.2861), np.float32(4288.5776)]
2025-09-14 16:04:15,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:04:15,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 22 minutes, 5 seconds)
2025-09-14 16:07:08,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 16:07:17,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3068.09912 ± 1233.045
2025-09-14 16:07:17,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3307.1648), np.float32(2619.703), np.float32(1404.2661), np.float32(1275.8633), np.float32(3267.268), np.float32(4509.576), np.float32(4302.6626), np.float32(3853.665), np.float32(1536.3759), np.float32(4604.4497)]
2025-09-14 16:07:17,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:07:17,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 18 minutes, 1 second)
2025-09-14 16:10:17,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 16:10:28,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1452.33960 ± 269.968
2025-09-14 16:10:28,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2010.927), np.float32(1120.8647), np.float32(1878.7638), np.float32(1267.308), np.float32(1277.8374), np.float32(1363.4365), np.float32(1255.0864), np.float32(1495.3334), np.float32(1354.6202), np.float32(1499.2195)]
2025-09-14 16:10:28,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:10:28,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 15 minutes, 51 seconds)
2025-09-14 16:13:46,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 16:13:57,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3362.41992 ± 1109.777
2025-09-14 16:13:57,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3530.717), np.float32(2253.04), np.float32(4347.502), np.float32(1912.8966), np.float32(2050.8926), np.float32(3763.788), np.float32(2079.271), np.float32(4307.343), np.float32(4542.1973), np.float32(4836.5522)]
2025-09-14 16:13:57,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:13:57,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 16 minutes, 21 seconds)
2025-09-14 16:17:12,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 16:17:23,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2830.65601 ± 1183.682
2025-09-14 16:17:23,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4410.559), np.float32(2873.334), np.float32(2621.3228), np.float32(2095.843), np.float32(2957.361), np.float32(4633.954), np.float32(1460.1617), np.float32(4271.738), np.float32(1248.3964), np.float32(1733.8909)]
2025-09-14 16:17:23,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:17:23,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 16 minutes, 8 seconds)
2025-09-14 16:20:22,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 16:20:32,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2815.86768 ± 1255.221
2025-09-14 16:20:32,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4449.9863), np.float32(3088.071), np.float32(4536.56), np.float32(1740.9204), np.float32(1797.8137), np.float32(1719.5067), np.float32(4562.9326), np.float32(1672.3816), np.float32(1358.5675), np.float32(3231.9346)]
2025-09-14 16:20:32,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:20:32,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 13 minutes, 35 seconds)
2025-09-14 16:23:27,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 16:23:37,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1643.78357 ± 830.215
2025-09-14 16:23:37,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1301.0234), np.float32(1326.4479), np.float32(2387.6658), np.float32(1777.9113), np.float32(3692.455), np.float32(1449.2095), np.float32(1220.1262), np.float32(400.13455), np.float32(1614.7574), np.float32(1268.1039)]
2025-09-14 16:23:37,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:23:37,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 10 minutes, 42 seconds)
2025-09-14 16:26:32,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 16:26:41,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2588.93628 ± 1136.574
2025-09-14 16:26:41,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1218.8822), np.float32(2913.4749), np.float32(4138.9834), np.float32(1453.8834), np.float32(1543.5173), np.float32(4247.8506), np.float32(3070.8193), np.float32(2416.4941), np.float32(3686.7224), np.float32(1198.7347)]
2025-09-14 16:26:41,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:26:41,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 6 minutes, 35 seconds)
2025-09-14 16:29:36,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 16:29:46,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3512.52393 ± 1271.206
2025-09-14 16:29:46,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2245.8926), np.float32(2118.7122), np.float32(4211.5376), np.float32(4969.0366), np.float32(4849.0415), np.float32(2143.643), np.float32(4583.199), np.float32(4980.267), np.float32(1791.0062), np.float32(3232.8997)]
2025-09-14 16:29:46,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:29:46,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 15 seconds)
2025-09-14 16:32:40,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 16:32:48,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3240.02588 ± 973.465
2025-09-14 16:32:48,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4403.9824), np.float32(3997.7317), np.float32(2251.8062), np.float32(4107.367), np.float32(3729.972), np.float32(1627.2935), np.float32(4231.48), np.float32(2895.7285), np.float32(3283.8777), np.float32(1871.019)]
2025-09-14 16:32:48,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:32:48,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 54 minutes, 11 seconds)
2025-09-14 16:35:43,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 16:35:52,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2422.39966 ± 1153.724
2025-09-14 16:35:52,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(160.91345), np.float32(2577.321), np.float32(1571.4421), np.float32(1985.7938), np.float32(2195.6501), np.float32(1863.7097), np.float32(3569.6174), np.float32(4542.407), np.float32(2300.8801), np.float32(3456.2607)]
2025-09-14 16:35:52,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:35:52,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 50 minutes, 26 seconds)
2025-09-14 16:38:53,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 16:39:04,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3951.75977 ± 890.497
2025-09-14 16:39:04,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4775.169), np.float32(4402.3726), np.float32(4156.6694), np.float32(2034.6469), np.float32(4178.3794), np.float32(4787.724), np.float32(2446.594), np.float32(4218.9326), np.float32(4091.0164), np.float32(4426.0933)]
2025-09-14 16:39:04,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:39:04,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (3951.76) for latency 18
2025-09-14 16:39:04,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 48 minutes, 10 seconds)
2025-09-14 16:42:22,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 16:42:33,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3228.80127 ± 1166.500
2025-09-14 16:42:33,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2918.01), np.float32(2959.8433), np.float32(1194.5781), np.float32(2857.607), np.float32(4530.7026), np.float32(4705.7495), np.float32(1292.1951), np.float32(3949.327), np.float32(3989.6365), np.float32(3890.3618)]
2025-09-14 16:42:33,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:42:33,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 47 minutes, 53 seconds)
2025-09-14 16:45:48,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 16:45:58,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3366.45557 ± 1026.758
2025-09-14 16:45:58,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3481.8567), np.float32(2152.9194), np.float32(2698.086), np.float32(2570.1633), np.float32(1443.9064), np.float32(4149.2183), np.float32(3900.8118), np.float32(4317.702), np.float32(4378.2144), np.float32(4571.6753)]
2025-09-14 16:45:58,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:45:58,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 46 minutes, 54 seconds)
2025-09-14 16:48:57,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 16:49:07,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2826.47290 ± 1385.038
2025-09-14 16:49:07,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1628.1017), np.float32(1431.2374), np.float32(4629.062), np.float32(1543.7654), np.float32(4748.1694), np.float32(2828.5415), np.float32(1409.6868), np.float32(3931.696), np.float32(1664.8196), np.float32(4449.648)]
2025-09-14 16:49:07,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:49:07,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 44 minutes, 22 seconds)
2025-09-14 16:52:02,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 16:52:12,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3345.71484 ± 1117.435
2025-09-14 16:52:12,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3130.4111), np.float32(4572.6167), np.float32(2140.134), np.float32(1869.4269), np.float32(2027.2628), np.float32(2705.3125), np.float32(4411.0493), np.float32(4794.7524), np.float32(3097.5806), np.float32(4708.605)]
2025-09-14 16:52:12,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:52:12,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 41 minutes, 13 seconds)
2025-09-14 16:55:06,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 16:55:16,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2589.15308 ± 1180.810
2025-09-14 16:55:16,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4078.2773), np.float32(2019.8683), np.float32(632.6884), np.float32(1743.393), np.float32(4461.197), np.float32(2569.2031), np.float32(4162.709), np.float32(2310.4685), np.float32(1967.192), np.float32(1946.5349)]
2025-09-14 16:55:16,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:55:16,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 37 minutes, 10 seconds)
2025-09-14 16:58:11,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 16:58:21,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2459.19434 ± 1303.203
2025-09-14 16:58:21,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3206.5872), np.float32(2189.42), np.float32(1304.2502), np.float32(981.6896), np.float32(4183.568), np.float32(1478.9736), np.float32(1607.0864), np.float32(4634.483), np.float32(3814.2417), np.float32(1191.6434)]
2025-09-14 16:58:21,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:58:21,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 31 minutes, 34 seconds)
2025-09-14 17:01:14,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 17:01:23,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3849.76440 ± 1209.068
2025-09-14 17:01:23,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2760.818), np.float32(4688.1367), np.float32(4815.3965), np.float32(4548.295), np.float32(1314.9017), np.float32(4715.065), np.float32(4254.666), np.float32(4778.2964), np.float32(2186.638), np.float32(4435.4307)]
2025-09-14 17:01:23,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:01:23,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 26 minutes, 21 seconds)
2025-09-14 17:04:18,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 17:04:28,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3704.85205 ± 1266.011
2025-09-14 17:04:28,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2564.611), np.float32(4860.207), np.float32(4207.0493), np.float32(4913.7354), np.float32(2334.7532), np.float32(4892.4766), np.float32(4526.715), np.float32(4722.118), np.float32(1320.7905), np.float32(2706.0652)]
2025-09-14 17:04:28,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:04:28,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 22 minutes, 54 seconds)
2025-09-14 17:07:32,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 17:07:43,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2380.32178 ± 863.578
2025-09-14 17:07:43,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1597.3463), np.float32(2334.4526), np.float32(1210.0457), np.float32(1627.9065), np.float32(2779.8125), np.float32(2266.0276), np.float32(1632.6327), np.float32(3467.6174), np.float32(2838.9446), np.float32(4048.4314)]
2025-09-14 17:07:43,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:07:43,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 20 minutes, 41 seconds)
2025-09-14 17:11:01,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 17:11:12,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2887.29810 ± 1305.951
2025-09-14 17:11:12,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1708.7), np.float32(1458.1161), np.float32(4452.143), np.float32(3897.298), np.float32(4448.95), np.float32(1434.0687), np.float32(1295.4832), np.float32(2735.3132), np.float32(4631.2197), np.float32(2811.6875)]
2025-09-14 17:11:12,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:11:12,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 19 minutes, 40 seconds)
2025-09-14 17:14:25,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 17:14:35,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1945.56567 ± 418.560
2025-09-14 17:14:35,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2450.687), np.float32(1578.4818), np.float32(1963.0396), np.float32(1651.6776), np.float32(2195.367), np.float32(2799.6467), np.float32(1794.4034), np.float32(2046.3207), np.float32(1614.0919), np.float32(1361.9403)]
2025-09-14 17:14:35,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:14:35,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 17 minutes, 58 seconds)
2025-09-14 17:17:35,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 17:17:45,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3637.86646 ± 1259.020
2025-09-14 17:17:45,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3473.772), np.float32(1284.3469), np.float32(4327.54), np.float32(1887.5629), np.float32(3580.959), np.float32(4659.726), np.float32(4953.7866), np.float32(2563.4563), np.float32(4820.6465), np.float32(4826.8647)]
2025-09-14 17:17:45,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:17:45,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 15 minutes, 16 seconds)
2025-09-14 17:20:39,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 17:20:47,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4152.94629 ± 724.905
2025-09-14 17:20:47,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4326.5684), np.float32(2686.5383), np.float32(4660.57), np.float32(4544.6294), np.float32(4173.636), np.float32(4384.9834), np.float32(4806.859), np.float32(4555.376), np.float32(4589.6655), np.float32(2800.6409)]
2025-09-14 17:20:47,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:20:47,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (4152.95) for latency 18
2025-09-14 17:20:47,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 11 minutes, 47 seconds)
2025-09-14 17:23:41,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 17:23:51,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3258.91650 ± 1263.157
2025-09-14 17:23:51,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4851.3086), np.float32(1749.9547), np.float32(1261.0791), np.float32(3917.604), np.float32(4004.7085), np.float32(3132.7842), np.float32(1318.931), np.float32(3780.8516), np.float32(4285.9785), np.float32(4285.9624)]
2025-09-14 17:23:51,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:23:51,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 7 minutes, 45 seconds)
2025-09-14 17:26:46,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 17:26:56,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4686.43213 ± 760.939
2025-09-14 17:26:56,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5031.055), np.float32(2422.291), np.float32(4730.3965), np.float32(4848.8975), np.float32(4852.78), np.float32(5076.914), np.float32(4956.2627), np.float32(4946.101), np.float32(4994.358), np.float32(5005.2656)]
2025-09-14 17:26:56,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:26:56,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (4686.43) for latency 18
2025-09-14 17:26:56,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 2 minutes, 56 seconds)
2025-09-14 17:29:51,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 17:30:01,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4611.27002 ± 709.042
2025-09-14 17:30:01,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4862.1455), np.float32(4834.865), np.float32(4604.5845), np.float32(5013.2), np.float32(4954.558), np.float32(4901.687), np.float32(4914.227), np.float32(5019.949), np.float32(2544.9258), np.float32(4462.5557)]
2025-09-14 17:30:01,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:30:01,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 58 minutes, 37 seconds)
2025-09-14 17:32:56,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 17:33:06,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3435.95190 ± 1150.407
2025-09-14 17:33:06,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2263.66), np.float32(4007.9944), np.float32(4759.7695), np.float32(2891.0125), np.float32(3767.0415), np.float32(4512.992), np.float32(1953.2606), np.float32(4768.705), np.float32(3983.821), np.float32(1451.26)]
2025-09-14 17:33:06,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:33:06,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 55 minutes, 14 seconds)
2025-09-14 17:36:18,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 17:36:29,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3837.13428 ± 1057.966
2025-09-14 17:36:29,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4586.1733), np.float32(4955.098), np.float32(2642.6804), np.float32(3223.6333), np.float32(3416.079), np.float32(4664.0415), np.float32(5035.9727), np.float32(4759.6816), np.float32(1807.4062), np.float32(3280.577)]
2025-09-14 17:36:29,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:36:29,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 53 minutes, 21 seconds)
2025-09-14 17:39:47,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 17:39:59,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3861.49951 ± 897.163
2025-09-14 17:39:59,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3870.9614), np.float32(4568.2056), np.float32(4908.226), np.float32(3763.6465), np.float32(2449.8447), np.float32(4148.7344), np.float32(4354.056), np.float32(4250.654), np.float32(1933.0741), np.float32(4367.5923)]
2025-09-14 17:39:59,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:39:59,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 51 minutes, 36 seconds)
2025-09-14 17:43:03,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 17:43:14,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3184.48145 ± 1573.764
2025-09-14 17:43:14,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4624.06), np.float32(1340.3496), np.float32(1297.0664), np.float32(1255.6165), np.float32(4920.051), np.float32(3297.4893), np.float32(5085.293), np.float32(3884.3835), np.float32(4628.6577), np.float32(1511.8474)]
2025-09-14 17:43:14,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:43:14,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 48 minutes, 51 seconds)
2025-09-14 17:46:10,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 17:46:20,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4104.11523 ± 1078.177
2025-09-14 17:46:20,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4679.033), np.float32(4898.8955), np.float32(4336.9033), np.float32(4269.656), np.float32(2766.272), np.float32(4852.8125), np.float32(4815.9287), np.float32(1387.5826), np.float32(4476.653), np.float32(4557.413)]
2025-09-14 17:46:20,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:46:20,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 45 minutes, 40 seconds)
2025-09-14 17:49:15,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 17:49:25,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2074.74976 ± 843.153
2025-09-14 17:49:25,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1369.9901), np.float32(2149.2935), np.float32(1359.8969), np.float32(2309.0496), np.float32(1438.1893), np.float32(1323.5952), np.float32(1240.8754), np.float32(3679.6301), np.float32(3331.17), np.float32(2545.8086)]
2025-09-14 17:49:25,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:49:25,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 42 minutes, 25 seconds)
2025-09-14 17:52:20,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 17:52:30,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4008.18286 ± 1004.281
2025-09-14 17:52:30,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3806.8071), np.float32(4350.1367), np.float32(4651.054), np.float32(3824.9866), np.float32(4934.181), np.float32(4617.8223), np.float32(2205.5059), np.float32(2077.6606), np.float32(4800.403), np.float32(4813.2686)]
2025-09-14 17:52:30,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:52:30,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 38 minutes, 25 seconds)
2025-09-14 17:55:23,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 17:55:32,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4132.13721 ± 1046.226
2025-09-14 17:55:32,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3403.568), np.float32(4764.2), np.float32(4997.768), np.float32(2480.6462), np.float32(2045.6013), np.float32(4178.588), np.float32(4738.6987), np.float32(4970.256), np.float32(4972.9727), np.float32(4769.0703)]
2025-09-14 17:55:32,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:55:32,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 34 minutes, 13 seconds)
2025-09-14 17:58:27,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 17:58:36,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4697.07373 ± 152.552
2025-09-14 17:58:36,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4612.722), np.float32(4798.5312), np.float32(4842.911), np.float32(4514.664), np.float32(4740.1396), np.float32(4638.471), np.float32(4865.243), np.float32(4901.6846), np.float32(4409.258), np.float32(4647.1133)]
2025-09-14 17:58:36,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:58:36,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (4697.07) for latency 18
2025-09-14 17:58:36,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 30 minutes, 45 seconds)
2025-09-14 18:01:38,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 18:01:49,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3889.70752 ± 1501.101
2025-09-14 18:01:49,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4580.7544), np.float32(4911.2656), np.float32(1253.1976), np.float32(4961.929), np.float32(2348.736), np.float32(4829.265), np.float32(4921.3794), np.float32(1319.9723), np.float32(4926.844), np.float32(4843.732)]
2025-09-14 18:01:49,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:01:49,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 27 minutes, 53 seconds)
2025-09-14 18:05:07,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 18:05:18,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4988.69727 ± 179.344
2025-09-14 18:05:18,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4493.719), np.float32(5124.5005), np.float32(5148.942), np.float32(4969.7783), np.float32(4961.188), np.float32(5028.8115), np.float32(5055.8594), np.float32(4927.4204), np.float32(5059.589), np.float32(5117.165)]
2025-09-14 18:05:18,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:05:18,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (4988.70) for latency 18
2025-09-14 18:05:18,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 25 minutes, 26 seconds)
2025-09-14 18:08:33,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 18:08:43,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3269.24585 ± 1618.689
2025-09-14 18:08:43,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4989.1445), np.float32(1597.8381), np.float32(1860.4001), np.float32(4861.4946), np.float32(1398.3246), np.float32(4870.208), np.float32(2184.3962), np.float32(5077.237), np.float32(4536.982), np.float32(1316.4329)]
2025-09-14 18:08:43,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:08:43,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 22 minutes, 43 seconds)
2025-09-14 18:11:44,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 18:11:53,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3350.86377 ± 1285.126
2025-09-14 18:11:53,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1381.7137), np.float32(3302.419), np.float32(1299.06), np.float32(2714.922), np.float32(3439.7458), np.float32(4804.662), np.float32(4411.057), np.float32(4733.376), np.float32(2569.4998), np.float32(4852.18)]
2025-09-14 18:11:53,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:11:53,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 19 minutes, 37 seconds)
2025-09-14 18:14:49,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 18:14:59,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4080.84375 ± 1138.724
2025-09-14 18:14:59,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4861.297), np.float32(3677.578), np.float32(4995.9336), np.float32(5154.42), np.float32(2193.3984), np.float32(4893.682), np.float32(4789.8403), np.float32(2940.8743), np.float32(2270.5186), np.float32(5030.894)]
2025-09-14 18:14:59,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:14:59,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 16 minutes, 22 seconds)
2025-09-14 18:17:51,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 18:18:01,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4351.01367 ± 953.719
2025-09-14 18:18:01,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4411.6504), np.float32(4843.385), np.float32(2838.0237), np.float32(4428.235), np.float32(5141.5107), np.float32(5000.385), np.float32(4820.9673), np.float32(2195.218), np.float32(4952.058), np.float32(4878.703)]
2025-09-14 18:18:01,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:18:01,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 12 minutes, 57 seconds)
2025-09-14 18:20:56,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 18:21:06,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3880.53564 ± 1525.418
2025-09-14 18:21:06,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1418.1315), np.float32(5007.383), np.float32(4944.126), np.float32(1254.6328), np.float32(4457.9956), np.float32(5067.5273), np.float32(4910.477), np.float32(4988.719), np.float32(2079.4863), np.float32(4676.88)]
2025-09-14 18:21:06,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:21:06,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 9 minutes, 28 seconds)
2025-09-14 18:24:01,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 18:24:11,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4347.76660 ± 1133.004
2025-09-14 18:24:11,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5182.594), np.float32(1951.4668), np.float32(5104.3584), np.float32(5021.688), np.float32(5194.082), np.float32(5035.5474), np.float32(4422.5073), np.float32(2565.4631), np.float32(3809.5544), np.float32(5190.4033)]
2025-09-14 18:24:11,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:24:11,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 6 minutes, 11 seconds)
2025-09-14 18:27:06,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 18:27:16,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4464.41211 ± 1018.865
2025-09-14 18:27:16,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5080.4683), np.float32(5178.559), np.float32(3113.3806), np.float32(5213.8677), np.float32(2015.7941), np.float32(4809.714), np.float32(4200.24), np.float32(5044.4067), np.float32(4887.5093), np.float32(5100.1826)]
2025-09-14 18:27:16,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:27:16,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 4 seconds)
2025-09-14 18:30:28,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 18:30:39,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3891.07031 ± 1256.262
2025-09-14 18:30:39,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4939.963), np.float32(4684.0923), np.float32(2245.8113), np.float32(4866.188), np.float32(3439.0203), np.float32(5142.1226), np.float32(5008.9863), np.float32(1587.0004), np.float32(2560.841), np.float32(4436.6777)]
2025-09-14 18:30:39,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:30:39,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1251 [DEBUG]: Training session finished
