2025-09-14 08:43:01,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1108 [DEBUG]: logdir: _logs/noise-eval-v2/halfcheetah/bpql-noise_0.025-delay_6
2025-09-14 08:43:01,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1109 [DEBUG]: trainer_prefix: noise-eval-v2/halfcheetah/bpql-noise_0.025-delay_6
2025-09-14 08:43:01,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'6': <latency_env.delayed_mdp.ConstantDelay object at 0x7f1952743d70>}
2025-09-14 08:43:01,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1111 [DEBUG]: using device: cpu
2025-09-14 08:43:01,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-14 08:43:01,766 baseline-bpql-noisepromille25-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=53, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-14 08:43:01,767 baseline-bpql-noisepromille25-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-14 08:43:03,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-14 08:43:03,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-14 08:46:36,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 08:46:44,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: -403.90295 ± 53.439
2025-09-14 08:46:44,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-473.67053), np.float32(-396.93246), np.float32(-404.5783), np.float32(-380.91757), np.float32(-460.86914), np.float32(-417.03116), np.float32(-309.93027), np.float32(-337.85974), np.float32(-377.2362), np.float32(-480.00424)]
2025-09-14 08:46:44,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 08:46:44,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (-403.90) for latency 6
2025-09-14 08:46:44,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 6 hours, 4 minutes, 10 seconds)
2025-09-14 08:50:16,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 08:50:24,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: -160.73792 ± 66.934
2025-09-14 08:50:24,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-123.633514), np.float32(-163.98825), np.float32(-243.5775), np.float32(-220.48653), np.float32(-199.8842), np.float32(-145.38612), np.float32(6.730043), np.float32(-145.07216), np.float32(-214.69797), np.float32(-157.38297)]
2025-09-14 08:50:24,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 08:50:24,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (-160.74) for latency 6
2025-09-14 08:50:24,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 5 hours, 59 minutes, 55 seconds)
2025-09-14 08:53:52,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 08:54:01,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 169.57797 ± 79.206
2025-09-14 08:54:01,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(198.30406), np.float32(298.54248), np.float32(217.85532), np.float32(174.82303), np.float32(205.84856), np.float32(-5.858197), np.float32(209.1725), np.float32(102.24054), np.float32(190.6878), np.float32(104.16366)]
2025-09-14 08:54:01,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 08:54:01,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (169.58) for latency 6
2025-09-14 08:54:01,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 5 hours, 54 minutes, 19 seconds)
2025-09-14 08:57:37,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 08:57:45,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 679.20740 ± 173.826
2025-09-14 08:57:45,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(825.4668), np.float32(849.3187), np.float32(458.43585), np.float32(452.68173), np.float32(590.79535), np.float32(482.8792), np.float32(996.9756), np.float32(738.2403), np.float32(688.7863), np.float32(708.4949)]
2025-09-14 08:57:45,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 08:57:45,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (679.21) for latency 6
2025-09-14 08:57:45,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 5 hours, 52 minutes, 50 seconds)
2025-09-14 09:01:16,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:01:25,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1067.23303 ± 456.607
2025-09-14 09:01:25,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1161.4818), np.float32(717.9976), np.float32(1088.3147), np.float32(599.67883), np.float32(1726.1324), np.float32(1437.6377), np.float32(602.5308), np.float32(1481.929), np.float32(321.21722), np.float32(1535.41)]
2025-09-14 09:01:25,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:01:25,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (1067.23) for latency 6
2025-09-14 09:01:25,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 5 hours, 48 minutes, 55 seconds)
2025-09-14 09:04:46,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:04:55,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1142.33911 ± 562.360
2025-09-14 09:04:55,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(754.9361), np.float32(2339.1533), np.float32(553.71594), np.float32(1375.1727), np.float32(454.08832), np.float32(1423.7451), np.float32(1026.6971), np.float32(1779.7034), np.float32(707.63776), np.float32(1008.5421)]
2025-09-14 09:04:55,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:04:55,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (1142.34) for latency 6
2025-09-14 09:04:55,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 5 hours, 41 minutes, 52 seconds)
2025-09-14 09:08:14,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:08:22,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1951.57422 ± 761.909
2025-09-14 09:08:22,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2857.0303), np.float32(818.05286), np.float32(2280.7424), np.float32(2552.2007), np.float32(2128.3853), np.float32(596.5033), np.float32(1851.4994), np.float32(2669.0322), np.float32(2524.9875), np.float32(1237.3069)]
2025-09-14 09:08:22,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:08:22,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (1951.57) for latency 6
2025-09-14 09:08:22,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 5 hours, 34 minutes, 18 seconds)
2025-09-14 09:11:20,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:11:27,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1841.23047 ± 874.689
2025-09-14 09:11:27,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2682.5947), np.float32(1774.206), np.float32(2213.9478), np.float32(596.0781), np.float32(3198.7388), np.float32(1617.9119), np.float32(1477.0354), np.float32(732.41266), np.float32(3034.0684), np.float32(1085.3098)]
2025-09-14 09:11:27,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:11:27,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 5 hours, 21 minutes, 1 second)
2025-09-14 09:14:16,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:14:23,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 2103.75830 ± 805.883
2025-09-14 09:14:23,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2366.1099), np.float32(2901.3716), np.float32(678.2233), np.float32(2462.1519), np.float32(1216.6991), np.float32(2687.1284), np.float32(1379.79), np.float32(3128.041), np.float32(1427.7408), np.float32(2790.327)]
2025-09-14 09:14:23,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:14:23,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (2103.76) for latency 6
2025-09-14 09:14:23,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 5 hours, 2 minutes, 37 seconds)
2025-09-14 09:17:11,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:17:18,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 2632.16260 ± 854.149
2025-09-14 09:17:18,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1044.6956), np.float32(3225.0125), np.float32(3232.676), np.float32(1059.4735), np.float32(3079.7993), np.float32(3167.8813), np.float32(3309.3823), np.float32(3207.545), np.float32(2125.4788), np.float32(2869.6829)]
2025-09-14 09:17:18,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:17:18,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (2632.16) for latency 6
2025-09-14 09:17:18,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 46 minutes, 5 seconds)
2025-09-14 09:19:56,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:20:02,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 2085.26416 ± 730.984
2025-09-14 09:20:02,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1161.0497), np.float32(1862.419), np.float32(3258.889), np.float32(1718.0696), np.float32(2996.9814), np.float32(2093.9265), np.float32(1740.9424), np.float32(3136.0193), np.float32(1594.2107), np.float32(1290.1333)]
2025-09-14 09:20:02,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:20:02,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 29 minutes, 8 seconds)
2025-09-14 09:23:16,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:23:25,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3256.66675 ± 725.524
2025-09-14 09:23:25,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3854.0442), np.float32(3183.7622), np.float32(2890.3755), np.float32(3569.9658), np.float32(3356.1904), np.float32(1244.3281), np.float32(3619.6587), np.float32(3373.1377), np.float32(3705.891), np.float32(3769.3135)]
2025-09-14 09:23:25,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:23:25,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (3256.67) for latency 6
2025-09-14 09:23:25,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 24 minutes, 55 seconds)
2025-09-14 09:26:48,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:26:57,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3321.55200 ± 607.565
2025-09-14 09:26:57,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3435.492), np.float32(3550.8984), np.float32(3475.1611), np.float32(3556.5178), np.float32(3682.6426), np.float32(3706.0913), np.float32(3542.6472), np.float32(1545.6067), np.float32(3182.9595), np.float32(3537.5027)]
2025-09-14 09:26:57,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:26:57,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (3321.55) for latency 6
2025-09-14 09:26:57,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 29 minutes, 36 seconds)
2025-09-14 09:30:20,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:30:29,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3427.20630 ± 637.048
2025-09-14 09:30:29,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3605.6975), np.float32(3874.142), np.float32(3547.4143), np.float32(3793.9944), np.float32(3349.4775), np.float32(3632.4888), np.float32(3302.095), np.float32(3665.3638), np.float32(1601.3859), np.float32(3900.001)]
2025-09-14 09:30:29,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:30:29,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (3427.21) for latency 6
2025-09-14 09:30:29,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 36 minutes, 53 seconds)
2025-09-14 09:33:52,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:34:00,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3731.67456 ± 527.024
2025-09-14 09:34:00,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3867.4966), np.float32(3829.799), np.float32(3936.5847), np.float32(3672.8745), np.float32(4057.2324), np.float32(3965.4915), np.float32(2201.8464), np.float32(4076.9585), np.float32(4018.1335), np.float32(3690.3274)]
2025-09-14 09:34:00,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:34:00,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (3731.67) for latency 6
2025-09-14 09:34:00,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 43 minutes, 53 seconds)
2025-09-14 09:37:24,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:37:33,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3838.68945 ± 127.840
2025-09-14 09:37:33,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3751.5046), np.float32(3902.5222), np.float32(3701.2217), np.float32(3612.8506), np.float32(3952.9817), np.float32(3689.9104), np.float32(3902.3667), np.float32(3958.8982), np.float32(3955.5251), np.float32(3959.114)]
2025-09-14 09:37:33,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:37:33,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (3838.69) for latency 6
2025-09-14 09:37:33,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 54 minutes, 13 seconds)
2025-09-14 09:40:56,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:41:04,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3972.37451 ± 119.639
2025-09-14 09:41:04,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3926.717), np.float32(4218.6616), np.float32(3985.9172), np.float32(3939.4285), np.float32(4060.1526), np.float32(3986.2056), np.float32(4007.4014), np.float32(3930.9524), np.float32(3711.527), np.float32(3956.7808)]
2025-09-14 09:41:04,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:41:04,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (3972.37) for latency 6
2025-09-14 09:41:04,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 52 minutes, 59 seconds)
2025-09-14 09:44:28,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:44:37,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3846.79492 ± 615.226
2025-09-14 09:44:37,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4398.227), np.float32(3912.8928), np.float32(2122.4739), np.float32(4320.942), np.float32(4062.1875), np.float32(4094.3752), np.float32(3911.2593), np.float32(3917.3494), np.float32(3580.3452), np.float32(4147.898)]
2025-09-14 09:44:37,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:44:37,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 49 minutes, 39 seconds)
2025-09-14 09:48:02,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:48:11,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4103.93311 ± 87.278
2025-09-14 09:48:11,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4043.8647), np.float32(4041.37), np.float32(4152.32), np.float32(4275.465), np.float32(3986.6462), np.float32(4162.2656), np.float32(4106.809), np.float32(4076.9165), np.float32(4194.022), np.float32(3999.651)]
2025-09-14 09:48:11,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:48:11,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (4103.93) for latency 6
2025-09-14 09:48:11,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 46 minutes, 46 seconds)
2025-09-14 09:51:35,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:51:44,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4126.46582 ± 152.129
2025-09-14 09:51:44,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4017.4612), np.float32(4166.872), np.float32(4104.2886), np.float32(4256.682), np.float32(4034.6067), np.float32(4316.074), np.float32(4137.7896), np.float32(4243.254), np.float32(4225.248), np.float32(3762.3855)]
2025-09-14 09:51:44,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:51:44,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (4126.47) for latency 6
2025-09-14 09:51:44,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 4 hours, 43 minutes, 31 seconds)
2025-09-14 09:54:59,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:55:06,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4429.96045 ± 162.914
2025-09-14 09:55:06,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4193.654), np.float32(4531.936), np.float32(4344.5176), np.float32(4291.1865), np.float32(4716.4736), np.float32(4429.6816), np.float32(4578.2524), np.float32(4236.5312), np.float32(4378.6494), np.float32(4598.7266)]
2025-09-14 09:55:06,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:55:06,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (4429.96) for latency 6
2025-09-14 09:55:06,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 4 hours, 37 minutes, 25 seconds)
2025-09-14 09:57:58,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:58:05,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4232.06348 ± 135.958
2025-09-14 09:58:05,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3995.1897), np.float32(4176.649), np.float32(4056.1086), np.float32(4445.525), np.float32(4419.181), np.float32(4197.0483), np.float32(4332.8525), np.float32(4253.0825), np.float32(4189.963), np.float32(4255.0317)]
2025-09-14 09:58:05,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:58:05,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 4 hours, 25 minutes, 21 seconds)
2025-09-14 10:00:36,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:00:42,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4055.06128 ± 106.523
2025-09-14 10:00:42,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3950.0137), np.float32(3987.5369), np.float32(3937.7803), np.float32(4050.3271), np.float32(3929.648), np.float32(4083.7043), np.float32(4263.361), np.float32(4067.6743), np.float32(4209.1187), np.float32(4071.451)]
2025-09-14 10:00:42,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:00:42,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 4 hours, 7 minutes, 38 seconds)
2025-09-14 10:03:00,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:03:05,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4528.92236 ± 152.812
2025-09-14 10:03:05,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4421.432), np.float32(4280.2417), np.float32(4613.2495), np.float32(4412.4204), np.float32(4530.9165), np.float32(4792.8145), np.float32(4738.3765), np.float32(4618.7104), np.float32(4407.1216), np.float32(4473.9414)]
2025-09-14 10:03:05,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:03:05,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (4528.92) for latency 6
2025-09-14 10:03:05,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 46 minutes, 34 seconds)
2025-09-14 10:05:14,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:05:19,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3827.30005 ± 740.931
2025-09-14 10:05:19,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4113.248), np.float32(4023.99), np.float32(2089.6633), np.float32(4405.3604), np.float32(4084.15), np.float32(4133.843), np.float32(4218.5693), np.float32(4419.7026), np.float32(2694.644), np.float32(4089.833)]
2025-09-14 10:05:19,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:05:19,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 23 minutes, 55 seconds)
2025-09-14 10:07:27,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:07:32,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3794.25830 ± 865.405
2025-09-14 10:07:32,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4219.6797), np.float32(4459.526), np.float32(4437.6333), np.float32(1969.008), np.float32(4496.6123), np.float32(4241.5366), np.float32(4380.2715), np.float32(3017.071), np.float32(4103.375), np.float32(2617.8708)]
2025-09-14 10:07:32,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:07:32,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 3 minutes, 56 seconds)
2025-09-14 10:09:40,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:09:45,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4412.18506 ± 175.950
2025-09-14 10:09:45,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4361.284), np.float32(4664.197), np.float32(4496.6157), np.float32(4365.5264), np.float32(4002.8198), np.float32(4278.649), np.float32(4552.404), np.float32(4577.692), np.float32(4440.566), np.float32(4382.0967)]
2025-09-14 10:09:45,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:09:45,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 50 minutes, 20 seconds)
2025-09-14 10:11:53,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:11:58,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4418.76270 ± 174.389
2025-09-14 10:11:58,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4326.8413), np.float32(4357.9243), np.float32(4272.7007), np.float32(4314.1235), np.float32(4629.33), np.float32(4617.297), np.float32(4559.786), np.float32(4557.781), np.float32(4494.907), np.float32(4056.9424)]
2025-09-14 10:11:58,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:11:58,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 42 minutes, 21 seconds)
2025-09-14 10:14:06,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:14:11,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4554.79395 ± 118.482
2025-09-14 10:14:11,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4527.577), np.float32(4615.2646), np.float32(4459.159), np.float32(4425.948), np.float32(4425.895), np.float32(4695.9624), np.float32(4803.2334), np.float32(4598.3022), np.float32(4541.2637), np.float32(4455.327)]
2025-09-14 10:14:11,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:14:11,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (4554.79) for latency 6
2025-09-14 10:14:11,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 37 minutes, 34 seconds)
2025-09-14 10:16:19,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:16:24,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4607.94580 ± 158.781
2025-09-14 10:16:24,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4710.3467), np.float32(4522.132), np.float32(4380.031), np.float32(4613.6665), np.float32(4608.4287), np.float32(4881.706), np.float32(4392.591), np.float32(4844.812), np.float32(4542.9414), np.float32(4582.806)]
2025-09-14 10:16:24,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:16:24,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (4607.95) for latency 6
2025-09-14 10:16:24,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 35 minutes, 9 seconds)
2025-09-14 10:18:32,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:18:38,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4705.03613 ± 252.980
2025-09-14 10:18:38,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4706.838), np.float32(4646.0537), np.float32(4948.322), np.float32(4681.6216), np.float32(4641.0107), np.float32(4071.7249), np.float32(4646.474), np.float32(4739.9023), np.float32(4909.8105), np.float32(5058.607)]
2025-09-14 10:18:38,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:18:38,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (4705.04) for latency 6
2025-09-14 10:18:38,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 33 minutes, 5 seconds)
2025-09-14 10:20:46,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:20:51,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4478.98828 ± 650.785
2025-09-14 10:20:51,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4521.102), np.float32(4062.2493), np.float32(4628.8765), np.float32(4152.843), np.float32(5002.5117), np.float32(4994.5996), np.float32(5153.5835), np.float32(4634.3687), np.float32(2809.2026), np.float32(4830.548)]
2025-09-14 10:20:51,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:20:51,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 30 minutes, 58 seconds)
2025-09-14 10:22:59,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:23:05,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4997.37598 ± 450.082
2025-09-14 10:23:05,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4612.3013), np.float32(5332.658), np.float32(5014.6387), np.float32(4798.0566), np.float32(5378.8257), np.float32(5562.118), np.float32(5302.03), np.float32(5012.635), np.float32(5042.692), np.float32(3917.8042)]
2025-09-14 10:23:05,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:23:05,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (4997.38) for latency 6
2025-09-14 10:23:05,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 28 minutes, 52 seconds)
2025-09-14 10:25:12,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:25:18,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4751.39014 ± 756.366
2025-09-14 10:25:18,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5251.749), np.float32(5423.133), np.float32(4381.315), np.float32(4953.9663), np.float32(2795.1616), np.float32(5304.4604), np.float32(5369.7324), np.float32(4996.1323), np.float32(4800.5933), np.float32(4237.657)]
2025-09-14 10:25:18,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:25:18,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 26 minutes, 37 seconds)
2025-09-14 10:27:25,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:27:30,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4608.09131 ± 1198.867
2025-09-14 10:27:30,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5541.5254), np.float32(1824.4668), np.float32(2775.037), np.float32(5145.185), np.float32(5177.943), np.float32(5598.131), np.float32(5289.66), np.float32(4706.1494), np.float32(5011.736), np.float32(5011.0815)]
2025-09-14 10:27:30,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:27:30,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 24 minutes, 19 seconds)
2025-09-14 10:29:38,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:29:43,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4480.69678 ± 522.203
2025-09-14 10:29:43,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4827.2495), np.float32(5076.8154), np.float32(3364.6536), np.float32(4477.7266), np.float32(4806.543), np.float32(4772.444), np.float32(4661.353), np.float32(3613.6519), np.float32(4593.2314), np.float32(4613.302)]
2025-09-14 10:29:43,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:29:43,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 21 minutes, 57 seconds)
2025-09-14 10:31:50,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:31:56,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4808.33447 ± 1347.195
2025-09-14 10:31:56,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1266.5443), np.float32(5349.5947), np.float32(5524.068), np.float32(4683.3257), np.float32(5460.5537), np.float32(5621.0723), np.float32(5590.4614), np.float32(3436.3892), np.float32(5460.0913), np.float32(5691.248)]
2025-09-14 10:31:56,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:31:56,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 19 minutes, 35 seconds)
2025-09-14 10:34:03,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:34:09,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5194.48633 ± 867.989
2025-09-14 10:34:09,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5206.2705), np.float32(5628.7676), np.float32(5734.1606), np.float32(5225.7607), np.float32(5520.1216), np.float32(2633.915), np.float32(5565.0786), np.float32(5448.369), np.float32(5570.7114), np.float32(5411.7085)]
2025-09-14 10:34:09,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:34:09,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (5194.49) for latency 6
2025-09-14 10:34:09,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 17 minutes, 12 seconds)
2025-09-14 10:36:16,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:36:21,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5032.71680 ± 239.103
2025-09-14 10:36:21,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5006.6147), np.float32(5436.8477), np.float32(5035.251), np.float32(5115.977), np.float32(4960.2397), np.float32(4589.1255), np.float32(5063.856), np.float32(5080.9033), np.float32(4706.8647), np.float32(5331.487)]
2025-09-14 10:36:21,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:36:21,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 14 minutes, 59 seconds)
2025-09-14 10:38:29,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:38:34,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4574.84277 ± 1201.609
2025-09-14 10:38:34,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4546.828), np.float32(5315.7495), np.float32(5346.552), np.float32(2005.8467), np.float32(5500.763), np.float32(5057.68), np.float32(2558.6086), np.float32(4658.383), np.float32(5034.307), np.float32(5723.7065)]
2025-09-14 10:38:34,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:38:34,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 12 minutes, 44 seconds)
2025-09-14 10:40:41,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:40:47,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5467.68848 ± 751.828
2025-09-14 10:40:47,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5276.313), np.float32(5489.2124), np.float32(5843.047), np.float32(5834.808), np.float32(5745.944), np.float32(5820.514), np.float32(5965.881), np.float32(3302.9624), np.float32(5479.511), np.float32(5918.6924)]
2025-09-14 10:40:47,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:40:47,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (5467.69) for latency 6
2025-09-14 10:40:47,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 10 minutes, 32 seconds)
2025-09-14 10:42:54,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:43:00,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5552.79004 ± 222.136
2025-09-14 10:43:00,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5718.66), np.float32(5247.232), np.float32(5799.507), np.float32(5510.45), np.float32(5186.967), np.float32(5665.884), np.float32(5626.371), np.float32(5805.817), np.float32(5688.751), np.float32(5278.263)]
2025-09-14 10:43:00,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:43:00,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (5552.79) for latency 6
2025-09-14 10:43:00,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 8 minutes, 21 seconds)
2025-09-14 10:45:07,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:45:13,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5330.94043 ± 124.760
2025-09-14 10:45:13,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5452.4663), np.float32(5368.644), np.float32(5237.9644), np.float32(5433.2583), np.float32(5333.0757), np.float32(5307.5435), np.float32(5585.036), np.float32(5176.262), np.float32(5228.422), np.float32(5186.733)]
2025-09-14 10:45:13,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:45:13,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 6 minutes, 8 seconds)
2025-09-14 10:47:20,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:47:25,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5451.46143 ± 191.960
2025-09-14 10:47:25,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5452.4834), np.float32(5430.716), np.float32(5588.747), np.float32(5159.302), np.float32(5412.6465), np.float32(5555.2534), np.float32(5702.1836), np.float32(5464.3013), np.float32(5077.256), np.float32(5671.723)]
2025-09-14 10:47:25,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:47:25,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 3 minutes, 54 seconds)
2025-09-14 10:49:32,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:49:38,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4965.40674 ± 1005.326
2025-09-14 10:49:38,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5005.3623), np.float32(5294.8667), np.float32(5393.112), np.float32(5403.21), np.float32(5331.637), np.float32(5278.6025), np.float32(1966.8619), np.float32(5345.892), np.float32(5372.9033), np.float32(5261.616)]
2025-09-14 10:49:38,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:49:38,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 1 minute, 40 seconds)
2025-09-14 10:51:45,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:51:51,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5263.95801 ± 853.811
2025-09-14 10:51:51,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5556.4424), np.float32(5882.352), np.float32(5485.5605), np.float32(5777.877), np.float32(5479.888), np.float32(5311.65), np.float32(5617.4385), np.float32(5389.7183), np.float32(5385.5312), np.float32(2753.1204)]
2025-09-14 10:51:51,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:51:51,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 59 minutes, 27 seconds)
2025-09-14 10:53:58,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:54:03,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5200.14746 ± 128.386
2025-09-14 10:54:03,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5371.228), np.float32(5263.1245), np.float32(5220.476), np.float32(5276.3623), np.float32(5198.034), np.float32(5288.7026), np.float32(5046.46), np.float32(5200.1934), np.float32(5240.517), np.float32(4896.375)]
2025-09-14 10:54:03,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:54:03,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 57 minutes, 14 seconds)
2025-09-14 10:56:11,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:56:16,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5367.72217 ± 295.577
2025-09-14 10:56:16,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5787.8433), np.float32(5539.41), np.float32(5653.426), np.float32(5245.965), np.float32(5643.172), np.float32(5054.8423), np.float32(4828.0093), np.float32(5455.551), np.float32(5411.9287), np.float32(5057.0728)]
2025-09-14 10:56:16,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:56:16,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 55 minutes, 3 seconds)
2025-09-14 10:58:24,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:58:29,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4904.34473 ± 1062.831
2025-09-14 10:58:29,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5496.421), np.float32(5665.771), np.float32(5599.7026), np.float32(2191.0073), np.float32(3784.2815), np.float32(5765.3613), np.float32(5504.4116), np.float32(5267.4316), np.float32(4759.0938), np.float32(5009.964)]
2025-09-14 10:58:29,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:58:29,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 52 minutes, 51 seconds)
2025-09-14 11:00:36,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:00:42,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4793.07617 ± 1341.960
2025-09-14 11:00:42,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1783.6782), np.float32(5519.2046), np.float32(4946.7954), np.float32(5679.5635), np.float32(5068.8174), np.float32(2577.928), np.float32(5631.9443), np.float32(5651.5317), np.float32(5324.7666), np.float32(5746.5317)]
2025-09-14 11:00:42,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:00:42,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 50 minutes, 39 seconds)
2025-09-14 11:02:49,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:02:55,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5668.25488 ± 197.318
2025-09-14 11:02:55,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5622.623), np.float32(5213.1616), np.float32(5881.229), np.float32(5523.734), np.float32(5628.424), np.float32(5738.685), np.float32(5971.916), np.float32(5725.6704), np.float32(5762.8135), np.float32(5614.2905)]
2025-09-14 11:02:55,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:02:55,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (5668.25) for latency 6
2025-09-14 11:02:55,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 48 minutes, 26 seconds)
2025-09-14 11:05:02,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:05:07,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4912.89307 ± 1535.708
2025-09-14 11:05:07,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2001.3123), np.float32(5782.1885), np.float32(4491.339), np.float32(5681.2944), np.float32(5705.123), np.float32(5564.5186), np.float32(5970.47), np.float32(5963.908), np.float32(1912.762), np.float32(6056.0166)]
2025-09-14 11:05:07,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:05:07,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 46 minutes, 11 seconds)
2025-09-14 11:07:14,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:07:20,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5682.24365 ± 297.719
2025-09-14 11:07:20,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5668.474), np.float32(6038.53), np.float32(5785.28), np.float32(5354.326), np.float32(5746.9565), np.float32(5878.1543), np.float32(4974.4873), np.float32(5989.388), np.float32(5732.2534), np.float32(5654.59)]
2025-09-14 11:07:20,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:07:20,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (5682.24) for latency 6
2025-09-14 11:07:20,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 43 minutes, 57 seconds)
2025-09-14 11:09:27,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:09:33,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5180.66016 ± 1095.165
2025-09-14 11:09:33,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5824.5156), np.float32(5608.1753), np.float32(5711.786), np.float32(5466.079), np.float32(2687.4827), np.float32(5767.331), np.float32(5947.825), np.float32(3362.7842), np.float32(5778.3784), np.float32(5652.244)]
2025-09-14 11:09:33,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:09:33,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 41 minutes, 44 seconds)
2025-09-14 11:11:40,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:11:45,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5527.76025 ± 202.030
2025-09-14 11:11:45,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5607.8267), np.float32(5324.899), np.float32(5486.471), np.float32(5505.9585), np.float32(5752.1455), np.float32(5408.4897), np.float32(5167.5063), np.float32(5909.69), np.float32(5466.6313), np.float32(5647.9795)]
2025-09-14 11:11:45,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:11:45,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 39 minutes, 32 seconds)
2025-09-14 11:13:53,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:13:58,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5331.51660 ± 141.952
2025-09-14 11:13:58,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5559.9175), np.float32(5332.7764), np.float32(5301.3027), np.float32(5020.2886), np.float32(5403.382), np.float32(5356.716), np.float32(5356.868), np.float32(5277.9263), np.float32(5496.7344), np.float32(5209.253)]
2025-09-14 11:13:58,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:13:58,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 37 minutes, 21 seconds)
2025-09-14 11:16:06,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:16:11,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5538.68408 ± 625.513
2025-09-14 11:16:11,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5797.812), np.float32(5706.167), np.float32(5880.792), np.float32(5795.8696), np.float32(5577.797), np.float32(5808.3564), np.float32(5694.581), np.float32(3677.0793), np.float32(5749.977), np.float32(5698.4087)]
2025-09-14 11:16:11,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:16:11,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 35 minutes, 9 seconds)
2025-09-14 11:18:18,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:18:24,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5335.50488 ± 1071.451
2025-09-14 11:18:24,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5532.327), np.float32(5331.108), np.float32(5755.991), np.float32(6099.1655), np.float32(5864.255), np.float32(5674.779), np.float32(2203.8562), np.float32(5315.993), np.float32(5973.3267), np.float32(5604.2427)]
2025-09-14 11:18:24,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:18:24,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 32 minutes, 58 seconds)
2025-09-14 11:20:32,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:20:37,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5432.55811 ± 1258.790
2025-09-14 11:20:37,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5808.0874), np.float32(6031.9087), np.float32(5344.647), np.float32(5647.944), np.float32(1709.0862), np.float32(5958.499), np.float32(5772.3994), np.float32(6088.7), np.float32(5972.793), np.float32(5991.518)]
2025-09-14 11:20:37,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:20:37,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 30 minutes, 48 seconds)
2025-09-14 11:22:45,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:22:50,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5605.25488 ± 268.544
2025-09-14 11:22:50,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5810.821), np.float32(5293.038), np.float32(5803.9746), np.float32(5950.326), np.float32(5485.129), np.float32(5857.396), np.float32(5510.599), np.float32(5559.202), np.float32(5738.877), np.float32(5043.1885)]
2025-09-14 11:22:50,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:22:50,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 28 minutes, 39 seconds)
2025-09-14 11:24:58,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:25:03,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4832.34375 ± 1268.231
2025-09-14 11:25:03,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5797.6426), np.float32(5842.1426), np.float32(1948.892), np.float32(5466.456), np.float32(5676.4985), np.float32(5367.923), np.float32(5542.0195), np.float32(5571.427), np.float32(3784.1277), np.float32(3326.3093)]
2025-09-14 11:25:03,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:25:03,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 26 minutes, 26 seconds)
2025-09-14 11:27:11,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:27:16,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5191.25195 ± 1172.162
2025-09-14 11:27:16,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4739.188), np.float32(5795.0425), np.float32(3665.4028), np.float32(5328.147), np.float32(5888.1196), np.float32(6220.2285), np.float32(2446.105), np.float32(5944.003), np.float32(5791.1743), np.float32(6095.109)]
2025-09-14 11:27:16,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:27:16,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 24 minutes, 16 seconds)
2025-09-14 11:29:24,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:29:30,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5523.79102 ± 133.927
2025-09-14 11:29:30,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5282.829), np.float32(5360.95), np.float32(5778.5737), np.float32(5496.4854), np.float32(5608.1353), np.float32(5475.1406), np.float32(5542.711), np.float32(5473.8574), np.float32(5641.711), np.float32(5577.521)]
2025-09-14 11:29:30,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:29:30,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 22 minutes, 6 seconds)
2025-09-14 11:31:37,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:31:43,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5741.37695 ± 155.163
2025-09-14 11:31:43,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5403.8306), np.float32(5761.8647), np.float32(6030.0435), np.float32(5738.248), np.float32(5788.3643), np.float32(5856.631), np.float32(5686.953), np.float32(5594.0815), np.float32(5790.839), np.float32(5762.914)]
2025-09-14 11:31:43,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:31:43,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (5741.38) for latency 6
2025-09-14 11:31:43,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 19 minutes, 53 seconds)
2025-09-14 11:33:50,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:33:56,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5741.13770 ± 123.983
2025-09-14 11:33:56,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5764.5225), np.float32(5852.163), np.float32(5828.9414), np.float32(5698.5737), np.float32(5791.3228), np.float32(5946.028), np.float32(5686.1763), np.float32(5766.758), np.float32(5560.6147), np.float32(5516.282)]
2025-09-14 11:33:56,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:33:56,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 17 minutes, 36 seconds)
2025-09-14 11:36:03,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:36:08,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5684.06250 ± 143.554
2025-09-14 11:36:08,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5723.7207), np.float32(5743.9043), np.float32(5459.1963), np.float32(5957.527), np.float32(5652.223), np.float32(5522.643), np.float32(5535.8325), np.float32(5677.614), np.float32(5726.872), np.float32(5841.0938)]
2025-09-14 11:36:08,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:36:08,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 15 minutes, 22 seconds)
2025-09-14 11:38:16,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:38:21,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5781.96533 ± 288.743
2025-09-14 11:38:21,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5841.022), np.float32(5065.0684), np.float32(5909.2925), np.float32(5682.661), np.float32(5832.278), np.float32(6018.225), np.float32(5821.326), np.float32(5956.1763), np.float32(6156.289), np.float32(5537.316)]
2025-09-14 11:38:21,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:38:21,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (5781.97) for latency 6
2025-09-14 11:38:21,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 13 minutes, 6 seconds)
2025-09-14 11:40:28,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:40:34,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5288.24756 ± 1294.596
2025-09-14 11:40:34,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5395.1025), np.float32(5441.2505), np.float32(1447.4003), np.float32(5786.267), np.float32(5716.0835), np.float32(5613.625), np.float32(6032.98), np.float32(5873.551), np.float32(5927.7246), np.float32(5648.493)]
2025-09-14 11:40:34,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:40:34,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 10 minutes, 49 seconds)
2025-09-14 11:42:41,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:42:46,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5658.78320 ± 135.767
2025-09-14 11:42:46,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5767.5566), np.float32(5848.6987), np.float32(5498.946), np.float32(5572.0225), np.float32(5601.3115), np.float32(5459.66), np.float32(5630.647), np.float32(5900.814), np.float32(5631.8926), np.float32(5676.2837)]
2025-09-14 11:42:46,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:42:46,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 8 minutes, 34 seconds)
2025-09-14 11:44:54,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:44:59,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5732.83496 ± 145.435
2025-09-14 11:44:59,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5740.153), np.float32(5584.786), np.float32(5799.878), np.float32(5503.7085), np.float32(5912.9224), np.float32(5952.2065), np.float32(5857.58), np.float32(5717.8843), np.float32(5715.93), np.float32(5543.3047)]
2025-09-14 11:44:59,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:44:59,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 6 minutes, 21 seconds)
2025-09-14 11:47:06,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:47:12,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5977.79883 ± 148.196
2025-09-14 11:47:12,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5911.7197), np.float32(6013.2183), np.float32(5941.8613), np.float32(6075.7627), np.float32(5653.3564), np.float32(6191.1763), np.float32(5928.0186), np.float32(6059.642), np.float32(5856.688), np.float32(6146.5425)]
2025-09-14 11:47:12,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:47:12,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (5977.80) for latency 6
2025-09-14 11:47:12,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 4 minutes, 7 seconds)
2025-09-14 11:49:20,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:49:25,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5692.04395 ± 193.754
2025-09-14 11:49:25,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5920.0913), np.float32(5837.293), np.float32(5730.179), np.float32(5914.733), np.float32(5582.2275), np.float32(5731.8413), np.float32(5645.9287), np.float32(5618.4507), np.float32(5726.1), np.float32(5213.5957)]
2025-09-14 11:49:25,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:49:25,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 1 minute, 58 seconds)
2025-09-14 11:51:33,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:51:38,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5759.94336 ± 109.079
2025-09-14 11:51:38,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5911.669), np.float32(5696.718), np.float32(5817.9854), np.float32(5820.2227), np.float32(5674.913), np.float32(5833.7173), np.float32(5805.373), np.float32(5689.0415), np.float32(5515.6304), np.float32(5834.1646)]
2025-09-14 11:51:38,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:51:38,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 59 minutes, 46 seconds)
2025-09-14 11:53:45,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:53:51,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5412.43262 ± 1291.435
2025-09-14 11:53:51,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5851.701), np.float32(5933.901), np.float32(1585.8641), np.float32(5345.402), np.float32(5581.189), np.float32(5944.6895), np.float32(5985.8203), np.float32(5958.956), np.float32(6033.9834), np.float32(5902.8164)]
2025-09-14 11:53:51,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:53:51,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 57 minutes, 35 seconds)
2025-09-14 11:55:58,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:56:04,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5586.94824 ± 648.782
2025-09-14 11:56:04,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5689.397), np.float32(5865.1724), np.float32(5926.5215), np.float32(5890.7095), np.float32(5748.3477), np.float32(6048.316), np.float32(3723.2417), np.float32(6009.698), np.float32(5456.769), np.float32(5511.3096)]
2025-09-14 11:56:04,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:56:04,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 55 minutes, 22 seconds)
2025-09-14 11:58:11,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:58:16,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5811.11279 ± 158.435
2025-09-14 11:58:16,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5929.6313), np.float32(5716.5547), np.float32(5563.4023), np.float32(5594.698), np.float32(5887.8823), np.float32(5716.4336), np.float32(5893.432), np.float32(6117.881), np.float32(5816.3516), np.float32(5874.864)]
2025-09-14 11:58:16,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:58:16,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 53 minutes, 9 seconds)
2025-09-14 12:00:23,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:00:29,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5896.67139 ± 137.977
2025-09-14 12:00:29,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6019.8594), np.float32(6188.9106), np.float32(5871.0073), np.float32(5823.4697), np.float32(5977.0396), np.float32(6004.901), np.float32(5752.0327), np.float32(5796.098), np.float32(5746.86), np.float32(5786.5386)]
2025-09-14 12:00:29,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:00:29,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 50 minutes, 53 seconds)
2025-09-14 12:02:36,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:02:42,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5753.66699 ± 177.006
2025-09-14 12:02:42,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5883.596), np.float32(5322.79), np.float32(5859.607), np.float32(5883.396), np.float32(6019.2847), np.float32(5737.6104), np.float32(5754.885), np.float32(5680.228), np.float32(5708.5625), np.float32(5686.7075)]
2025-09-14 12:02:42,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:02:42,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 48 minutes, 39 seconds)
2025-09-14 12:04:49,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:04:54,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5702.71680 ± 195.112
2025-09-14 12:04:54,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5839.9155), np.float32(5553.2417), np.float32(5796.859), np.float32(5860.9814), np.float32(5308.3384), np.float32(5944.226), np.float32(5576.923), np.float32(5751.7656), np.float32(5505.8066), np.float32(5889.1147)]
2025-09-14 12:04:54,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:04:54,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 46 minutes, 25 seconds)
2025-09-14 12:07:02,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:07:07,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5700.93652 ± 203.733
2025-09-14 12:07:07,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5756.6875), np.float32(5728.8745), np.float32(5603.9717), np.float32(6010.4717), np.float32(5607.409), np.float32(6049.482), np.float32(5356.1143), np.float32(5485.96), np.float32(5777.941), np.float32(5632.4585)]
2025-09-14 12:07:07,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:07:07,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 44 minutes, 14 seconds)
2025-09-14 12:09:14,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:09:20,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4992.70264 ± 855.583
2025-09-14 12:09:20,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3398.8523), np.float32(5514.796), np.float32(5326.552), np.float32(5228.083), np.float32(5754.116), np.float32(5875.546), np.float32(5319.834), np.float32(3991.1956), np.float32(3820.4033), np.float32(5697.65)]
2025-09-14 12:09:20,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:09:20,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 42 minutes, 2 seconds)
2025-09-14 12:11:27,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:11:33,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5681.82471 ± 360.166
2025-09-14 12:11:33,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4700.7954), np.float32(5460.905), np.float32(5894.03), np.float32(5883.415), np.float32(5797.6895), np.float32(5743.5825), np.float32(6089.122), np.float32(5797.1646), np.float32(5727.038), np.float32(5724.504)]
2025-09-14 12:11:33,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:11:33,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 39 minutes, 49 seconds)
2025-09-14 12:13:40,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:13:46,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5788.17676 ± 182.970
2025-09-14 12:13:46,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5776.9927), np.float32(5991.9854), np.float32(5427.382), np.float32(5808.277), np.float32(5852.478), np.float32(5752.6826), np.float32(5701.5146), np.float32(5589.087), np.float32(5872.406), np.float32(6108.965)]
2025-09-14 12:13:46,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:13:46,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 37 minutes, 37 seconds)
2025-09-14 12:15:53,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:15:58,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5955.49072 ± 168.738
2025-09-14 12:15:58,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5622.4697), np.float32(6019.239), np.float32(5662.7505), np.float32(5932.3574), np.float32(6050.3604), np.float32(6153.8486), np.float32(5941.9863), np.float32(6006.875), np.float32(6075.8506), np.float32(6089.169)]
2025-09-14 12:15:58,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:15:58,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 35 minutes, 25 seconds)
2025-09-14 12:18:06,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:18:11,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5246.57178 ± 995.711
2025-09-14 12:18:11,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2288.0286), np.float32(5877.5063), np.float32(5342.9014), np.float32(5455.519), np.float32(5532.7656), np.float32(5538.388), np.float32(5513.3125), np.float32(5710.2344), np.float32(5569.0493), np.float32(5638.0117)]
2025-09-14 12:18:11,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:18:11,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 33 minutes, 12 seconds)
2025-09-14 12:20:19,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:20:24,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5955.39209 ± 184.146
2025-09-14 12:20:24,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5746.8677), np.float32(5977.3315), np.float32(6016.1577), np.float32(5580.126), np.float32(6112.189), np.float32(6269.9097), np.float32(5895.3574), np.float32(6039.66), np.float32(5867.638), np.float32(6048.6846)]
2025-09-14 12:20:24,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:20:24,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 31 minutes)
2025-09-14 12:22:32,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:22:37,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5830.56494 ± 114.526
2025-09-14 12:22:37,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5866.7495), np.float32(5674.1084), np.float32(5893.5845), np.float32(5865.908), np.float32(5708.637), np.float32(5982.1714), np.float32(5910.556), np.float32(5664.935), np.float32(5982.2803), np.float32(5756.7197)]
2025-09-14 12:22:37,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:22:37,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 28 minutes, 47 seconds)
2025-09-14 12:24:45,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:24:51,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5851.54688 ± 155.104
2025-09-14 12:24:51,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5569.621), np.float32(6121.886), np.float32(5835.7817), np.float32(5805.4795), np.float32(6016.613), np.float32(5752.3506), np.float32(6013.224), np.float32(5720.325), np.float32(5889.7324), np.float32(5790.4526)]
2025-09-14 12:24:51,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:24:51,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 26 minutes, 36 seconds)
2025-09-14 12:26:59,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:27:04,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5914.21924 ± 127.911
2025-09-14 12:27:04,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5851.992), np.float32(5830.842), np.float32(6139.584), np.float32(5820.4937), np.float32(5982.134), np.float32(6121.6455), np.float32(5706.367), np.float32(5882.5435), np.float32(5879.517), np.float32(5927.0728)]
2025-09-14 12:27:04,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:27:04,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 24 minutes, 24 seconds)
2025-09-14 12:29:12,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:29:17,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5954.52100 ± 125.084
2025-09-14 12:29:17,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6134.086), np.float32(5995.7153), np.float32(5974.9604), np.float32(6110.213), np.float32(5837.539), np.float32(6039.8984), np.float32(5852.879), np.float32(5711.803), np.float32(6004.996), np.float32(5883.1216)]
2025-09-14 12:29:17,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:29:17,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 22 minutes, 12 seconds)
2025-09-14 12:31:25,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:31:31,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5714.42871 ± 209.724
2025-09-14 12:31:31,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5622.1484), np.float32(5589.7837), np.float32(5951.217), np.float32(5415.804), np.float32(5713.347), np.float32(5462.8154), np.float32(5822.0977), np.float32(6075.347), np.float32(5923.9146), np.float32(5567.811)]
2025-09-14 12:31:31,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:31:31,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 19 minutes, 59 seconds)
2025-09-14 12:33:39,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:33:44,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5981.20020 ± 137.598
2025-09-14 12:33:44,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5768.8022), np.float32(6029.0713), np.float32(6012.4736), np.float32(5701.9854), np.float32(5924.4287), np.float32(6083.1523), np.float32(6117.892), np.float32(6071.382), np.float32(6131.078), np.float32(5971.7363)]
2025-09-14 12:33:44,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:33:44,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (5981.20) for latency 6
2025-09-14 12:33:44,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 17 minutes, 47 seconds)
2025-09-14 12:35:52,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:35:57,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5363.75439 ± 90.829
2025-09-14 12:35:57,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5268.6436), np.float32(5369.9756), np.float32(5385.8516), np.float32(5241.5166), np.float32(5560.96), np.float32(5446.498), np.float32(5255.088), np.float32(5369.577), np.float32(5382.6533), np.float32(5356.7817)]
2025-09-14 12:35:57,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:35:57,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 15 minutes, 33 seconds)
2025-09-14 12:38:05,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:38:10,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5813.04590 ± 103.364
2025-09-14 12:38:10,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5681.116), np.float32(5963.8403), np.float32(5737.277), np.float32(5972.699), np.float32(5881.532), np.float32(5822.125), np.float32(5890.9565), np.float32(5755.125), np.float32(5737.4707), np.float32(5688.32)]
2025-09-14 12:38:10,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:38:10,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 13 minutes, 19 seconds)
2025-09-14 12:40:18,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:40:23,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5813.48193 ± 156.282
2025-09-14 12:40:23,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5933.0273), np.float32(5572.5015), np.float32(5684.0327), np.float32(5628.265), np.float32(5834.027), np.float32(5840.3813), np.float32(6033.3853), np.float32(5879.089), np.float32(6040.6816), np.float32(5689.429)]
2025-09-14 12:40:23,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:40:23,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 11 minutes, 5 seconds)
2025-09-14 12:42:31,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:42:36,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5607.12354 ± 172.300
2025-09-14 12:42:36,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5365.501), np.float32(5734.9517), np.float32(5650.89), np.float32(5805.069), np.float32(5564.544), np.float32(5354.1646), np.float32(5389.4873), np.float32(5729.649), np.float32(5834.3003), np.float32(5642.6753)]
2025-09-14 12:42:36,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:42:36,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 52 seconds)
2025-09-14 12:44:44,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:44:49,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5376.65088 ± 1209.037
2025-09-14 12:44:49,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5884.597), np.float32(6070.4316), np.float32(5517.984), np.float32(1825.1039), np.float32(5778.566), np.float32(5379.097), np.float32(6112.1177), np.float32(5875.582), np.float32(5917.659), np.float32(5405.3667)]
2025-09-14 12:44:49,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:44:49,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 38 seconds)
2025-09-14 12:46:56,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:47:02,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5647.84131 ± 89.170
2025-09-14 12:47:02,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5580.508), np.float32(5651.2583), np.float32(5763.228), np.float32(5649.089), np.float32(5554.371), np.float32(5491.4336), np.float32(5748.504), np.float32(5777.031), np.float32(5654.896), np.float32(5608.096)]
2025-09-14 12:47:02,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:47:02,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 25 seconds)
2025-09-14 12:49:09,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:49:15,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5864.36523 ± 156.375
2025-09-14 12:49:15,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5749.9873), np.float32(5865.5234), np.float32(6003.631), np.float32(5769.352), np.float32(6120.129), np.float32(5995.963), np.float32(5991.8745), np.float32(5739.519), np.float32(5839.686), np.float32(5567.9897)]
2025-09-14 12:49:15,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:49:15,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 12 seconds)
2025-09-14 12:51:22,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:51:28,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5473.36328 ± 191.225
2025-09-14 12:51:28,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5603.4087), np.float32(5730.424), np.float32(5617.9146), np.float32(5708.273), np.float32(5430.457), np.float32(5412.1133), np.float32(5555.5728), np.float32(5221.033), np.float32(5158.652), np.float32(5295.785)]
2025-09-14 12:51:28,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:51:28,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1251 [DEBUG]: Training session finished
