2025-09-14 08:43:01,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1108 [DEBUG]: logdir: _logs/noise-eval-v2/halfcheetah/bpql-noise_0.050-delay_9
2025-09-14 08:43:01,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1109 [DEBUG]: trainer_prefix: noise-eval-v2/halfcheetah/bpql-noise_0.050-delay_9
2025-09-14 08:43:01,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'9': <latency_env.delayed_mdp.ConstantDelay object at 0x7f8f162479b0>}
2025-09-14 08:43:01,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1111 [DEBUG]: using device: cpu
2025-09-14 08:43:01,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-14 08:43:01,628 baseline-bpql-noisepromille50-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=71, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-14 08:43:01,629 baseline-bpql-noisepromille50-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-14 08:43:03,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-14 08:43:03,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-14 08:45:34,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 08:45:40,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: -479.14365 ± 32.791
2025-09-14 08:45:40,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-420.63522), np.float32(-467.94733), np.float32(-445.21515), np.float32(-502.1062), np.float32(-517.9325), np.float32(-530.683), np.float32(-483.3995), np.float32(-468.71622), np.float32(-502.2199), np.float32(-452.5815)]
2025-09-14 08:45:40,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 08:45:40,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (-479.14) for latency 9
2025-09-14 08:45:40,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 18 minutes, 53 seconds)
2025-09-14 08:48:12,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 08:48:19,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: -261.69568 ± 43.646
2025-09-14 08:48:19,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-224.26382), np.float32(-275.76877), np.float32(-236.2015), np.float32(-235.59409), np.float32(-267.35498), np.float32(-311.53113), np.float32(-219.33192), np.float32(-280.47208), np.float32(-210.65767), np.float32(-355.78073)]
2025-09-14 08:48:19,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 08:48:19,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (-261.70) for latency 9
2025-09-14 08:48:19,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 17 minutes, 48 seconds)
2025-09-14 08:51:00,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 08:51:06,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: -5.64922 ± 90.077
2025-09-14 08:51:06,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-118.00896), np.float32(-176.92911), np.float32(154.64307), np.float32(-12.529487), np.float32(36.84601), np.float32(-33.266567), np.float32(57.63261), np.float32(24.509623), np.float32(58.754414), np.float32(-48.143845)]
2025-09-14 08:51:06,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 08:51:06,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (-5.65) for latency 9
2025-09-14 08:51:06,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 20 minutes, 30 seconds)
2025-09-14 08:53:47,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 08:53:54,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 287.63708 ± 122.761
2025-09-14 08:53:54,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(525.2621), np.float32(108.71253), np.float32(351.1971), np.float32(287.63025), np.float32(360.22528), np.float32(217.91048), np.float32(303.76486), np.float32(239.54285), np.float32(97.63953), np.float32(384.48566)]
2025-09-14 08:53:54,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 08:53:54,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (287.64) for latency 9
2025-09-14 08:53:54,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 20 minutes, 12 seconds)
2025-09-14 08:56:41,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 08:56:49,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 447.47113 ± 320.777
2025-09-14 08:56:49,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(398.6462), np.float32(579.0819), np.float32(102.3741), np.float32(116.258026), np.float32(572.333), np.float32(107.939514), np.float32(383.98886), np.float32(707.5474), np.float32(1195.374), np.float32(311.16837)]
2025-09-14 08:56:49,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 08:56:49,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (447.47) for latency 9
2025-09-14 08:56:49,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 21 minutes, 29 seconds)
2025-09-14 09:00:01,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:00:10,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 505.38071 ± 389.320
2025-09-14 09:00:10,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(760.09686), np.float32(472.8017), np.float32(1475.246), np.float32(263.79785), np.float32(43.94868), np.float32(559.6602), np.float32(693.08844), np.float32(189.92096), np.float32(380.73898), np.float32(214.5079)]
2025-09-14 09:00:10,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:00:10,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (505.38) for latency 9
2025-09-14 09:00:10,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 32 minutes, 28 seconds)
2025-09-14 09:03:25,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:03:34,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 626.85333 ± 223.678
2025-09-14 09:03:34,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(321.30698), np.float32(952.1563), np.float32(606.3244), np.float32(740.2463), np.float32(894.9503), np.float32(394.4201), np.float32(335.87595), np.float32(889.2909), np.float32(591.9326), np.float32(542.02924)]
2025-09-14 09:03:34,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:03:34,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (626.85) for latency 9
2025-09-14 09:03:34,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 43 minutes, 40 seconds)
2025-09-14 09:06:44,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:06:52,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1206.93286 ± 368.493
2025-09-14 09:06:52,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1581.9742), np.float32(1134.8086), np.float32(1418.517), np.float32(1229.9852), np.float32(2043.6796), np.float32(841.4473), np.float32(752.96075), np.float32(1012.7807), np.float32(1136.4788), np.float32(916.69684)]
2025-09-14 09:06:52,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:06:52,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (1206.93) for latency 9
2025-09-14 09:06:52,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 50 minutes, 7 seconds)
2025-09-14 09:10:00,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:10:08,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1304.12866 ± 383.646
2025-09-14 09:10:08,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(915.6433), np.float32(1786.0828), np.float32(915.69684), np.float32(1139.3615), np.float32(1128.8665), np.float32(1005.7087), np.float32(1075.9777), np.float32(1285.925), np.float32(1748.182), np.float32(2039.8429)]
2025-09-14 09:10:08,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:10:08,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (1304.13) for latency 9
2025-09-14 09:10:08,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 55 minutes, 42 seconds)
2025-09-14 09:13:17,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:13:25,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1432.13965 ± 256.594
2025-09-14 09:13:25,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1390.2941), np.float32(1683.6051), np.float32(1071.6117), np.float32(1238.4795), np.float32(1253.3954), np.float32(1692.9658), np.float32(1813.1362), np.float32(1269.0739), np.float32(1177.6378), np.float32(1731.1969)]
2025-09-14 09:13:25,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:13:25,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (1432.14) for latency 9
2025-09-14 09:13:25,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 58 minutes, 54 seconds)
2025-09-14 09:16:33,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:16:42,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1435.97375 ± 422.612
2025-09-14 09:16:42,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2109.9746), np.float32(711.32184), np.float32(2067.3167), np.float32(1361.6195), np.float32(1310.203), np.float32(1060.3143), np.float32(1709.4946), np.float32(1651.0122), np.float32(1152.2307), np.float32(1226.2504)]
2025-09-14 09:16:42,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:16:42,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (1435.97) for latency 9
2025-09-14 09:16:42,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 54 minutes, 18 seconds)
2025-09-14 09:19:50,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:19:59,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1792.81702 ± 474.669
2025-09-14 09:19:59,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1405.7823), np.float32(2463.2837), np.float32(1284.9414), np.float32(2063.315), np.float32(2311.7014), np.float32(2535.5642), np.float32(1644.5103), np.float32(1271.8544), np.float32(1425.751), np.float32(1521.4664)]
2025-09-14 09:19:59,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:19:59,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (1792.82) for latency 9
2025-09-14 09:19:59,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 48 minutes, 54 seconds)
2025-09-14 09:23:19,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:23:28,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1456.31689 ± 355.450
2025-09-14 09:23:28,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1306.9868), np.float32(1417.1481), np.float32(1186.4161), np.float32(2432.1575), np.float32(1331.016), np.float32(1540.675), np.float32(1424.0448), np.float32(1298.1965), np.float32(1562.975), np.float32(1063.5538)]
2025-09-14 09:23:28,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:23:28,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 48 minutes, 49 seconds)
2025-09-14 09:26:49,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:26:58,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2064.50269 ± 549.857
2025-09-14 09:26:58,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2644.2852), np.float32(2418.799), np.float32(1779.465), np.float32(1293.1754), np.float32(2958.1724), np.float32(2562.8682), np.float32(1608.4554), np.float32(1616.029), np.float32(1456.0098), np.float32(2307.7673)]
2025-09-14 09:26:58,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:26:58,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (2064.50) for latency 9
2025-09-14 09:26:58,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 49 minutes, 25 seconds)
2025-09-14 09:30:19,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:30:28,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1455.78247 ± 392.685
2025-09-14 09:30:28,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(596.5753), np.float32(1330.4688), np.float32(1146.6415), np.float32(1605.2168), np.float32(1841.641), np.float32(1575.0635), np.float32(1497.0565), np.float32(1614.6161), np.float32(2113.3225), np.float32(1237.223)]
2025-09-14 09:30:28,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:30:28,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 49 minutes, 48 seconds)
2025-09-14 09:33:39,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:33:46,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1934.26892 ± 517.890
2025-09-14 09:33:46,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2431.0059), np.float32(1735.8469), np.float32(1897.0029), np.float32(1338.8418), np.float32(2249.2563), np.float32(1613.5334), np.float32(1894.0443), np.float32(3117.6975), np.float32(1770.8269), np.float32(1294.634)]
2025-09-14 09:33:46,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:33:46,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 46 minutes, 54 seconds)
2025-09-14 09:36:34,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:36:41,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1995.00549 ± 573.102
2025-09-14 09:36:41,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2543.1462), np.float32(2565.2551), np.float32(1355.4143), np.float32(1975.2052), np.float32(2871.7693), np.float32(1417.3013), np.float32(2289.102), np.float32(1266.8306), np.float32(2321.7231), np.float32(1344.3065)]
2025-09-14 09:36:41,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:36:41,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 37 minutes, 13 seconds)
2025-09-14 09:39:13,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:39:19,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2622.02295 ± 655.193
2025-09-14 09:39:19,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1994.0198), np.float32(3040.265), np.float32(2671.2747), np.float32(2948.6667), np.float32(1623.2799), np.float32(3111.9836), np.float32(2935.7446), np.float32(1401.6138), np.float32(3181.2795), np.float32(3312.104)]
2025-09-14 09:39:19,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:39:19,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (2622.02) for latency 9
2025-09-14 09:39:19,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 19 minutes, 57 seconds)
2025-09-14 09:41:51,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:41:58,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1969.15137 ± 681.550
2025-09-14 09:41:58,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1951.3549), np.float32(1170.2253), np.float32(3121.3809), np.float32(3058.8894), np.float32(2084.7444), np.float32(2026.364), np.float32(1125.7218), np.float32(1373.6428), np.float32(2334.5913), np.float32(1444.5977)]
2025-09-14 09:41:58,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:41:58,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 3 minutes)
2025-09-14 09:44:47,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:44:56,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1978.75806 ± 381.661
2025-09-14 09:44:56,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2860.744), np.float32(1685.9519), np.float32(2020.3871), np.float32(1851.3395), np.float32(2225.5632), np.float32(1917.5929), np.float32(2176.3604), np.float32(1457.3961), np.float32(2053.824), np.float32(1538.422)]
2025-09-14 09:44:56,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:44:56,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 3 hours, 51 minutes, 27 seconds)
2025-09-14 09:48:19,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:48:28,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2164.21167 ± 709.312
2025-09-14 09:48:28,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2808.916), np.float32(1616.74), np.float32(2326.265), np.float32(1279.564), np.float32(1809.6473), np.float32(3291.2087), np.float32(1419.2297), np.float32(1387.6989), np.float32(2785.5806), np.float32(2917.2678)]
2025-09-14 09:48:28,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:48:28,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 3 hours, 52 minutes, 16 seconds)
2025-09-14 09:51:53,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:52:02,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2091.64453 ± 692.358
2025-09-14 09:52:02,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2414.9087), np.float32(3454.9219), np.float32(1438.2079), np.float32(1742.078), np.float32(3265.854), np.float32(1489.7239), np.float32(1945.7717), np.float32(1467.6853), np.float32(1924.8445), np.float32(1772.4493)]
2025-09-14 09:52:02,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:52:02,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 59 minutes, 36 seconds)
2025-09-14 09:55:26,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:55:36,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2247.60034 ± 650.793
2025-09-14 09:55:36,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1796.1969), np.float32(1116.5004), np.float32(2711.8975), np.float32(2878.9417), np.float32(3152.895), np.float32(2509.447), np.float32(1354.5447), np.float32(2844.3381), np.float32(1949.5052), np.float32(2161.7368)]
2025-09-14 09:55:36,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:55:36,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 4 hours, 10 minutes, 34 seconds)
2025-09-14 09:58:59,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:59:09,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2229.83545 ± 694.858
2025-09-14 09:59:09,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1286.4756), np.float32(1850.3376), np.float32(3272.0183), np.float32(3039.2231), np.float32(1827.7002), np.float32(3233.6604), np.float32(2424.35), np.float32(1850.7584), np.float32(1399.942), np.float32(2113.8892)]
2025-09-14 09:59:09,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:59:09,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 4 hours, 21 minutes, 10 seconds)
2025-09-14 10:02:33,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:02:43,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2260.00000 ± 637.320
2025-09-14 10:02:43,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3135.246), np.float32(2493.5447), np.float32(1741.4192), np.float32(2750.2668), np.float32(1405.7468), np.float32(1900.886), np.float32(3113.513), np.float32(2409.623), np.float32(2434.6465), np.float32(1215.1073)]
2025-09-14 10:02:43,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:02:43,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 26 minutes, 41 seconds)
2025-09-14 10:06:06,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:06:16,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2391.43115 ± 599.482
2025-09-14 10:06:16,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2210.224), np.float32(2735.4053), np.float32(3179.9229), np.float32(1539.6089), np.float32(1428.7173), np.float32(2001.9349), np.float32(2065.9968), np.float32(3004.894), np.float32(3059.7837), np.float32(2687.826)]
2025-09-14 10:06:16,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:06:16,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 4 hours, 23 minutes, 18 seconds)
2025-09-14 10:09:40,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:09:50,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2803.29004 ± 360.034
2025-09-14 10:09:50,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2422.8818), np.float32(2753.5613), np.float32(1877.3146), np.float32(3017.3333), np.float32(3064.6584), np.float32(2938.708), np.float32(2935.7317), np.float32(2928.7559), np.float32(3106.6099), np.float32(2987.345)]
2025-09-14 10:09:50,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:09:50,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (2803.29) for latency 9
2025-09-14 10:09:50,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 4 hours, 19 minutes, 42 seconds)
2025-09-14 10:13:14,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:13:23,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1796.59448 ± 510.934
2025-09-14 10:13:23,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1662.948), np.float32(2889.7292), np.float32(1450.3529), np.float32(2088.011), np.float32(1367.8107), np.float32(1557.6321), np.float32(2397.248), np.float32(1237.9131), np.float32(2003.6837), np.float32(1310.6173)]
2025-09-14 10:13:23,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:13:23,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 4 hours, 16 minutes, 10 seconds)
2025-09-14 10:16:47,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:16:57,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2681.22266 ± 762.318
2025-09-14 10:16:57,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3488.038), np.float32(2446.813), np.float32(2253.7205), np.float32(3468.2527), np.float32(2738.2402), np.float32(1652.1586), np.float32(1251.4807), np.float32(3300.1958), np.float32(3581.9023), np.float32(2631.4246)]
2025-09-14 10:16:57,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:16:57,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 12 minutes, 45 seconds)
2025-09-14 10:20:21,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:20:31,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1992.28772 ± 717.266
2025-09-14 10:20:31,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3040.785), np.float32(1471.2234), np.float32(1769.2247), np.float32(1410.1499), np.float32(2013.0432), np.float32(1510.5846), np.float32(2290.786), np.float32(1688.3708), np.float32(1203.6638), np.float32(3525.0452)]
2025-09-14 10:20:31,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:20:31,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 4 hours, 9 minutes, 13 seconds)
2025-09-14 10:23:55,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:24:05,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2361.30078 ± 743.252
2025-09-14 10:24:05,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1289.273), np.float32(2210.0369), np.float32(3419.9739), np.float32(1996.2385), np.float32(3214.164), np.float32(2799.5247), np.float32(2200.936), np.float32(1874.5735), np.float32(1327.0356), np.float32(3281.2522)]
2025-09-14 10:24:05,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:24:05,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 4 hours, 5 minutes, 49 seconds)
2025-09-14 10:27:28,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:27:37,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2581.56152 ± 717.116
2025-09-14 10:27:37,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3277.8242), np.float32(1449.9141), np.float32(1474.065), np.float32(3370.3804), np.float32(2526.378), np.float32(3056.8374), np.float32(1871.9794), np.float32(2967.2485), np.float32(3374.5933), np.float32(2446.3953)]
2025-09-14 10:27:37,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:27:37,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 4 hours, 1 minute, 56 seconds)
2025-09-14 10:31:00,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:31:09,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2972.03174 ± 925.421
2025-09-14 10:31:09,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3827.91), np.float32(3642.888), np.float32(2736.6643), np.float32(3437.6187), np.float32(3230.402), np.float32(3231.5063), np.float32(3505.2048), np.float32(1114.8357), np.float32(1315.5908), np.float32(3677.6953)]
2025-09-14 10:31:09,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:31:09,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (2972.03) for latency 9
2025-09-14 10:31:09,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 58 minutes, 6 seconds)
2025-09-14 10:34:32,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:34:42,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3112.94800 ± 624.196
2025-09-14 10:34:42,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2925.3733), np.float32(1671.7705), np.float32(3415.5154), np.float32(2361.1074), np.float32(3723.0613), np.float32(2989.7222), np.float32(3663.685), np.float32(3649.8052), np.float32(3468.3672), np.float32(3261.0752)]
2025-09-14 10:34:42,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:34:42,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (3112.95) for latency 9
2025-09-14 10:34:42,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 54 minutes, 16 seconds)
2025-09-14 10:38:05,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:38:15,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2804.59155 ± 632.618
2025-09-14 10:38:15,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1853.4409), np.float32(1931.7362), np.float32(3358.6729), np.float32(3200.1414), np.float32(2322.3833), np.float32(2360.8267), np.float32(3492.0889), np.float32(3369.8901), np.float32(2578.1428), np.float32(3578.592)]
2025-09-14 10:38:15,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:38:15,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 50 minutes, 31 seconds)
2025-09-14 10:41:37,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:41:47,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2852.08765 ± 727.162
2025-09-14 10:41:47,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3682.202), np.float32(3125.361), np.float32(1546.4698), np.float32(3012.9534), np.float32(3533.7222), np.float32(3114.233), np.float32(1599.492), np.float32(3536.6755), np.float32(2973.3677), np.float32(2396.4004)]
2025-09-14 10:41:47,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:41:47,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 46 minutes, 40 seconds)
2025-09-14 10:45:10,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:45:20,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2398.61914 ± 709.514
2025-09-14 10:45:20,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2171.2869), np.float32(2175.985), np.float32(2216.8542), np.float32(3377.0151), np.float32(1174.8097), np.float32(2394.5945), np.float32(1627.146), np.float32(2171.0293), np.float32(3215.789), np.float32(3461.6821)]
2025-09-14 10:45:20,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:45:20,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 43 minutes, 14 seconds)
2025-09-14 10:48:41,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:48:51,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2607.93213 ± 870.962
2025-09-14 10:48:51,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3460.7622), np.float32(1429.9772), np.float32(3077.4573), np.float32(1479.9045), np.float32(3628.1384), np.float32(1772.6403), np.float32(2309.0364), np.float32(3443.1768), np.float32(3581.1426), np.float32(1897.0831)]
2025-09-14 10:48:51,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:48:51,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 39 minutes, 23 seconds)
2025-09-14 10:52:12,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:52:21,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2872.01489 ± 862.183
2025-09-14 10:52:21,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1697.2185), np.float32(3414.0908), np.float32(3388.581), np.float32(3804.45), np.float32(2232.125), np.float32(3011.342), np.float32(2397.999), np.float32(3850.4663), np.float32(3618.674), np.float32(1305.2012)]
2025-09-14 10:52:21,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:52:21,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 35 minutes, 25 seconds)
2025-09-14 10:55:39,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:55:48,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2951.25195 ± 663.786
2025-09-14 10:55:48,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2789.6042), np.float32(3288.869), np.float32(3076.9265), np.float32(3599.401), np.float32(3454.8552), np.float32(2930.9602), np.float32(3420.3857), np.float32(2479.2476), np.float32(1208.9005), np.float32(3263.3682)]
2025-09-14 10:55:48,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:55:48,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 30 minutes, 37 seconds)
2025-09-14 10:59:05,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:59:15,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2929.46558 ± 853.830
2025-09-14 10:59:15,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3844.4917), np.float32(3557.4421), np.float32(3835.734), np.float32(2614.4856), np.float32(3480.231), np.float32(1929.4618), np.float32(3498.8982), np.float32(1572.8588), np.float32(1684.5941), np.float32(3276.4614)]
2025-09-14 10:59:15,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:59:15,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 25 minutes, 59 seconds)
2025-09-14 11:02:19,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:02:28,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3504.28076 ± 212.749
2025-09-14 11:02:28,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3268.4226), np.float32(3676.5186), np.float32(3693.4233), np.float32(3493.1194), np.float32(3399.4402), np.float32(3784.7175), np.float32(3542.0596), np.float32(3026.0576), np.float32(3592.6533), np.float32(3566.396)]
2025-09-14 11:02:28,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:02:28,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (3504.28) for latency 9
2025-09-14 11:02:28,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 18 minutes, 44 seconds)
2025-09-14 11:05:31,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:05:40,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2971.96387 ± 672.248
2025-09-14 11:05:40,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3952.9707), np.float32(3812.5115), np.float32(3105.0269), np.float32(3276.3792), np.float32(2151.0046), np.float32(2367.5444), np.float32(3584.02), np.float32(3085.9878), np.float32(2411.8835), np.float32(1972.311)]
2025-09-14 11:05:40,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:05:40,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 11 minutes, 38 seconds)
2025-09-14 11:08:32,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:08:40,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2159.77734 ± 986.166
2025-09-14 11:08:40,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3121.4722), np.float32(3276.6646), np.float32(1643.4879), np.float32(4024.3103), np.float32(1665.1702), np.float32(2596.2212), np.float32(1557.8905), np.float32(748.45154), np.float32(1256.1667), np.float32(1707.9402)]
2025-09-14 11:08:40,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:08:40,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 2 minutes, 39 seconds)
2025-09-14 11:11:15,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:11:22,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3239.74756 ± 525.615
2025-09-14 11:11:22,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3379.7808), np.float32(3730.3953), np.float32(3644.0796), np.float32(2974.3804), np.float32(3182.4285), np.float32(2390.954), np.float32(2209.5676), np.float32(3546.8035), np.float32(3768.8594), np.float32(3570.2239)]
2025-09-14 11:11:22,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:11:22,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 51 minutes, 13 seconds)
2025-09-14 11:13:51,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:13:58,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3366.41357 ± 927.491
2025-09-14 11:13:58,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2229.7), np.float32(3137.8765), np.float32(4076.2964), np.float32(3572.5972), np.float32(4455.842), np.float32(4415.246), np.float32(3524.703), np.float32(4025.7153), np.float32(1471.4911), np.float32(2754.6682)]
2025-09-14 11:13:58,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:13:58,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 39 minutes, 1 second)
2025-09-14 11:16:27,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:16:34,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3071.28760 ± 737.509
2025-09-14 11:16:34,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2456.146), np.float32(3722.7148), np.float32(3589.6458), np.float32(3755.919), np.float32(2223.0935), np.float32(3048.54), np.float32(2876.164), np.float32(3477.3074), np.float32(3966.662), np.float32(1596.6843)]
2025-09-14 11:16:34,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:16:34,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 29 minutes, 29 seconds)
2025-09-14 11:19:04,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:19:11,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3730.69849 ± 313.009
2025-09-14 11:19:11,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4004.9326), np.float32(3495.0867), np.float32(3993.1162), np.float32(3897.6143), np.float32(3447.8357), np.float32(3877.5513), np.float32(4096.5166), np.float32(3835.0881), np.float32(3026.0715), np.float32(3633.1702)]
2025-09-14 11:19:11,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:19:11,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (3730.70) for latency 9
2025-09-14 11:19:11,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 20 minutes, 36 seconds)
2025-09-14 11:21:40,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:21:47,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3425.44263 ± 998.282
2025-09-14 11:21:47,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4603.0205), np.float32(3194.7185), np.float32(2866.153), np.float32(2082.6885), np.float32(3574.3374), np.float32(2547.7183), np.float32(1931.4467), np.float32(4372.9727), np.float32(4538.7173), np.float32(4542.653)]
2025-09-14 11:21:47,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:21:47,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 13 minutes, 44 seconds)
2025-09-14 11:24:16,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:24:23,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3119.98779 ± 1004.644
2025-09-14 11:24:23,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4114.1797), np.float32(2414.0645), np.float32(2108.3901), np.float32(3284.953), np.float32(1866.4451), np.float32(1476.9205), np.float32(3958.1895), np.float32(3855.5515), np.float32(3692.034), np.float32(4429.149)]
2025-09-14 11:24:23,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:24:23,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 10 minutes, 9 seconds)
2025-09-14 11:26:52,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:26:59,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3285.72803 ± 938.703
2025-09-14 11:26:59,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3028.3955), np.float32(2214.398), np.float32(4167.502), np.float32(3744.0823), np.float32(3752.029), np.float32(2267.088), np.float32(4415.369), np.float32(1842.0905), np.float32(2810.5532), np.float32(4615.7734)]
2025-09-14 11:26:59,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:26:59,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 7 minutes, 31 seconds)
2025-09-14 11:29:28,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:29:35,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2555.40771 ± 1021.190
2025-09-14 11:29:35,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1858.321), np.float32(2913.5452), np.float32(1447.6566), np.float32(4140.3716), np.float32(4308.5376), np.float32(1607.7612), np.float32(3135.0237), np.float32(1257.6195), np.float32(2612.5159), np.float32(2272.7268)]
2025-09-14 11:29:35,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:29:35,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 4 minutes, 52 seconds)
2025-09-14 11:32:04,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:32:11,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3644.40503 ± 814.858
2025-09-14 11:32:11,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4034.7544), np.float32(2017.4004), np.float32(3981.8826), np.float32(3999.3635), np.float32(3876.8022), np.float32(2044.1946), np.float32(3989.5862), np.float32(4052.484), np.float32(4103.8013), np.float32(4343.782)]
2025-09-14 11:32:11,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:32:11,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 2 minutes, 11 seconds)
2025-09-14 11:34:40,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:34:47,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3852.33447 ± 930.633
2025-09-14 11:34:47,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4410.929), np.float32(3837.398), np.float32(4644.804), np.float32(4310.0303), np.float32(4858.605), np.float32(4751.6323), np.float32(3296.8118), np.float32(2439.3252), np.float32(2015.6719), np.float32(3958.138)]
2025-09-14 11:34:47,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:34:47,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (3852.33) for latency 9
2025-09-14 11:34:47,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 59 minutes, 35 seconds)
2025-09-14 11:37:16,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:37:23,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3129.66113 ± 859.681
2025-09-14 11:37:23,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1848.5465), np.float32(2152.1409), np.float32(3822.5295), np.float32(3660.7114), np.float32(3159.3848), np.float32(4222.997), np.float32(1667.3505), np.float32(3758.698), np.float32(3611.3557), np.float32(3392.8972)]
2025-09-14 11:37:23,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:37:23,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 57 minutes, 1 second)
2025-09-14 11:39:52,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:39:59,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3445.00464 ± 997.479
2025-09-14 11:39:59,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2605.257), np.float32(2433.0474), np.float32(4626.0405), np.float32(4488.6514), np.float32(4224.219), np.float32(2328.334), np.float32(2077.6968), np.float32(2979.4763), np.float32(4644.343), np.float32(4042.9802)]
2025-09-14 11:39:59,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:39:59,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 54 minutes, 24 seconds)
2025-09-14 11:42:28,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:42:35,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3412.39722 ± 941.608
2025-09-14 11:42:35,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3641.3535), np.float32(4756.5903), np.float32(3963.527), np.float32(3290.6008), np.float32(1533.3605), np.float32(2580.7412), np.float32(2397.1453), np.float32(4081.045), np.float32(3475.14), np.float32(4404.4697)]
2025-09-14 11:42:35,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:42:35,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 51 minutes, 51 seconds)
2025-09-14 11:45:04,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:45:11,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3353.63916 ± 1247.101
2025-09-14 11:45:11,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4581.129), np.float32(3620.5723), np.float32(4820.6274), np.float32(2053.1965), np.float32(4662.1216), np.float32(2123.9736), np.float32(1873.7313), np.float32(1635.8666), np.float32(3489.4573), np.float32(4675.716)]
2025-09-14 11:45:11,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:45:11,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 49 minutes, 13 seconds)
2025-09-14 11:47:40,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:47:47,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4011.36475 ± 931.341
2025-09-14 11:47:47,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4386.7974), np.float32(4504.203), np.float32(2447.5493), np.float32(4359.828), np.float32(4548.9844), np.float32(4537.467), np.float32(1901.6583), np.float32(4373.6206), np.float32(4675.055), np.float32(4378.486)]
2025-09-14 11:47:47,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:47:47,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (4011.36) for latency 9
2025-09-14 11:47:47,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 46 minutes, 36 seconds)
2025-09-14 11:50:16,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:50:23,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4185.69189 ± 1167.181
2025-09-14 11:50:23,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4817.757), np.float32(4841.9155), np.float32(1895.1859), np.float32(4662.868), np.float32(4117.001), np.float32(5091.542), np.float32(1930.2513), np.float32(4859.588), np.float32(4549.553), np.float32(5091.2524)]
2025-09-14 11:50:23,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:50:23,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (4185.69) for latency 9
2025-09-14 11:50:23,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 44 minutes)
2025-09-14 11:52:52,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:52:59,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3953.50342 ± 1174.681
2025-09-14 11:52:59,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4605.2734), np.float32(4167.6943), np.float32(4485.96), np.float32(1488.0172), np.float32(2213.6658), np.float32(4873.539), np.float32(4807.119), np.float32(4783.009), np.float32(4965.0347), np.float32(3145.722)]
2025-09-14 11:52:59,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:52:59,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 41 minutes, 21 seconds)
2025-09-14 11:55:28,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:55:35,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3858.59814 ± 1110.201
2025-09-14 11:55:35,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3335.3914), np.float32(5001.5454), np.float32(1795.8499), np.float32(3479.904), np.float32(3017.3313), np.float32(4723.86), np.float32(4484.8687), np.float32(5378.422), np.float32(4715.8745), np.float32(2652.9326)]
2025-09-14 11:55:35,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:55:35,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 38 minutes, 45 seconds)
2025-09-14 11:58:04,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:58:11,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3674.48389 ± 1317.062
2025-09-14 11:58:11,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4147.1533), np.float32(4829.7017), np.float32(1510.1316), np.float32(3619.8005), np.float32(2075.487), np.float32(4929.8076), np.float32(5019.6763), np.float32(1806.6198), np.float32(4954.558), np.float32(3851.901)]
2025-09-14 11:58:11,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:58:11,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 36 minutes, 9 seconds)
2025-09-14 12:00:40,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:00:47,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4302.96240 ± 436.974
2025-09-14 12:00:47,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4328.534), np.float32(3169.974), np.float32(4177.431), np.float32(4259.2944), np.float32(4858.054), np.float32(4445.8135), np.float32(4800.51), np.float32(4191.4927), np.float32(4384.7656), np.float32(4413.7563)]
2025-09-14 12:00:47,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:00:47,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (4302.96) for latency 9
2025-09-14 12:00:47,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 33 minutes, 37 seconds)
2025-09-14 12:03:16,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:03:23,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4656.94922 ± 381.164
2025-09-14 12:03:23,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4398.105), np.float32(5115.7656), np.float32(4974.456), np.float32(4710.8813), np.float32(4365.9297), np.float32(3943.5261), np.float32(5154.49), np.float32(4843.899), np.float32(4817.447), np.float32(4244.9907)]
2025-09-14 12:03:23,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:03:23,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (4656.95) for latency 9
2025-09-14 12:03:23,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 31 minutes, 1 second)
2025-09-14 12:05:52,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:05:59,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4650.81592 ± 419.157
2025-09-14 12:05:59,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4712.172), np.float32(5030.895), np.float32(5020.576), np.float32(4821.177), np.float32(4996.9053), np.float32(4339.5337), np.float32(4688.93), np.float32(4853.368), np.float32(3578.6003), np.float32(4466.0)]
2025-09-14 12:05:59,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:05:59,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 28 minutes, 26 seconds)
2025-09-14 12:08:28,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:08:35,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4070.83911 ± 813.681
2025-09-14 12:08:35,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3596.5435), np.float32(4311.81), np.float32(4505.4565), np.float32(4697.5874), np.float32(4553.391), np.float32(2022.2754), np.float32(3318.358), np.float32(4651.4565), np.float32(4457.461), np.float32(4594.0513)]
2025-09-14 12:08:35,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:08:35,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 25 minutes, 49 seconds)
2025-09-14 12:11:04,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:11:11,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4723.52979 ± 348.188
2025-09-14 12:11:11,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5051.445), np.float32(4589.534), np.float32(5126.9814), np.float32(4748.299), np.float32(5164.851), np.float32(4933.4077), np.float32(4808.4243), np.float32(4360.4224), np.float32(4067.1067), np.float32(4384.826)]
2025-09-14 12:11:11,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:11:11,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (4723.53) for latency 9
2025-09-14 12:11:11,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 23 minutes, 15 seconds)
2025-09-14 12:13:40,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:13:47,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3877.20776 ± 1270.243
2025-09-14 12:13:47,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5046.613), np.float32(4790.9077), np.float32(2142.2695), np.float32(2146.1604), np.float32(4766.555), np.float32(4659.4873), np.float32(4963.968), np.float32(1780.5594), np.float32(3651.9624), np.float32(4823.5996)]
2025-09-14 12:13:47,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:13:47,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 20 minutes, 34 seconds)
2025-09-14 12:16:16,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:16:23,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4786.04980 ± 347.022
2025-09-14 12:16:23,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4024.0396), np.float32(4779.9736), np.float32(5089.54), np.float32(4900.9204), np.float32(4576.464), np.float32(5025.013), np.float32(5138.7446), np.float32(5210.833), np.float32(4494.6562), np.float32(4620.314)]
2025-09-14 12:16:23,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:16:23,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (4786.05) for latency 9
2025-09-14 12:16:23,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 17 minutes, 56 seconds)
2025-09-14 12:18:52,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:18:59,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4321.45752 ± 542.630
2025-09-14 12:18:59,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4645.2397), np.float32(2931.668), np.float32(4830.687), np.float32(4291.717), np.float32(4154.6353), np.float32(3968.3599), np.float32(4426.1177), np.float32(4953.3936), np.float32(4597.8315), np.float32(4414.922)]
2025-09-14 12:18:59,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:18:59,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 15 minutes, 21 seconds)
2025-09-14 12:21:28,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:21:35,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4218.56250 ± 948.597
2025-09-14 12:21:35,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4991.984), np.float32(1983.2778), np.float32(4537.294), np.float32(4823.7534), np.float32(4700.2144), np.float32(4700.5522), np.float32(2814.75), np.float32(4258.4805), np.float32(4515.684), np.float32(4859.635)]
2025-09-14 12:21:35,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:21:35,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 12 minutes, 49 seconds)
2025-09-14 12:24:04,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:24:11,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4062.46362 ± 641.893
2025-09-14 12:24:11,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4306.883), np.float32(4304.8506), np.float32(4115.491), np.float32(2230.9517), np.float32(4251.831), np.float32(4180.71), np.float32(4482.393), np.float32(3795.3784), np.float32(4505.915), np.float32(4450.2334)]
2025-09-14 12:24:11,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:24:11,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 10 minutes, 12 seconds)
2025-09-14 12:26:40,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:26:47,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3644.34619 ± 957.390
2025-09-14 12:26:47,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3892.5806), np.float32(4061.756), np.float32(2924.381), np.float32(4309.3105), np.float32(4490.36), np.float32(1609.0902), np.float32(2438.6396), np.float32(4281.8794), np.float32(3672.8196), np.float32(4762.643)]
2025-09-14 12:26:47,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:26:47,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 7 minutes, 36 seconds)
2025-09-14 12:29:16,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:29:23,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4063.20312 ± 1254.664
2025-09-14 12:29:23,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4911.4404), np.float32(5031.8438), np.float32(2657.2214), np.float32(4722.312), np.float32(4603.074), np.float32(1992.6766), np.float32(2023.3593), np.float32(4144.7534), np.float32(5249.983), np.float32(5295.368)]
2025-09-14 12:29:23,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:29:23,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 5 minutes)
2025-09-14 12:31:52,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:31:59,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4566.97119 ± 1157.886
2025-09-14 12:31:59,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4177.298), np.float32(5020.8594), np.float32(1203.1765), np.float32(4928.3496), np.float32(5127.7256), np.float32(4741.6343), np.float32(4937.285), np.float32(5169.106), np.float32(5213.089), np.float32(5151.1865)]
2025-09-14 12:31:59,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:31:59,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 2 minutes, 24 seconds)
2025-09-14 12:34:28,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:34:34,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4588.83691 ± 749.651
2025-09-14 12:34:34,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5187.9336), np.float32(4148.7217), np.float32(2618.9219), np.float32(4912.0557), np.float32(5092.084), np.float32(5082.116), np.float32(4681.5034), np.float32(5022.606), np.float32(5011.761), np.float32(4130.6675)]
2025-09-14 12:34:34,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:34:34,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 59 minutes, 45 seconds)
2025-09-14 12:37:03,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:37:10,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4286.33789 ± 658.954
2025-09-14 12:37:10,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4490.8657), np.float32(4869.412), np.float32(4628.508), np.float32(3819.3225), np.float32(2713.163), np.float32(4357.6055), np.float32(4620.7256), np.float32(3733.8635), np.float32(4523.973), np.float32(5105.94)]
2025-09-14 12:37:10,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:37:10,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 57 minutes, 7 seconds)
2025-09-14 12:39:40,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:39:46,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4199.66797 ± 1315.008
2025-09-14 12:39:46,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1944.0098), np.float32(4888.808), np.float32(4465.439), np.float32(5105.8745), np.float32(1307.5804), np.float32(4548.064), np.float32(4582.7773), np.float32(5152.603), np.float32(5083.7954), np.float32(4917.7285)]
2025-09-14 12:39:46,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:39:46,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 54 minutes, 33 seconds)
2025-09-14 12:42:16,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:42:23,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4315.66016 ± 775.902
2025-09-14 12:42:23,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4691.6543), np.float32(2167.6055), np.float32(4393.6914), np.float32(4801.552), np.float32(4835.1997), np.float32(4625.3164), np.float32(4186.883), np.float32(4737.2764), np.float32(4837.6104), np.float32(3879.8105)]
2025-09-14 12:42:23,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:42:23,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 51 minutes, 59 seconds)
2025-09-14 12:44:52,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:44:59,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4245.74902 ± 1111.486
2025-09-14 12:44:59,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3553.894), np.float32(5222.4854), np.float32(4966.7275), np.float32(5031.373), np.float32(2754.4158), np.float32(5073.4536), np.float32(2080.0125), np.float32(5139.806), np.float32(5146.836), np.float32(3488.4875)]
2025-09-14 12:44:59,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:44:59,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 49 minutes, 24 seconds)
2025-09-14 12:47:28,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:47:35,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2976.72729 ± 1431.284
2025-09-14 12:47:35,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4763.5005), np.float32(1878.3625), np.float32(2050.7925), np.float32(2245.7642), np.float32(1854.4993), np.float32(2817.7002), np.float32(5228.1826), np.float32(2457.962), np.float32(1242.945), np.float32(5227.564)]
2025-09-14 12:47:35,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:47:35,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 46 minutes, 49 seconds)
2025-09-14 12:50:04,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:50:10,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3880.36182 ± 1328.255
2025-09-14 12:50:10,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1379.6522), np.float32(1909.7062), np.float32(2412.4446), np.float32(4664.0464), np.float32(5014.69), np.float32(4903.9946), np.float32(4915.526), np.float32(4447.2974), np.float32(4711.1016), np.float32(4445.1553)]
2025-09-14 12:50:10,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:50:10,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 44 minutes, 13 seconds)
2025-09-14 12:52:39,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:52:46,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4402.59814 ± 737.417
2025-09-14 12:52:46,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4005.4727), np.float32(3310.8376), np.float32(4658.9565), np.float32(5086.38), np.float32(4832.8755), np.float32(4215.6675), np.float32(2903.1262), np.float32(5000.18), np.float32(5022.5137), np.float32(4989.974)]
2025-09-14 12:52:46,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:52:46,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 41 minutes, 35 seconds)
2025-09-14 12:55:16,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:55:23,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4582.77002 ± 240.194
2025-09-14 12:55:23,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4645.5527), np.float32(4417.36), np.float32(4538.914), np.float32(4538.157), np.float32(4635.92), np.float32(4551.483), np.float32(4948.636), np.float32(4771.89), np.float32(4773.223), np.float32(4006.5618)]
2025-09-14 12:55:23,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:55:23,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 39 minutes)
2025-09-14 12:57:52,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:57:59,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4946.32764 ± 151.331
2025-09-14 12:57:59,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4736.0327), np.float32(5190.778), np.float32(5196.0625), np.float32(4909.229), np.float32(4895.2866), np.float32(4911.2446), np.float32(5054.6763), np.float32(4818.1255), np.float32(4777.5317), np.float32(4974.313)]
2025-09-14 12:57:59,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:57:59,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (4946.33) for latency 9
2025-09-14 12:57:59,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 36 minutes, 24 seconds)
2025-09-14 13:00:28,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:00:35,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4921.99854 ± 196.366
2025-09-14 13:00:35,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4947.0723), np.float32(4454.5146), np.float32(5031.88), np.float32(5084.1147), np.float32(4938.6323), np.float32(4971.623), np.float32(5093.96), np.float32(5103.3687), np.float32(4923.543), np.float32(4671.2754)]
2025-09-14 13:00:35,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:00:35,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 33 minutes, 48 seconds)
2025-09-14 13:03:04,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:03:11,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4568.52637 ± 958.515
2025-09-14 13:03:11,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4878.339), np.float32(5249.262), np.float32(4628.072), np.float32(4247.9795), np.float32(4982.968), np.float32(4493.4004), np.float32(5150.603), np.float32(4983.96), np.float32(5223.252), np.float32(1847.4305)]
2025-09-14 13:03:11,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:03:11,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 31 minutes, 12 seconds)
2025-09-14 13:05:40,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:05:47,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4252.59473 ± 1227.799
2025-09-14 13:05:47,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5145.4087), np.float32(5351.3364), np.float32(3088.3975), np.float32(4714.267), np.float32(4911.02), np.float32(5323.489), np.float32(4717.378), np.float32(2047.002), np.float32(2195.1963), np.float32(5032.451)]
2025-09-14 13:05:47,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:05:47,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 28 minutes, 37 seconds)
2025-09-14 13:08:16,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:08:23,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4807.54492 ± 955.850
2025-09-14 13:08:23,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4991.9575), np.float32(5227.57), np.float32(5262.4536), np.float32(5194.855), np.float32(5075.3315), np.float32(5209.6123), np.float32(1964.8739), np.float32(4885.6514), np.float32(5270.9956), np.float32(4992.149)]
2025-09-14 13:08:23,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:08:23,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 26 minutes)
2025-09-14 13:10:52,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:10:59,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4710.23877 ± 1065.094
2025-09-14 13:10:59,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5035.8193), np.float32(4983.0273), np.float32(4940.8594), np.float32(1533.4752), np.float32(5279.2373), np.float32(4951.2256), np.float32(5226.4336), np.float32(4935.9365), np.float32(5146.104), np.float32(5070.264)]
2025-09-14 13:10:59,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:10:59,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 23 minutes, 24 seconds)
2025-09-14 13:13:29,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:13:36,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4259.03076 ± 1011.255
2025-09-14 13:13:36,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4373.147), np.float32(4640.42), np.float32(4355.9224), np.float32(4415.0586), np.float32(1332.8044), np.float32(5089.701), np.float32(4145.1655), np.float32(4730.3105), np.float32(4585.8213), np.float32(4921.957)]
2025-09-14 13:13:36,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:13:36,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 20 minutes, 49 seconds)
2025-09-14 13:16:05,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:16:12,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4551.54395 ± 237.240
2025-09-14 13:16:12,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4640.6187), np.float32(4723.7944), np.float32(4510.2256), np.float32(4907.0635), np.float32(4367.5186), np.float32(4815.761), np.float32(4274.8325), np.float32(4090.208), np.float32(4610.818), np.float32(4574.601)]
2025-09-14 13:16:12,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:16:12,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 18 minutes, 13 seconds)
2025-09-14 13:18:41,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:18:48,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 5010.51074 ± 211.436
2025-09-14 13:18:48,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4993.136), np.float32(4815.7246), np.float32(5410.4775), np.float32(5028.261), np.float32(4830.3247), np.float32(5167.857), np.float32(4841.1807), np.float32(4977.151), np.float32(4736.4966), np.float32(5304.495)]
2025-09-14 13:18:48,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:18:48,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (5010.51) for latency 9
2025-09-14 13:18:48,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 15 minutes, 37 seconds)
2025-09-14 13:21:17,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:21:24,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4711.34668 ± 255.094
2025-09-14 13:21:24,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4633.4565), np.float32(4912.737), np.float32(4476.7236), np.float32(5078.8613), np.float32(4807.9443), np.float32(4245.2603), np.float32(4754.6416), np.float32(4539.3413), np.float32(5083.973), np.float32(4580.528)]
2025-09-14 13:21:24,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:21:24,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 13 minutes, 1 second)
2025-09-14 13:23:54,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:24:00,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4689.38525 ± 848.261
2025-09-14 13:24:00,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4703.6567), np.float32(5130.7236), np.float32(5132.2466), np.float32(5084.203), np.float32(5288.54), np.float32(4629.0605), np.float32(5045.574), np.float32(5382.2363), np.float32(4122.9893), np.float32(2374.621)]
2025-09-14 13:24:00,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:24:00,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 10 minutes, 24 seconds)
2025-09-14 13:26:29,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:26:36,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4788.24658 ± 590.639
2025-09-14 13:26:36,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4849.0996), np.float32(5121.721), np.float32(5185.6714), np.float32(4387.4927), np.float32(5211.2725), np.float32(5035.3525), np.float32(3166.5947), np.float32(4808.3115), np.float32(4919.0596), np.float32(5197.89)]
2025-09-14 13:26:36,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:26:36,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 7 minutes, 48 seconds)
2025-09-14 13:29:05,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:29:12,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4542.70215 ± 974.213
2025-09-14 13:29:12,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4708.9097), np.float32(4449.29), np.float32(3294.294), np.float32(5139.3965), np.float32(5025.3286), np.float32(5019.676), np.float32(5267.451), np.float32(2150.942), np.float32(5242.693), np.float32(5129.045)]
2025-09-14 13:29:12,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:29:12,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 5 minutes, 12 seconds)
2025-09-14 13:31:42,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:31:49,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3644.63989 ± 1216.877
2025-09-14 13:31:49,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1858.4536), np.float32(5245.6406), np.float32(4542.88), np.float32(3013.3523), np.float32(5056.283), np.float32(4244.756), np.float32(2971.5632), np.float32(1473.4526), np.float32(3866.556), np.float32(4173.4614)]
2025-09-14 13:31:49,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:31:49,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 36 seconds)
2025-09-14 13:34:18,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:34:24,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4578.41309 ± 803.107
2025-09-14 13:34:24,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4631.1113), np.float32(4952.9136), np.float32(4111.515), np.float32(4885.149), np.float32(5114.5996), np.float32(4941.5825), np.float32(5054.2705), np.float32(4939.2764), np.float32(2309.4917), np.float32(4844.2183)]
2025-09-14 13:34:24,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:34:24,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1251 [DEBUG]: Training session finished
