2025-09-14 08:43:01,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1108 [DEBUG]: logdir: _logs/noise-eval-v2/halfcheetah/bpql-noise_0.150-delay_9
2025-09-14 08:43:01,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1109 [DEBUG]: trainer_prefix: noise-eval-v2/halfcheetah/bpql-noise_0.150-delay_9
2025-09-14 08:43:01,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'9': <latency_env.delayed_mdp.ConstantDelay object at 0x7fd006097ce0>}
2025-09-14 08:43:01,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1111 [DEBUG]: using device: cpu
2025-09-14 08:43:01,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-14 08:43:01,628 baseline-bpql-noisepromille150-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=71, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-14 08:43:01,628 baseline-bpql-noisepromille150-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-14 08:43:03,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-14 08:43:03,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-14 08:45:34,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 08:45:40,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: -300.85486 ± 34.830
2025-09-14 08:45:40,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-314.74374), np.float32(-241.3036), np.float32(-338.61777), np.float32(-298.48624), np.float32(-361.4241), np.float32(-327.2838), np.float32(-283.14163), np.float32(-308.33188), np.float32(-265.07986), np.float32(-270.13596)]
2025-09-14 08:45:40,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 08:45:40,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (-300.85) for latency 9
2025-09-14 08:45:40,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 19 minutes, 41 seconds)
2025-09-14 08:48:13,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 08:48:20,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: -249.16211 ± 48.139
2025-09-14 08:48:20,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-244.77919), np.float32(-149.8357), np.float32(-258.70743), np.float32(-189.74336), np.float32(-265.77472), np.float32(-302.5775), np.float32(-216.49211), np.float32(-313.1587), np.float32(-277.64752), np.float32(-272.9048)]
2025-09-14 08:48:20,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 08:48:20,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (-249.16) for latency 9
2025-09-14 08:48:20,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 18 minutes, 31 seconds)
2025-09-14 08:51:01,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 08:51:07,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: -86.72356 ± 78.104
2025-09-14 08:51:07,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-52.84816), np.float32(-118.08363), np.float32(-143.95338), np.float32(-81.82017), np.float32(-109.370636), np.float32(-139.92645), np.float32(120.40433), np.float32(-87.04384), np.float32(-182.222), np.float32(-72.37158)]
2025-09-14 08:51:07,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 08:51:07,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (-86.72) for latency 9
2025-09-14 08:51:07,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 20 minutes, 49 seconds)
2025-09-14 08:53:47,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 08:53:54,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 40.43241 ± 97.652
2025-09-14 08:53:54,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-125.48876), np.float32(17.131424), np.float32(185.85553), np.float32(-12.61692), np.float32(136.16032), np.float32(81.13321), np.float32(149.20226), np.float32(35.505318), np.float32(40.704308), np.float32(-103.26261)]
2025-09-14 08:53:54,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 08:53:54,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (40.43) for latency 9
2025-09-14 08:53:54,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 20 minutes, 19 seconds)
2025-09-14 08:56:41,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 08:56:49,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 294.76251 ± 104.558
2025-09-14 08:56:49,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(292.6426), np.float32(232.84802), np.float32(332.4852), np.float32(369.657), np.float32(15.693935), np.float32(297.7641), np.float32(346.59836), np.float32(411.1426), np.float32(360.2927), np.float32(288.5007)]
2025-09-14 08:56:49,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 08:56:49,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (294.76) for latency 9
2025-09-14 08:56:49,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 21 minutes, 32 seconds)
2025-09-14 09:00:00,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:00:09,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 583.07300 ± 129.854
2025-09-14 09:00:09,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(522.1645), np.float32(397.77396), np.float32(661.8391), np.float32(582.2014), np.float32(443.09363), np.float32(725.4991), np.float32(488.57172), np.float32(787.8632), np.float32(737.50446), np.float32(484.21854)]
2025-09-14 09:00:09,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:00:09,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (583.07) for latency 9
2025-09-14 09:00:09,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 32 minutes, 10 seconds)
2025-09-14 09:03:24,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:03:33,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 834.51184 ± 302.229
2025-09-14 09:03:33,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(466.43808), np.float32(1019.4746), np.float32(719.9177), np.float32(1108.0428), np.float32(1404.456), np.float32(1057.2227), np.float32(524.36554), np.float32(520.1154), np.float32(567.66833), np.float32(957.4163)]
2025-09-14 09:03:33,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:03:33,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (834.51) for latency 9
2025-09-14 09:03:33,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 43 minutes, 4 seconds)
2025-09-14 09:06:43,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:06:51,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 799.30566 ± 212.191
2025-09-14 09:06:51,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(513.5433), np.float32(1201.6849), np.float32(536.73755), np.float32(676.156), np.float32(1032.4171), np.float32(600.0159), np.float32(847.8204), np.float32(833.05457), np.float32(957.4145), np.float32(794.2125)]
2025-09-14 09:06:51,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:06:51,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 49 minutes, 36 seconds)
2025-09-14 09:09:59,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:10:08,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1259.27271 ± 387.603
2025-09-14 09:10:08,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1427.0736), np.float32(1484.1282), np.float32(799.3283), np.float32(1583.2156), np.float32(896.55286), np.float32(1876.2314), np.float32(831.9577), np.float32(1739.49), np.float32(1084.178), np.float32(870.5723)]
2025-09-14 09:10:08,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:10:08,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1259.27) for latency 9
2025-09-14 09:10:08,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 55 minutes, 27 seconds)
2025-09-14 09:13:16,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:13:25,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1301.99878 ± 498.102
2025-09-14 09:13:25,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(798.7092), np.float32(2452.806), np.float32(1624.2723), np.float32(933.0356), np.float32(804.4643), np.float32(1632.3265), np.float32(893.62286), np.float32(1318.5737), np.float32(1544.5515), np.float32(1017.6248)]
2025-09-14 09:13:25,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:13:25,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1302.00) for latency 9
2025-09-14 09:13:25,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 58 minutes, 43 seconds)
2025-09-14 09:16:33,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:16:41,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1407.06946 ± 415.912
2025-09-14 09:16:41,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1802.3165), np.float32(1037.732), np.float32(1815.9121), np.float32(1006.11957), np.float32(2002.6108), np.float32(1133.852), np.float32(1118.1771), np.float32(1845.62), np.float32(782.5118), np.float32(1525.843)]
2025-09-14 09:16:41,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:16:41,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1407.07) for latency 9
2025-09-14 09:16:41,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 54 minutes, 22 seconds)
2025-09-14 09:19:49,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:19:58,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1214.59265 ± 535.112
2025-09-14 09:19:58,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1766.5587), np.float32(1000.8014), np.float32(911.8759), np.float32(1732.1956), np.float32(-44.266273), np.float32(967.67474), np.float32(1739.9213), np.float32(1025.3154), np.float32(1468.6152), np.float32(1577.2341)]
2025-09-14 09:19:58,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:19:58,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 49 minutes, 5 seconds)
2025-09-14 09:23:19,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:23:28,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1437.06812 ± 345.302
2025-09-14 09:23:28,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1395.2565), np.float32(1024.8693), np.float32(1545.3965), np.float32(1153.9639), np.float32(1513.107), np.float32(1779.9237), np.float32(1154.0443), np.float32(1499.4454), np.float32(1091.6293), np.float32(2213.0447)]
2025-09-14 09:23:28,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:23:28,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1437.07) for latency 9
2025-09-14 09:23:28,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 48 minutes, 59 seconds)
2025-09-14 09:26:49,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:26:58,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1506.02051 ± 606.154
2025-09-14 09:26:58,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1027.0015), np.float32(1580.2833), np.float32(2167.633), np.float32(1021.77734), np.float32(2122.3982), np.float32(1177.5159), np.float32(2057.695), np.float32(2220.3787), np.float32(302.06207), np.float32(1383.4587)]
2025-09-14 09:26:58,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:26:58,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1506.02) for latency 9
2025-09-14 09:26:58,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 49 minutes, 40 seconds)
2025-09-14 09:30:19,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:30:28,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1563.29321 ± 626.503
2025-09-14 09:30:28,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(261.81403), np.float32(1390.6333), np.float32(966.41895), np.float32(2417.5876), np.float32(1897.9155), np.float32(2112.3755), np.float32(1380.6549), np.float32(1627.1624), np.float32(1258.9161), np.float32(2319.4543)]
2025-09-14 09:30:28,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:30:28,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1563.29) for latency 9
2025-09-14 09:30:28,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 50 minutes, 2 seconds)
2025-09-14 09:33:39,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:33:47,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1836.83789 ± 458.609
2025-09-14 09:33:47,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2085.4902), np.float32(2040.9214), np.float32(2464.795), np.float32(1011.5168), np.float32(2162.9993), np.float32(1021.46796), np.float32(1607.541), np.float32(2102.274), np.float32(2008.0607), np.float32(1863.3135)]
2025-09-14 09:33:47,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:33:47,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1836.84) for latency 9
2025-09-14 09:33:47,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 47 minutes, 5 seconds)
2025-09-14 09:36:35,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:36:41,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1492.26538 ± 346.241
2025-09-14 09:36:41,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1942.9883), np.float32(1061.6024), np.float32(1452.0172), np.float32(1256.5281), np.float32(2072.0002), np.float32(1105.574), np.float32(1061.6196), np.float32(1634.7426), np.float32(1670.5818), np.float32(1664.9987)]
2025-09-14 09:36:41,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:36:41,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 37 minutes, 28 seconds)
2025-09-14 09:39:13,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:39:19,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1685.93555 ± 584.651
2025-09-14 09:39:19,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1051.2074), np.float32(1179.6603), np.float32(978.7244), np.float32(2594.9048), np.float32(1535.5629), np.float32(2225.184), np.float32(2316.858), np.float32(2257.3108), np.float32(1661.7719), np.float32(1058.1707)]
2025-09-14 09:39:19,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:39:19,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 20 minutes, 5 seconds)
2025-09-14 09:41:51,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:41:58,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1591.34985 ± 436.887
2025-09-14 09:41:58,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(988.22015), np.float32(2245.0574), np.float32(992.6865), np.float32(1740.1486), np.float32(1392.3512), np.float32(1359.5416), np.float32(2138.1348), np.float32(1580.6896), np.float32(1350.1127), np.float32(2126.5562)]
2025-09-14 09:41:58,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:41:58,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 2 minutes, 56 seconds)
2025-09-14 09:44:47,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:44:56,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1880.13049 ± 640.898
2025-09-14 09:44:56,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2151.416), np.float32(941.3094), np.float32(1695.8344), np.float32(1955.1555), np.float32(1151.2802), np.float32(2755.0142), np.float32(1948.2576), np.float32(2255.166), np.float32(1064.708), np.float32(2883.1624)]
2025-09-14 09:44:56,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:44:56,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1880.13) for latency 9
2025-09-14 09:44:56,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 3 hours, 51 minutes, 20 seconds)
2025-09-14 09:48:20,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:48:29,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1215.97241 ± 484.965
2025-09-14 09:48:29,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2014.8336), np.float32(289.97937), np.float32(1123.9998), np.float32(960.3334), np.float32(1125.7684), np.float32(1889.3723), np.float32(1051.0319), np.float32(1077.0078), np.float32(1670.9363), np.float32(956.4621)]
2025-09-14 09:48:29,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:48:29,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 3 hours, 52 minutes, 20 seconds)
2025-09-14 09:51:54,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:52:03,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1497.06543 ± 594.346
2025-09-14 09:52:03,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1695.917), np.float32(1775.2986), np.float32(1879.4131), np.float32(1395.724), np.float32(2237.9043), np.float32(1020.1583), np.float32(1172.9515), np.float32(1760.585), np.float32(55.594284), np.float32(1977.1083)]
2025-09-14 09:52:03,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:52:03,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 59 minutes, 43 seconds)
2025-09-14 09:55:28,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:55:37,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1731.77185 ± 487.952
2025-09-14 09:55:37,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1456.6211), np.float32(2457.287), np.float32(1959.931), np.float32(2569.729), np.float32(2056.2246), np.float32(1345.6072), np.float32(1514.3489), np.float32(1582.6976), np.float32(954.6), np.float32(1420.6738)]
2025-09-14 09:55:37,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:55:37,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 4 hours, 10 minutes, 54 seconds)
2025-09-14 09:59:01,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:59:10,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1714.74341 ± 534.597
2025-09-14 09:59:10,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2306.8218), np.float32(1300.3485), np.float32(1301.122), np.float32(1282.4595), np.float32(1845.1971), np.float32(2430.7751), np.float32(2374.667), np.float32(1075.418), np.float32(1074.9507), np.float32(2155.6738)]
2025-09-14 09:59:10,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:59:10,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 4 hours, 21 minutes, 27 seconds)
2025-09-14 10:02:34,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:02:43,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1332.66284 ± 257.530
2025-09-14 10:02:43,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1085.2776), np.float32(1802.116), np.float32(1146.9147), np.float32(1578.972), np.float32(1503.126), np.float32(1062.0359), np.float32(1550.6995), np.float32(1430.3136), np.float32(1122.3915), np.float32(1044.782)]
2025-09-14 10:02:43,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:02:43,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 26 minutes, 50 seconds)
2025-09-14 10:06:07,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:06:16,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2046.43530 ± 594.057
2025-09-14 10:06:16,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2687.342), np.float32(2294.1553), np.float32(2530.8008), np.float32(1295.7292), np.float32(1105.9166), np.float32(2812.4282), np.float32(2058.2366), np.float32(2537.9824), np.float32(1786.938), np.float32(1354.8262)]
2025-09-14 10:06:16,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:06:16,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (2046.44) for latency 9
2025-09-14 10:06:16,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 4 hours, 23 minutes, 14 seconds)
2025-09-14 10:09:40,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:09:50,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1631.50366 ± 593.786
2025-09-14 10:09:50,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2002.3738), np.float32(2594.5522), np.float32(2102.9653), np.float32(1321.5659), np.float32(2520.4397), np.float32(978.31006), np.float32(1279.6824), np.float32(1505.6837), np.float32(992.3109), np.float32(1017.15344)]
2025-09-14 10:09:50,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:09:50,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 4 hours, 19 minutes, 29 seconds)
2025-09-14 10:13:14,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:13:23,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1753.90369 ± 591.019
2025-09-14 10:13:23,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1173.5681), np.float32(1045.6749), np.float32(2381.3782), np.float32(2268.8188), np.float32(1197.5873), np.float32(1086.1401), np.float32(2301.8352), np.float32(1445.9106), np.float32(2640.3916), np.float32(1997.7317)]
2025-09-14 10:13:23,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:13:23,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 4 hours, 15 minutes, 50 seconds)
2025-09-14 10:16:48,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:16:57,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2047.76819 ± 425.440
2025-09-14 10:16:57,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1760.9266), np.float32(1365.0923), np.float32(2047.2646), np.float32(1810.2567), np.float32(2370.7256), np.float32(2673.697), np.float32(2321.349), np.float32(1639.3624), np.float32(1802.848), np.float32(2686.1575)]
2025-09-14 10:16:57,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:16:57,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (2047.77) for latency 9
2025-09-14 10:16:57,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 12 minutes, 29 seconds)
2025-09-14 10:20:21,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:20:31,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1595.67358 ± 663.747
2025-09-14 10:20:31,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2615.7185), np.float32(1036.8546), np.float32(1681.5416), np.float32(2521.9983), np.float32(1015.1469), np.float32(1858.523), np.float32(1007.35144), np.float32(946.76996), np.float32(2347.3164), np.float32(925.5157)]
2025-09-14 10:20:31,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:20:31,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 4 hours, 9 minutes, 5 seconds)
2025-09-14 10:23:55,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:24:05,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1573.12866 ± 539.187
2025-09-14 10:24:05,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1452.1288), np.float32(1754.7007), np.float32(2715.9082), np.float32(1293.7814), np.float32(975.5585), np.float32(2390.367), np.float32(1112.595), np.float32(1283.5591), np.float32(1572.9087), np.float32(1179.7814)]
2025-09-14 10:24:05,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:24:05,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 4 hours, 5 minutes, 42 seconds)
2025-09-14 10:27:29,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:27:38,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1855.75061 ± 657.344
2025-09-14 10:27:38,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2192.6707), np.float32(1927.9495), np.float32(2577.2065), np.float32(1122.9437), np.float32(1043.4772), np.float32(1022.0362), np.float32(1760.2727), np.float32(1435.6638), np.float32(2893.9287), np.float32(2581.3562)]
2025-09-14 10:27:38,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:27:38,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 4 hours, 2 minutes, 10 seconds)
2025-09-14 10:31:02,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:31:11,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1708.87927 ± 503.194
2025-09-14 10:31:11,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2207.958), np.float32(1974.9215), np.float32(1124.539), np.float32(1252.8439), np.float32(1488.9185), np.float32(1160.5249), np.float32(2273.9285), np.float32(2525.9348), np.float32(1906.7117), np.float32(1172.5123)]
2025-09-14 10:31:11,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:31:11,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 58 minutes, 36 seconds)
2025-09-14 10:34:36,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:34:46,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1560.28540 ± 478.692
2025-09-14 10:34:46,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1046.0195), np.float32(1393.555), np.float32(1069.7809), np.float32(1170.8822), np.float32(1377.2063), np.float32(1262.7684), np.float32(2208.272), np.float32(2532.5007), np.float32(1895.6117), np.float32(1646.2592)]
2025-09-14 10:34:46,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:34:46,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 55 minutes, 6 seconds)
2025-09-14 10:38:09,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:38:19,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1848.38770 ± 603.607
2025-09-14 10:38:19,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2933.0813), np.float32(2449.5422), np.float32(1204.6675), np.float32(1192.7316), np.float32(1054.3906), np.float32(2112.7024), np.float32(1966.2852), np.float32(2097.8523), np.float32(1256.9061), np.float32(2215.7173)]
2025-09-14 10:38:19,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:38:19,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 51 minutes, 24 seconds)
2025-09-14 10:41:42,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:41:52,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1570.61536 ± 419.605
2025-09-14 10:41:52,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1088.779), np.float32(1258.8243), np.float32(1554.8431), np.float32(2192.4016), np.float32(1261.0792), np.float32(1046.1494), np.float32(1749.4142), np.float32(1485.0444), np.float32(2353.6484), np.float32(1715.9694)]
2025-09-14 10:41:52,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:41:52,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 47 minutes, 40 seconds)
2025-09-14 10:45:15,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:45:25,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1592.50378 ± 430.058
2025-09-14 10:45:25,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1404.347), np.float32(1446.3708), np.float32(1464.3905), np.float32(1486.0208), np.float32(2612.9707), np.float32(1608.2649), np.float32(2169.483), np.float32(1115.6455), np.float32(1276.4902), np.float32(1341.0553)]
2025-09-14 10:45:25,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:45:25,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 44 minutes)
2025-09-14 10:48:48,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:48:58,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1867.65198 ± 682.091
2025-09-14 10:48:58,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2465.2207), np.float32(2868.7021), np.float32(1411.5232), np.float32(1264.3254), np.float32(1491.7035), np.float32(1257.0353), np.float32(1121.2291), np.float32(3108.8508), np.float32(1611.3676), np.float32(2076.5627)]
2025-09-14 10:48:58,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:48:58,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 40 minutes, 19 seconds)
2025-09-14 10:52:19,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:52:29,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1708.27930 ± 517.563
2025-09-14 10:52:29,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1312.4169), np.float32(1281.2435), np.float32(1878.8986), np.float32(2573.0547), np.float32(1194.1575), np.float32(2213.3806), np.float32(2442.5396), np.float32(1618.8087), np.float32(1545.2307), np.float32(1023.0634)]
2025-09-14 10:52:29,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:52:29,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 36 minutes, 6 seconds)
2025-09-14 10:55:46,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:55:56,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1826.08435 ± 563.323
2025-09-14 10:55:56,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2438.1084), np.float32(1161.0887), np.float32(1385.0061), np.float32(2397.4482), np.float32(1196.2821), np.float32(1482.9338), np.float32(1851.0695), np.float32(1409.7649), np.float32(2868.9624), np.float32(2070.1802)]
2025-09-14 10:55:56,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:55:56,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 31 minutes, 21 seconds)
2025-09-14 10:59:14,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:59:23,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2218.58667 ± 342.304
2025-09-14 10:59:23,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2265.3325), np.float32(2250.9106), np.float32(2754.0906), np.float32(2079.031), np.float32(2470.485), np.float32(1709.9082), np.float32(2291.2197), np.float32(1547.2219), np.float32(2485.149), np.float32(2332.5188)]
2025-09-14 10:59:23,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:59:23,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (2218.59) for latency 9
2025-09-14 10:59:23,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 26 minutes, 44 seconds)
2025-09-14 11:02:28,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:02:37,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2229.43701 ± 571.422
2025-09-14 11:02:37,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2920.5952), np.float32(2745.563), np.float32(2232.6887), np.float32(2490.2175), np.float32(1616.3046), np.float32(2616.3833), np.float32(2725.1812), np.float32(2242.9734), np.float32(1619.1974), np.float32(1085.2659)]
2025-09-14 11:02:37,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:02:37,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (2229.44) for latency 9
2025-09-14 11:02:37,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 19 minutes, 30 seconds)
2025-09-14 11:05:41,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:05:50,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1958.83594 ± 672.114
2025-09-14 11:05:50,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2107.2695), np.float32(1666.6478), np.float32(1153.2365), np.float32(3022.3376), np.float32(1181.5829), np.float32(2083.2273), np.float32(2748.5574), np.float32(2736.2534), np.float32(1088.6217), np.float32(1800.6257)]
2025-09-14 11:05:50,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:05:50,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 12 minutes, 19 seconds)
2025-09-14 11:08:43,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:08:51,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1848.06482 ± 494.281
2025-09-14 11:08:51,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1786.5391), np.float32(2246.2622), np.float32(1395.5583), np.float32(1868.4056), np.float32(1334.288), np.float32(2454.8606), np.float32(1432.9252), np.float32(1963.0016), np.float32(1220.9174), np.float32(2777.8914)]
2025-09-14 11:08:51,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:08:51,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 3 minutes, 18 seconds)
2025-09-14 11:11:25,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:11:32,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2135.45288 ± 620.053
2025-09-14 11:11:32,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2473.2698), np.float32(2752.2383), np.float32(2333.8972), np.float32(2958.9912), np.float32(1067.0176), np.float32(2397.7556), np.float32(1948.72), np.float32(2456.2961), np.float32(1017.65674), np.float32(1948.6873)]
2025-09-14 11:11:32,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:11:32,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 51 minutes, 37 seconds)
2025-09-14 11:14:02,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:14:08,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2357.93359 ± 559.911
2025-09-14 11:14:08,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2551.4866), np.float32(3223.837), np.float32(1461.1888), np.float32(2971.5796), np.float32(1605.2572), np.float32(2286.3438), np.float32(2749.2466), np.float32(2510.2449), np.float32(1725.2351), np.float32(2494.9187)]
2025-09-14 11:14:08,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:14:08,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (2357.93) for latency 9
2025-09-14 11:14:08,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 39 minutes, 23 seconds)
2025-09-14 11:16:38,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:16:45,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2214.25684 ± 521.709
2025-09-14 11:16:45,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1992.5463), np.float32(2526.1558), np.float32(1893.9977), np.float32(1558.5648), np.float32(2916.1204), np.float32(1213.2642), np.float32(2775.947), np.float32(2444.4329), np.float32(2637.2136), np.float32(2184.326)]
2025-09-14 11:16:45,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:16:45,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 29 minutes, 52 seconds)
2025-09-14 11:19:14,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:19:21,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2338.58740 ± 598.891
2025-09-14 11:19:21,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2766.333), np.float32(2866.5334), np.float32(1503.2963), np.float32(2698.2104), np.float32(3044.7844), np.float32(2528.958), np.float32(1658.6597), np.float32(1307.4069), np.float32(2204.6582), np.float32(2807.0337)]
2025-09-14 11:19:21,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:19:21,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 20 minutes, 38 seconds)
2025-09-14 11:21:51,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:21:58,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2111.76025 ± 431.883
2025-09-14 11:21:58,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1965.2153), np.float32(1751.0056), np.float32(1446.6241), np.float32(2378.379), np.float32(2440.7766), np.float32(1425.8568), np.float32(2237.1147), np.float32(2422.1804), np.float32(2235.8982), np.float32(2814.553)]
2025-09-14 11:21:58,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:21:58,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 13 minutes, 48 seconds)
2025-09-14 11:24:27,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:24:34,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2160.81519 ± 380.019
2025-09-14 11:24:34,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1851.4362), np.float32(2359.3496), np.float32(2244.3333), np.float32(2370.9336), np.float32(1601.7291), np.float32(1561.1598), np.float32(2425.6577), np.float32(2741.3867), np.float32(2517.631), np.float32(1934.5355)]
2025-09-14 11:24:34,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:24:34,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 10 minutes, 23 seconds)
2025-09-14 11:27:04,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:27:11,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2238.05786 ± 583.057
2025-09-14 11:27:11,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2168.5247), np.float32(1362.5096), np.float32(2884.9329), np.float32(1084.3496), np.float32(2651.8528), np.float32(2667.7703), np.float32(2005.7402), np.float32(2124.664), np.float32(2695.5627), np.float32(2734.6714)]
2025-09-14 11:27:11,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:27:11,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 7 minutes, 44 seconds)
2025-09-14 11:29:40,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:29:47,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2228.71338 ± 625.405
2025-09-14 11:29:47,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2611.5403), np.float32(1253.9651), np.float32(3000.818), np.float32(2250.6526), np.float32(2505.0764), np.float32(2216.6833), np.float32(2520.1025), np.float32(1353.5428), np.float32(1486.2025), np.float32(3088.5483)]
2025-09-14 11:29:47,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:29:47,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 5 minutes, 9 seconds)
2025-09-14 11:32:17,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:32:24,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2061.71411 ± 621.300
2025-09-14 11:32:24,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2359.6313), np.float32(1299.9725), np.float32(1401.5973), np.float32(2591.0574), np.float32(2210.7085), np.float32(2901.6123), np.float32(2919.5444), np.float32(1518.9005), np.float32(1224.6339), np.float32(2189.4817)]
2025-09-14 11:32:24,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:32:24,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 2 minutes, 36 seconds)
2025-09-14 11:34:54,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:35:01,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2331.01367 ± 595.618
2025-09-14 11:35:01,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3142.7468), np.float32(1914.3424), np.float32(1442.6171), np.float32(2377.3213), np.float32(3102.9814), np.float32(2248.6838), np.float32(2592.6577), np.float32(1319.4202), np.float32(2360.517), np.float32(2808.8489)]
2025-09-14 11:35:01,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:35:01,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 3 seconds)
2025-09-14 11:37:30,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:37:37,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2269.09033 ± 522.150
2025-09-14 11:37:37,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2499.9521), np.float32(1197.0846), np.float32(1558.2217), np.float32(1911.8011), np.float32(2745.359), np.float32(2715.3406), np.float32(2551.1296), np.float32(2253.018), np.float32(2379.137), np.float32(2879.8608)]
2025-09-14 11:37:37,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:37:37,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 57 minutes, 25 seconds)
2025-09-14 11:40:06,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:40:13,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2007.32361 ± 529.283
2025-09-14 11:40:13,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1198.2693), np.float32(1607.2295), np.float32(2606.5242), np.float32(2672.924), np.float32(1873.32), np.float32(2071.2388), np.float32(1899.9906), np.float32(1517.7816), np.float32(1711.0541), np.float32(2914.9045)]
2025-09-14 11:40:13,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:40:13,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 54 minutes, 45 seconds)
2025-09-14 11:42:42,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:42:49,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1942.30603 ± 477.915
2025-09-14 11:42:49,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1546.944), np.float32(1970.082), np.float32(1997.1278), np.float32(2142.2104), np.float32(1162.3047), np.float32(2665.5117), np.float32(1434.9113), np.float32(2506.848), np.float32(1568.4103), np.float32(2428.7114)]
2025-09-14 11:42:49,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:42:49,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 52 minutes, 3 seconds)
2025-09-14 11:45:19,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:45:25,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1994.50842 ± 487.015
2025-09-14 11:45:25,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2304.6895), np.float32(2497.5803), np.float32(1698.1456), np.float32(1529.6825), np.float32(2572.7341), np.float32(2859.0576), np.float32(1388.7714), np.float32(1654.8306), np.float32(1762.3561), np.float32(1677.2369)]
2025-09-14 11:45:25,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:45:25,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 49 minutes, 24 seconds)
2025-09-14 11:47:55,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:48:01,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2445.49438 ± 667.668
2025-09-14 11:48:01,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1553.226), np.float32(2865.7896), np.float32(3271.532), np.float32(2994.9087), np.float32(1142.9827), np.float32(2631.2554), np.float32(2828.5798), np.float32(2424.2654), np.float32(1818.239), np.float32(2924.166)]
2025-09-14 11:48:01,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:48:01,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (2445.49) for latency 9
2025-09-14 11:48:01,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 46 minutes, 43 seconds)
2025-09-14 11:50:31,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:50:38,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2609.26025 ± 462.501
2025-09-14 11:50:38,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3034.5464), np.float32(1783.6047), np.float32(3035.3955), np.float32(2899.4756), np.float32(3041.2092), np.float32(2337.017), np.float32(2572.379), np.float32(1813.4269), np.float32(2948.298), np.float32(2627.2505)]
2025-09-14 11:50:38,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:50:38,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (2609.26) for latency 9
2025-09-14 11:50:38,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 44 minutes, 6 seconds)
2025-09-14 11:53:07,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:53:14,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2511.49536 ± 370.733
2025-09-14 11:53:14,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2442.9512), np.float32(2132.2742), np.float32(1642.9731), np.float32(2552.2385), np.float32(2809.819), np.float32(2574.4893), np.float32(2748.0442), np.float32(2500.833), np.float32(2650.738), np.float32(3060.5928)]
2025-09-14 11:53:14,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:53:14,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 41 minutes, 30 seconds)
2025-09-14 11:55:43,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:55:50,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2204.94116 ± 750.297
2025-09-14 11:55:50,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3119.364), np.float32(1137.5178), np.float32(2153.7), np.float32(1815.5168), np.float32(1537.8827), np.float32(2574.854), np.float32(1138.4194), np.float32(2332.765), np.float32(2907.0342), np.float32(3332.3567)]
2025-09-14 11:55:50,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:55:50,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 38 minutes, 56 seconds)
2025-09-14 11:58:19,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:58:26,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2223.83984 ± 610.563
2025-09-14 11:58:26,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2994.364), np.float32(1829.8163), np.float32(2148.1743), np.float32(2489.1453), np.float32(1472.9153), np.float32(1058.6731), np.float32(2630.5789), np.float32(2037.7677), np.float32(2520.5286), np.float32(3056.4343)]
2025-09-14 11:58:26,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:58:26,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 36 minutes, 17 seconds)
2025-09-14 12:00:55,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:01:02,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2157.43799 ± 540.789
2025-09-14 12:01:02,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2298.3599), np.float32(2509.9062), np.float32(1405.4487), np.float32(2857.4304), np.float32(2691.6775), np.float32(2645.4646), np.float32(1428.4514), np.float32(2415.1365), np.float32(1896.9987), np.float32(1425.5084)]
2025-09-14 12:01:02,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:01:02,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 33 minutes, 40 seconds)
2025-09-14 12:03:32,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:03:38,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1841.78638 ± 526.218
2025-09-14 12:03:38,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1991.0212), np.float32(1078.2178), np.float32(1141.663), np.float32(2608.0188), np.float32(2336.9602), np.float32(2063.657), np.float32(2160.461), np.float32(2128.0925), np.float32(1061.7362), np.float32(1848.036)]
2025-09-14 12:03:38,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:03:38,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 31 minutes, 5 seconds)
2025-09-14 12:06:08,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:06:15,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2149.52393 ± 597.087
2025-09-14 12:06:15,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1577.3708), np.float32(2107.698), np.float32(2875.6814), np.float32(2749.6992), np.float32(2554.039), np.float32(1178.8483), np.float32(2611.2703), np.float32(2230.7588), np.float32(1178.2876), np.float32(2431.5857)]
2025-09-14 12:06:15,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:06:15,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 28 minutes, 29 seconds)
2025-09-14 12:08:45,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:08:52,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2439.51465 ± 455.604
2025-09-14 12:08:52,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2246.4397), np.float32(2404.7537), np.float32(2661.2017), np.float32(2718.4028), np.float32(1331.3915), np.float32(2178.7336), np.float32(2732.1284), np.float32(2580.8994), np.float32(2390.104), np.float32(3151.0945)]
2025-09-14 12:08:52,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:08:52,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 25 minutes, 57 seconds)
2025-09-14 12:11:21,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:11:28,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2379.63257 ± 495.877
2025-09-14 12:11:28,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2625.165), np.float32(2886.8413), np.float32(1189.8412), np.float32(2060.0564), np.float32(2950.9062), np.float32(2554.9248), np.float32(2765.2925), np.float32(2120.0217), np.float32(2469.4946), np.float32(2173.7817)]
2025-09-14 12:11:28,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:11:28,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 23 minutes, 23 seconds)
2025-09-14 12:13:58,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:14:04,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2365.40576 ± 561.734
2025-09-14 12:14:04,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1691.0648), np.float32(2744.8965), np.float32(1085.191), np.float32(2154.5156), np.float32(2551.33), np.float32(2200.7983), np.float32(2732.8926), np.float32(2962.2886), np.float32(2759.0906), np.float32(2771.9915)]
2025-09-14 12:14:04,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:14:04,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 20 minutes, 50 seconds)
2025-09-14 12:16:34,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:16:41,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1974.63147 ± 733.319
2025-09-14 12:16:41,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2760.9844), np.float32(2282.4978), np.float32(1183.1299), np.float32(2715.6274), np.float32(1364.4415), np.float32(2864.0369), np.float32(1109.665), np.float32(1134.3295), np.float32(2797.5247), np.float32(1534.077)]
2025-09-14 12:16:41,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:16:41,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 18 minutes, 14 seconds)
2025-09-14 12:19:11,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:19:18,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2135.94800 ± 670.079
2025-09-14 12:19:18,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2836.1057), np.float32(1469.1978), np.float32(1737.0685), np.float32(2136.9556), np.float32(1731.3097), np.float32(2789.6624), np.float32(1184.2325), np.float32(2680.7112), np.float32(3253.317), np.float32(1540.92)]
2025-09-14 12:19:18,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:19:18,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 15 minutes, 40 seconds)
2025-09-14 12:21:47,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:21:54,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2801.51636 ± 353.069
2025-09-14 12:21:54,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3058.1365), np.float32(3092.056), np.float32(2702.6477), np.float32(2582.0022), np.float32(3156.1118), np.float32(2741.757), np.float32(1897.423), np.float32(2759.7803), np.float32(3032.4062), np.float32(2992.843)]
2025-09-14 12:21:54,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:21:54,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (2801.52) for latency 9
2025-09-14 12:21:54,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 13 minutes, 1 second)
2025-09-14 12:24:23,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:24:30,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2622.90747 ± 678.980
2025-09-14 12:24:30,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2810.4668), np.float32(1614.6565), np.float32(3300.2734), np.float32(1517.3168), np.float32(3139.019), np.float32(2944.198), np.float32(3350.0388), np.float32(1728.4669), np.float32(2987.7434), np.float32(2836.8953)]
2025-09-14 12:24:30,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:24:30,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 10 minutes, 24 seconds)
2025-09-14 12:27:00,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:27:07,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2751.80908 ± 360.949
2025-09-14 12:27:07,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3062.273), np.float32(1918.9025), np.float32(2767.268), np.float32(2528.709), np.float32(2483.5198), np.float32(3046.2651), np.float32(2892.5762), np.float32(2608.571), np.float32(3189.2747), np.float32(3020.7312)]
2025-09-14 12:27:07,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:27:07,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 7 minutes, 47 seconds)
2025-09-14 12:29:36,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:29:43,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2202.27783 ± 493.446
2025-09-14 12:29:43,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2266.988), np.float32(1687.2682), np.float32(1622.6915), np.float32(3053.9841), np.float32(2377.5818), np.float32(2598.3535), np.float32(1505.0142), np.float32(2458.1333), np.float32(1799.9218), np.float32(2652.8416)]
2025-09-14 12:29:43,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:29:43,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 5 minutes, 10 seconds)
2025-09-14 12:32:13,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:32:19,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2275.97266 ± 793.875
2025-09-14 12:32:19,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2452.4602), np.float32(3284.2983), np.float32(2177.9387), np.float32(3121.799), np.float32(3106.1777), np.float32(1120.5852), np.float32(1211.2831), np.float32(2968.795), np.float32(1911.614), np.float32(1404.7764)]
2025-09-14 12:32:19,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:32:19,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 2 minutes, 32 seconds)
2025-09-14 12:34:49,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:34:56,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2506.16333 ± 626.849
2025-09-14 12:34:56,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2336.4639), np.float32(2859.2354), np.float32(2121.5676), np.float32(2881.2625), np.float32(3062.709), np.float32(3151.436), np.float32(2870.1472), np.float32(2919.2239), np.float32(1695.2733), np.float32(1164.3153)]
2025-09-14 12:34:56,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:34:56,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 59 minutes, 55 seconds)
2025-09-14 12:37:25,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:37:32,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2700.48877 ± 546.755
2025-09-14 12:37:32,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2578.2273), np.float32(2919.5706), np.float32(2685.9844), np.float32(2795.2751), np.float32(3099.806), np.float32(3056.0552), np.float32(1972.1067), np.float32(1461.8016), np.float32(3093.9285), np.float32(3342.1306)]
2025-09-14 12:37:32,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:37:32,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 57 minutes, 18 seconds)
2025-09-14 12:40:01,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:40:08,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2513.03271 ± 640.758
2025-09-14 12:40:08,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2704.173), np.float32(1203.9661), np.float32(1364.3713), np.float32(3018.445), np.float32(2859.5793), np.float32(3187.1748), np.float32(2494.5046), np.float32(2849.5193), np.float32(2683.9434), np.float32(2764.6528)]
2025-09-14 12:40:08,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:40:08,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 54 minutes, 41 seconds)
2025-09-14 12:42:38,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:42:44,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2192.28711 ± 618.719
2025-09-14 12:42:44,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1583.36), np.float32(2009.792), np.float32(3110.4248), np.float32(2343.672), np.float32(1658.2827), np.float32(2529.638), np.float32(1666.1821), np.float32(1241.4149), np.float32(2937.3855), np.float32(2842.7214)]
2025-09-14 12:42:44,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:42:44,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 52 minutes, 5 seconds)
2025-09-14 12:45:14,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:45:21,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2516.90869 ± 712.935
2025-09-14 12:45:21,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2464.5923), np.float32(2866.264), np.float32(2814.5505), np.float32(2288.9297), np.float32(1319.3606), np.float32(1105.7697), np.float32(3066.9775), np.float32(3328.8435), np.float32(2810.302), np.float32(3103.4956)]
2025-09-14 12:45:21,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:45:21,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 49 minutes, 28 seconds)
2025-09-14 12:47:50,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:47:57,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2398.97534 ± 609.834
2025-09-14 12:47:57,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1362.0765), np.float32(2666.0596), np.float32(2805.081), np.float32(2844.3494), np.float32(1772.4937), np.float32(2538.2024), np.float32(3196.4858), np.float32(2692.8953), np.float32(2704.0276), np.float32(1408.0818)]
2025-09-14 12:47:57,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:47:57,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 46 minutes, 52 seconds)
2025-09-14 12:50:27,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:50:33,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1962.07739 ± 662.032
2025-09-14 12:50:33,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2452.1765), np.float32(2034.2499), np.float32(1485.2562), np.float32(2811.6162), np.float32(1538.1292), np.float32(1333.2003), np.float32(3194.7102), np.float32(1230.0789), np.float32(2262.965), np.float32(1278.39)]
2025-09-14 12:50:33,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:50:33,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 44 minutes, 17 seconds)
2025-09-14 12:53:03,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:53:10,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2440.50049 ± 636.244
2025-09-14 12:53:10,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3015.066), np.float32(3199.3823), np.float32(1474.5342), np.float32(2284.654), np.float32(2810.781), np.float32(2845.805), np.float32(2654.489), np.float32(2906.5112), np.float32(1267.4003), np.float32(1946.3811)]
2025-09-14 12:53:10,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:53:10,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 41 minutes, 41 seconds)
2025-09-14 12:55:39,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:55:46,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2161.13867 ± 628.568
2025-09-14 12:55:46,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2340.7715), np.float32(2006.8657), np.float32(2793.6218), np.float32(1234.2664), np.float32(1697.7366), np.float32(3011.3518), np.float32(1379.8165), np.float32(2915.1086), np.float32(1608.4011), np.float32(2623.4473)]
2025-09-14 12:55:46,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:55:46,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 39 minutes, 4 seconds)
2025-09-14 12:58:15,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:58:22,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2245.88306 ± 733.493
2025-09-14 12:58:22,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2750.7058), np.float32(3071.8662), np.float32(977.4768), np.float32(1435.1111), np.float32(2839.745), np.float32(2065.7576), np.float32(1237.2312), np.float32(2405.2883), np.float32(2682.7056), np.float32(2992.9407)]
2025-09-14 12:58:22,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:58:22,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 36 minutes, 28 seconds)
2025-09-14 13:00:51,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:00:58,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2451.32251 ± 553.506
2025-09-14 13:00:58,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3050.2583), np.float32(2899.5798), np.float32(2623.4187), np.float32(3250.6946), np.float32(2701.7166), np.float32(1640.6727), np.float32(1726.3484), np.float32(2202.4363), np.float32(1755.9236), np.float32(2662.176)]
2025-09-14 13:00:58,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:00:58,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 33 minutes, 51 seconds)
2025-09-14 13:03:28,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:03:34,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2590.85986 ± 651.736
2025-09-14 13:03:34,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3387.1343), np.float32(1316.3483), np.float32(2663.7195), np.float32(2629.2686), np.float32(2333.2092), np.float32(1523.6915), np.float32(2976.6162), np.float32(3121.8013), np.float32(2826.6453), np.float32(3130.1643)]
2025-09-14 13:03:34,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:03:34,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 31 minutes, 14 seconds)
2025-09-14 13:06:04,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:06:11,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2019.20959 ± 730.346
2025-09-14 13:06:11,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1544.0768), np.float32(2642.4578), np.float32(2896.5618), np.float32(1524.533), np.float32(1641.9277), np.float32(1074.4242), np.float32(1317.4307), np.float32(2893.862), np.float32(3109.9902), np.float32(1546.8322)]
2025-09-14 13:06:11,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:06:11,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 28 minutes, 39 seconds)
2025-09-14 13:08:41,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:08:48,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2489.85205 ± 589.622
2025-09-14 13:08:48,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2329.0142), np.float32(2769.1594), np.float32(2684.4436), np.float32(3068.6843), np.float32(3005.9062), np.float32(3156.3098), np.float32(2495.833), np.float32(2448.663), np.float32(1147.4755), np.float32(1793.0328)]
2025-09-14 13:08:48,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:08:48,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 26 minutes, 3 seconds)
2025-09-14 13:11:17,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:11:24,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2869.58154 ± 559.420
2025-09-14 13:11:24,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1282.4998), np.float32(2885.482), np.float32(2831.7124), np.float32(3026.6794), np.float32(3259.6155), np.float32(2718.2505), np.float32(3346.3394), np.float32(3132.9866), np.float32(3124.7717), np.float32(3087.476)]
2025-09-14 13:11:24,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:11:24,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (2869.58) for latency 9
2025-09-14 13:11:24,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 23 minutes, 27 seconds)
2025-09-14 13:13:54,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:14:01,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2792.62500 ± 510.015
2025-09-14 13:14:01,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3117.1973), np.float32(2732.857), np.float32(2353.6042), np.float32(3053.6084), np.float32(1554.7766), np.float32(2465.0747), np.float32(3265.576), np.float32(3032.524), np.float32(3268.7239), np.float32(3082.3071)]
2025-09-14 13:14:01,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:14:01,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 20 minutes, 52 seconds)
2025-09-14 13:16:31,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:16:37,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2115.27319 ± 896.078
2025-09-14 13:16:37,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1126.8494), np.float32(3053.2104), np.float32(3065.5918), np.float32(2492.9768), np.float32(854.9699), np.float32(3477.7073), np.float32(2632.4138), np.float32(1822.5192), np.float32(1166.8556), np.float32(1459.6394)]
2025-09-14 13:16:37,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:16:37,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 18 minutes, 16 seconds)
2025-09-14 13:19:07,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:19:14,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2553.30737 ± 694.792
2025-09-14 13:19:14,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3000.2075), np.float32(1249.3737), np.float32(2939.1375), np.float32(2637.1086), np.float32(2583.6707), np.float32(3312.7617), np.float32(2738.8801), np.float32(2859.3672), np.float32(1193.5824), np.float32(3018.9834)]
2025-09-14 13:19:14,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:19:14,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 15 minutes, 39 seconds)
2025-09-14 13:21:44,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:21:50,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2576.53979 ± 431.913
2025-09-14 13:21:50,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2094.5872), np.float32(2778.0647), np.float32(2136.8408), np.float32(2833.812), np.float32(3020.54), np.float32(2274.2996), np.float32(3065.293), np.float32(3191.8486), np.float32(2430.6074), np.float32(1939.5066)]
2025-09-14 13:21:50,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:21:50,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 13 minutes, 2 seconds)
2025-09-14 13:24:20,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:24:27,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2176.05981 ± 735.822
2025-09-14 13:24:27,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1130.2992), np.float32(2794.0535), np.float32(3059.8057), np.float32(2908.7148), np.float32(1270.2462), np.float32(2120.7827), np.float32(2257.5261), np.float32(1984.3933), np.float32(3052.0574), np.float32(1182.7207)]
2025-09-14 13:24:27,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:24:27,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 10 minutes, 26 seconds)
2025-09-14 13:26:57,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:27:04,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2710.33545 ± 512.249
2025-09-14 13:27:04,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2426.4792), np.float32(1992.4565), np.float32(2713.9878), np.float32(3286.741), np.float32(3423.436), np.float32(3001.2302), np.float32(1923.6005), np.float32(2944.7322), np.float32(2232.5642), np.float32(3158.1282)]
2025-09-14 13:27:04,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:27:04,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 7 minutes, 49 seconds)
2025-09-14 13:29:33,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:29:40,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2946.32031 ± 409.989
2025-09-14 13:29:40,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2868.2588), np.float32(3238.2427), np.float32(3306.072), np.float32(2812.4062), np.float32(3326.0674), np.float32(3217.6128), np.float32(2738.7556), np.float32(2862.0044), np.float32(3200.788), np.float32(1892.9965)]
2025-09-14 13:29:40,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:29:40,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (2946.32) for latency 9
2025-09-14 13:29:40,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 5 minutes, 12 seconds)
2025-09-14 13:32:10,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:32:17,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2555.59399 ± 711.377
2025-09-14 13:32:17,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3242.426), np.float32(2633.873), np.float32(3220.2363), np.float32(1444.3577), np.float32(1808.5502), np.float32(1372.7035), np.float32(3120.7393), np.float32(2486.3708), np.float32(3148.4392), np.float32(3078.2415)]
2025-09-14 13:32:17,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:32:17,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 36 seconds)
2025-09-14 13:34:45,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:34:51,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2376.58569 ± 681.219
2025-09-14 13:34:51,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2699.163), np.float32(3231.1228), np.float32(2823.1692), np.float32(2815.1318), np.float32(2964.6228), np.float32(2068.3103), np.float32(2869.589), np.float32(1321.4473), np.float32(1615.8522), np.float32(1357.4509)]
2025-09-14 13:34:51,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:34:51,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1251 [DEBUG]: Training session finished
