2025-09-14 13:36:50,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1108 [DEBUG]: logdir: _logs/noise-eval-v2/halfcheetah/bpql-noise_0.025-delay_21
2025-09-14 13:36:50,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1109 [DEBUG]: trainer_prefix: noise-eval-v2/halfcheetah/bpql-noise_0.025-delay_21
2025-09-14 13:36:50,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'21': <latency_env.delayed_mdp.ConstantDelay object at 0x7f6174d23f80>}
2025-09-14 13:36:50,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1111 [DEBUG]: using device: cpu
2025-09-14 13:36:50,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-14 13:36:50,738 baseline-bpql-noisepromille25-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=143, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-14 13:36:50,738 baseline-bpql-noisepromille25-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-14 13:36:51,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-14 13:36:51,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-14 13:39:01,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 13:39:09,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: -387.87747 ± 55.199
2025-09-14 13:39:09,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-416.1473), np.float32(-453.02222), np.float32(-357.699), np.float32(-460.94287), np.float32(-418.45248), np.float32(-327.5249), np.float32(-364.0456), np.float32(-380.24103), np.float32(-275.65973), np.float32(-425.0396)]
2025-09-14 13:39:09,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:39:09,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (-387.88) for latency 21
2025-09-14 13:39:09,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 47 minutes, 21 seconds)
2025-09-14 13:41:22,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 13:41:30,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: -241.97733 ± 51.772
2025-09-14 13:41:30,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-290.4316), np.float32(-366.51968), np.float32(-232.63176), np.float32(-179.36711), np.float32(-231.28763), np.float32(-225.24158), np.float32(-213.32753), np.float32(-211.43071), np.float32(-198.12508), np.float32(-271.41043)]
2025-09-14 13:41:30,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:41:30,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (-241.98) for latency 21
2025-09-14 13:41:30,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 47 minutes, 46 seconds)
2025-09-14 13:43:42,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 13:43:50,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: -142.15274 ± 90.431
2025-09-14 13:43:50,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-18.628485), np.float32(-201.96605), np.float32(-159.3616), np.float32(-203.55478), np.float32(-201.90729), np.float32(-213.51163), np.float32(-153.61345), np.float32(-234.9026), np.float32(-88.36087), np.float32(54.279118)]
2025-09-14 13:43:50,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:43:50,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (-142.15) for latency 21
2025-09-14 13:43:50,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 45 minutes, 39 seconds)
2025-09-14 13:46:01,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 13:46:09,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: -131.03494 ± 74.342
2025-09-14 13:46:09,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-258.0123), np.float32(-169.76076), np.float32(-83.23579), np.float32(-148.9101), np.float32(-134.53664), np.float32(-8.261332), np.float32(-33.481617), np.float32(-84.864365), np.float32(-184.29942), np.float32(-204.98712)]
2025-09-14 13:46:09,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:46:09,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (-131.03) for latency 21
2025-09-14 13:46:09,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 42 minutes, 56 seconds)
2025-09-14 13:48:19,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 13:48:27,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 18.60951 ± 75.566
2025-09-14 13:48:27,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-49.74316), np.float32(-48.636124), np.float32(181.24777), np.float32(-36.91664), np.float32(16.870453), np.float32(9.261581), np.float32(43.185295), np.float32(126.71725), np.float32(-56.73151), np.float32(0.840209)]
2025-09-14 13:48:27,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:48:27,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (18.61) for latency 21
2025-09-14 13:48:27,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 40 minutes, 18 seconds)
2025-09-14 13:50:38,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 13:50:46,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: -18.30553 ± 110.915
2025-09-14 13:50:46,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-72.34834), np.float32(-78.73119), np.float32(184.65405), np.float32(-111.31259), np.float32(199.64182), np.float32(-24.888483), np.float32(7.0678988), np.float32(-88.794685), np.float32(-85.547905), np.float32(-112.795845)]
2025-09-14 13:50:46,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:50:46,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 38 minutes, 16 seconds)
2025-09-14 13:52:58,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 13:53:06,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 91.94106 ± 78.878
2025-09-14 13:53:06,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(116.609505), np.float32(139.03284), np.float32(-51.0472), np.float32(7.274597), np.float32(165.5298), np.float32(78.55255), np.float32(198.23462), np.float32(21.165426), np.float32(185.52876), np.float32(58.529785)]
2025-09-14 13:53:06,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:53:06,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (91.94) for latency 21
2025-09-14 13:53:06,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 35 minutes, 37 seconds)
2025-09-14 13:55:21,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 13:55:29,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 312.33893 ± 116.212
2025-09-14 13:55:29,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(383.3681), np.float32(332.4799), np.float32(100.044624), np.float32(225.14896), np.float32(471.34418), np.float32(287.53262), np.float32(522.16644), np.float32(273.5422), np.float32(283.92123), np.float32(243.84116)]
2025-09-14 13:55:29,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:55:29,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (312.34) for latency 21
2025-09-14 13:55:29,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 34 minutes, 18 seconds)
2025-09-14 13:57:39,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 13:57:48,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 429.05292 ± 56.278
2025-09-14 13:57:48,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(350.87317), np.float32(431.78128), np.float32(466.745), np.float32(496.42087), np.float32(418.28998), np.float32(506.3905), np.float32(402.32016), np.float32(489.9352), np.float32(380.94955), np.float32(346.8239)]
2025-09-14 13:57:48,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:57:48,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (429.05) for latency 21
2025-09-14 13:57:48,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 31 minutes, 59 seconds)
2025-09-14 13:59:58,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:00:06,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 597.30896 ± 119.298
2025-09-14 14:00:06,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(415.75974), np.float32(739.13666), np.float32(690.47577), np.float32(592.3511), np.float32(689.952), np.float32(433.66272), np.float32(606.4929), np.float32(771.1921), np.float32(485.64725), np.float32(548.41876)]
2025-09-14 14:00:06,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:00:06,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (597.31) for latency 21
2025-09-14 14:00:06,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 3 hours, 29 minutes, 43 seconds)
2025-09-14 14:02:17,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:02:25,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 784.95465 ± 120.401
2025-09-14 14:02:25,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(860.9275), np.float32(803.1705), np.float32(590.7483), np.float32(966.5806), np.float32(815.4156), np.float32(852.05975), np.float32(591.0923), np.float32(880.7105), np.float32(664.77216), np.float32(824.06946)]
2025-09-14 14:02:25,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:02:25,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (784.95) for latency 21
2025-09-14 14:02:25,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 3 hours, 27 minutes, 25 seconds)
2025-09-14 14:04:36,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:04:44,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 636.78168 ± 149.108
2025-09-14 14:04:44,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(572.95013), np.float32(433.48862), np.float32(721.3491), np.float32(650.76764), np.float32(717.6288), np.float32(358.8082), np.float32(727.69507), np.float32(541.76526), np.float32(828.98267), np.float32(814.3814)]
2025-09-14 14:04:44,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:04:44,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 3 hours, 24 minutes, 43 seconds)
2025-09-14 14:06:54,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:07:02,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1329.11169 ± 185.393
2025-09-14 14:07:02,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1472.0857), np.float32(1012.82715), np.float32(1188.2987), np.float32(1202.0854), np.float32(1514.4257), np.float32(1293.676), np.float32(1443.274), np.float32(1098.3989), np.float32(1499.5015), np.float32(1566.5447)]
2025-09-14 14:07:02,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:07:02,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (1329.11) for latency 21
2025-09-14 14:07:02,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 3 hours, 21 minutes, 6 seconds)
2025-09-14 14:09:13,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:09:21,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1601.42969 ± 84.433
2025-09-14 14:09:21,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1701.1447), np.float32(1649.456), np.float32(1621.6227), np.float32(1579.5732), np.float32(1769.2358), np.float32(1513.5049), np.float32(1475.3943), np.float32(1596.4575), np.float32(1584.3438), np.float32(1523.5647)]
2025-09-14 14:09:21,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:09:21,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (1601.43) for latency 21
2025-09-14 14:09:21,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 3 hours, 18 minutes, 48 seconds)
2025-09-14 14:11:32,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:11:40,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 2104.88452 ± 164.191
2025-09-14 14:11:40,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1867.7614), np.float32(1899.5753), np.float32(1864.5659), np.float32(2272.5178), np.float32(2084.8198), np.float32(2243.6702), np.float32(2104.8525), np.float32(2199.6746), np.float32(2330.5713), np.float32(2180.8384)]
2025-09-14 14:11:40,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:11:40,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (2104.88) for latency 21
2025-09-14 14:11:40,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 3 hours, 16 minutes, 31 seconds)
2025-09-14 14:13:50,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:13:59,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 2073.97217 ± 198.557
2025-09-14 14:13:59,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2219.557), np.float32(2179.1565), np.float32(2288.6855), np.float32(1772.5436), np.float32(2258.154), np.float32(2199.5117), np.float32(1957.5026), np.float32(2061.1768), np.float32(2127.8494), np.float32(1675.5841)]
2025-09-14 14:13:59,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:13:59,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 3 hours, 14 minutes, 11 seconds)
2025-09-14 14:16:09,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:16:17,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 2086.42310 ± 123.786
2025-09-14 14:16:17,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1953.6321), np.float32(2148.5696), np.float32(2210.3164), np.float32(2030.5582), np.float32(2296.0857), np.float32(2059.671), np.float32(1875.6283), np.float32(2066.4172), np.float32(2217.693), np.float32(2005.6584)]
2025-09-14 14:16:17,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:16:17,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 3 hours, 11 minutes, 49 seconds)
2025-09-14 14:18:28,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:18:36,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 2487.97925 ± 176.645
2025-09-14 14:18:36,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2440.5183), np.float32(2281.1199), np.float32(2471.8352), np.float32(2739.1013), np.float32(2292.13), np.float32(2602.9634), np.float32(2774.6584), np.float32(2285.4949), np.float32(2615.935), np.float32(2376.037)]
2025-09-14 14:18:36,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:18:36,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (2487.98) for latency 21
2025-09-14 14:18:36,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 3 hours, 9 minutes, 32 seconds)
2025-09-14 14:20:47,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:20:55,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3052.49878 ± 121.990
2025-09-14 14:20:55,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2827.0933), np.float32(3260.3665), np.float32(3116.014), np.float32(3095.0454), np.float32(3004.4895), np.float32(2880.1826), np.float32(3125.021), np.float32(3144.5251), np.float32(3072.8167), np.float32(2999.4338)]
2025-09-14 14:20:55,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:20:55,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (3052.50) for latency 21
2025-09-14 14:20:55,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 3 hours, 7 minutes, 14 seconds)
2025-09-14 14:23:05,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:23:13,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3109.31787 ± 189.994
2025-09-14 14:23:13,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2755.7234), np.float32(3050.905), np.float32(3085.9907), np.float32(3168.095), np.float32(3057.596), np.float32(3244.1416), np.float32(3099.4768), np.float32(3253.8887), np.float32(2893.647), np.float32(3483.7158)]
2025-09-14 14:23:13,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:23:13,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (3109.32) for latency 21
2025-09-14 14:23:13,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 3 hours, 4 minutes, 53 seconds)
2025-09-14 14:25:25,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:25:33,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3338.19800 ± 124.966
2025-09-14 14:25:33,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3388.1333), np.float32(3544.0725), np.float32(3221.1597), np.float32(3308.2502), np.float32(3139.7808), np.float32(3476.5696), np.float32(3178.9888), np.float32(3415.259), np.float32(3298.6096), np.float32(3411.1545)]
2025-09-14 14:25:33,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:25:33,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (3338.20) for latency 21
2025-09-14 14:25:33,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 3 hours, 2 minutes, 52 seconds)
2025-09-14 14:27:44,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:27:52,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3332.50244 ± 125.262
2025-09-14 14:27:52,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3113.8625), np.float32(3288.8083), np.float32(3575.313), np.float32(3510.688), np.float32(3292.931), np.float32(3255.427), np.float32(3287.8433), np.float32(3265.1978), np.float32(3369.623), np.float32(3365.331)]
2025-09-14 14:27:52,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:27:52,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 42 seconds)
2025-09-14 14:30:03,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:30:11,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3330.16602 ± 269.063
2025-09-14 14:30:11,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3708.9001), np.float32(3530.766), np.float32(3436.3494), np.float32(3483.8208), np.float32(3329.7031), np.float32(3366.236), np.float32(3241.6572), np.float32(2808.705), np.float32(2890.1367), np.float32(3505.388)]
2025-09-14 14:30:11,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:30:11,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 58 minutes, 32 seconds)
2025-09-14 14:32:24,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:32:32,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3432.09033 ± 165.872
2025-09-14 14:32:32,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3273.4602), np.float32(3479.99), np.float32(3169.9753), np.float32(3397.4644), np.float32(3490.5425), np.float32(3605.9937), np.float32(3483.9492), np.float32(3201.4905), np.float32(3495.7874), np.float32(3722.2517)]
2025-09-14 14:32:32,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:32:32,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (3432.09) for latency 21
2025-09-14 14:32:32,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 56 minutes, 43 seconds)
2025-09-14 14:34:43,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:34:51,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3412.01953 ± 146.050
2025-09-14 14:34:51,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3395.6433), np.float32(3065.9211), np.float32(3565.177), np.float32(3568.4622), np.float32(3567.1055), np.float32(3312.617), np.float32(3349.0083), np.float32(3472.2688), np.float32(3455.5225), np.float32(3368.4683)]
2025-09-14 14:34:51,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:34:51,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 54 minutes, 21 seconds)
2025-09-14 14:37:01,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:37:09,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3418.53516 ± 126.549
2025-09-14 14:37:09,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3382.2463), np.float32(3509.2114), np.float32(3346.1438), np.float32(3428.074), np.float32(3454.6687), np.float32(3288.5208), np.float32(3155.9224), np.float32(3569.0793), np.float32(3446.8374), np.float32(3604.6462)]
2025-09-14 14:37:09,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:37:09,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 51 minutes, 42 seconds)
2025-09-14 14:39:20,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:39:28,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3360.56982 ± 154.983
2025-09-14 14:39:28,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3430.8945), np.float32(3451.0674), np.float32(3338.6455), np.float32(3673.7812), np.float32(3207.9968), np.float32(3429.678), np.float32(3045.335), np.float32(3350.9504), np.float32(3339.3884), np.float32(3337.9622)]
2025-09-14 14:39:28,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:39:28,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 49 minutes, 15 seconds)
2025-09-14 14:41:38,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:41:46,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3347.56299 ± 191.741
2025-09-14 14:41:46,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3067.9377), np.float32(3366.5642), np.float32(3207.4429), np.float32(3435.713), np.float32(3315.724), np.float32(3139.556), np.float32(3741.6235), np.float32(3531.1719), np.float32(3216.4229), np.float32(3453.4736)]
2025-09-14 14:41:46,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:41:46,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 46 minutes, 47 seconds)
2025-09-14 14:43:57,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:44:05,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3359.53906 ± 189.361
2025-09-14 14:44:05,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2963.8013), np.float32(3463.3728), np.float32(3243.4539), np.float32(3394.15), np.float32(3477.2068), np.float32(3585.0115), np.float32(3562.5227), np.float32(3171.7927), np.float32(3234.2517), np.float32(3499.8293)]
2025-09-14 14:44:05,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:44:05,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 43 minutes, 57 seconds)
2025-09-14 14:46:15,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:46:23,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3425.15674 ± 141.399
2025-09-14 14:46:23,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3187.4436), np.float32(3319.5586), np.float32(3421.6902), np.float32(3534.2815), np.float32(3267.977), np.float32(3494.0093), np.float32(3606.4487), np.float32(3651.0571), np.float32(3427.652), np.float32(3341.4492)]
2025-09-14 14:46:23,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:46:23,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 41 minutes, 36 seconds)
2025-09-14 14:48:34,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:48:42,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3523.92822 ± 178.513
2025-09-14 14:48:42,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3287.0242), np.float32(3695.9512), np.float32(3765.461), np.float32(3663.413), np.float32(3170.2983), np.float32(3434.4897), np.float32(3510.9268), np.float32(3555.355), np.float32(3664.9653), np.float32(3491.3962)]
2025-09-14 14:48:42,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:48:42,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (3523.93) for latency 21
2025-09-14 14:48:42,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 39 minutes, 16 seconds)
2025-09-14 14:50:52,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:51:00,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3370.41406 ± 218.986
2025-09-14 14:51:00,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3487.6448), np.float32(3730.2688), np.float32(3316.6233), np.float32(3485.6028), np.float32(3386.767), np.float32(3206.873), np.float32(3490.4524), np.float32(3451.0986), np.float32(2854.2092), np.float32(3294.6008)]
2025-09-14 14:51:00,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:51:00,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 36 minutes, 58 seconds)
2025-09-14 14:53:11,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:53:19,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3559.83130 ± 215.391
2025-09-14 14:53:19,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3369.431), np.float32(3043.3179), np.float32(3574.2705), np.float32(3774.0454), np.float32(3716.6191), np.float32(3573.4077), np.float32(3584.4487), np.float32(3855.3313), np.float32(3521.8423), np.float32(3585.6016)]
2025-09-14 14:53:19,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:53:19,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (3559.83) for latency 21
2025-09-14 14:53:19,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 34 minutes, 39 seconds)
2025-09-14 14:55:29,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:55:37,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3366.40820 ± 99.247
2025-09-14 14:55:37,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3431.1882), np.float32(3163.3896), np.float32(3327.6458), np.float32(3420.1267), np.float32(3523.1113), np.float32(3457.2778), np.float32(3402.8123), np.float32(3286.572), np.float32(3375.5654), np.float32(3276.3896)]
2025-09-14 14:55:37,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:55:37,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 32 minutes, 19 seconds)
2025-09-14 14:57:48,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:57:56,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3415.84521 ± 173.958
2025-09-14 14:57:56,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3682.9807), np.float32(3119.6963), np.float32(3271.084), np.float32(3456.7383), np.float32(3231.7178), np.float32(3519.6404), np.float32(3642.709), np.float32(3535.74), np.float32(3321.83), np.float32(3376.318)]
2025-09-14 14:57:56,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:57:56,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 30 minutes, 4 seconds)
2025-09-14 15:00:06,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:00:14,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3491.66528 ± 145.208
2025-09-14 15:00:14,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3586.9106), np.float32(3242.7327), np.float32(3358.6514), np.float32(3503.3423), np.float32(3341.132), np.float32(3601.5198), np.float32(3719.7612), np.float32(3655.835), np.float32(3401.679), np.float32(3505.089)]
2025-09-14 15:00:14,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:00:14,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 27 minutes, 48 seconds)
2025-09-14 15:02:25,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:02:33,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3386.14795 ± 176.954
2025-09-14 15:02:33,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3422.096), np.float32(3469.8286), np.float32(3171.1252), np.float32(3511.528), np.float32(3539.7397), np.float32(3119.7808), np.float32(3494.6453), np.float32(3204.881), np.float32(3677.9438), np.float32(3249.912)]
2025-09-14 15:02:33,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:02:33,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 25 minutes, 28 seconds)
2025-09-14 15:04:43,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:04:52,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3561.76221 ± 207.249
2025-09-14 15:04:52,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3530.4739), np.float32(3375.6277), np.float32(3345.348), np.float32(3745.0745), np.float32(3572.3733), np.float32(3710.5874), np.float32(3403.966), np.float32(3968.793), np.float32(3272.7854), np.float32(3692.5928)]
2025-09-14 15:04:52,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:04:52,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (3561.76) for latency 21
2025-09-14 15:04:52,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 23 minutes, 8 seconds)
2025-09-14 15:07:02,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:07:10,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3507.63721 ± 119.866
2025-09-14 15:07:10,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3585.7322), np.float32(3311.86), np.float32(3578.0488), np.float32(3533.386), np.float32(3481.504), np.float32(3540.581), np.float32(3603.44), np.float32(3383.7654), np.float32(3711.271), np.float32(3346.7832)]
2025-09-14 15:07:10,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:07:10,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 20 minutes, 50 seconds)
2025-09-14 15:09:21,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:09:29,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3393.98877 ± 115.171
2025-09-14 15:09:29,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3387.196), np.float32(3186.479), np.float32(3595.1594), np.float32(3294.0315), np.float32(3458.3965), np.float32(3397.3462), np.float32(3406.4558), np.float32(3278.1816), np.float32(3396.557), np.float32(3540.0842)]
2025-09-14 15:09:29,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:09:29,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 18 minutes, 33 seconds)
2025-09-14 15:11:42,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:11:50,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3493.93896 ± 135.322
2025-09-14 15:11:50,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3402.1433), np.float32(3204.167), np.float32(3586.7534), np.float32(3388.5398), np.float32(3454.4568), np.float32(3620.4685), np.float32(3436.6824), np.float32(3659.4556), np.float32(3554.6963), np.float32(3632.0266)]
2025-09-14 15:11:50,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:11:50,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 16 minutes, 48 seconds)
2025-09-14 15:14:04,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:14:13,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3493.39136 ± 160.750
2025-09-14 15:14:13,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3492.1797), np.float32(3251.358), np.float32(3631.9941), np.float32(3622.755), np.float32(3296.5532), np.float32(3542.7427), np.float32(3580.6035), np.float32(3316.9014), np.float32(3774.7158), np.float32(3424.1106)]
2025-09-14 15:14:13,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:14:13,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 15 minutes, 15 seconds)
2025-09-14 15:16:27,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:16:35,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3401.02930 ± 237.501
2025-09-14 15:16:35,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3128.6208), np.float32(3507.2805), np.float32(2938.0781), np.float32(3317.033), np.float32(3342.5913), np.float32(3402.8013), np.float32(3859.526), np.float32(3423.241), np.float32(3520.2078), np.float32(3570.9126)]
2025-09-14 15:16:35,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:16:35,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 13 minutes, 43 seconds)
2025-09-14 15:18:50,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:18:58,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3541.06909 ± 325.340
2025-09-14 15:18:58,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3859.6252), np.float32(3791.8333), np.float32(3776.2734), np.float32(3013.8235), np.float32(3406.475), np.float32(2882.1838), np.float32(3502.6367), np.float32(3764.4795), np.float32(3740.6152), np.float32(3672.7473)]
2025-09-14 15:18:58,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:18:58,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 12 minutes, 7 seconds)
2025-09-14 15:21:12,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:21:20,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3575.47607 ± 105.802
2025-09-14 15:21:20,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3591.7627), np.float32(3592.045), np.float32(3451.8076), np.float32(3494.7744), np.float32(3563.1772), np.float32(3765.267), np.float32(3712.1455), np.float32(3566.2952), np.float32(3395.2683), np.float32(3622.222)]
2025-09-14 15:21:20,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:21:20,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (3575.48) for latency 21
2025-09-14 15:21:20,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 10 minutes, 29 seconds)
2025-09-14 15:23:46,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:23:55,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3500.78516 ± 107.717
2025-09-14 15:23:55,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3593.4517), np.float32(3514.7844), np.float32(3728.3743), np.float32(3447.688), np.float32(3383.6138), np.float32(3397.0396), np.float32(3548.9949), np.float32(3482.5771), np.float32(3559.131), np.float32(3352.196)]
2025-09-14 15:23:55,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:23:55,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 10 minutes, 28 seconds)
2025-09-14 15:26:24,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:26:32,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3688.53125 ± 201.602
2025-09-14 15:26:32,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3286.767), np.float32(3727.7627), np.float32(3730.7358), np.float32(3708.0552), np.float32(3784.9075), np.float32(3819.1982), np.float32(3449.7488), np.float32(3940.9124), np.float32(3933.59), np.float32(3503.6357)]
2025-09-14 15:26:32,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:26:32,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (3688.53) for latency 21
2025-09-14 15:26:32,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 10 minutes, 39 seconds)
2025-09-14 15:28:50,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:28:58,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3391.45264 ± 146.683
2025-09-14 15:28:58,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3303.0598), np.float32(3489.38), np.float32(3328.1748), np.float32(3366.0151), np.float32(3394.887), np.float32(3489.49), np.float32(3446.1965), np.float32(3111.3486), np.float32(3694.7388), np.float32(3291.2393)]
2025-09-14 15:28:58,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:28:58,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 8 minutes, 40 seconds)
2025-09-14 15:31:15,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:31:23,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3572.67114 ± 203.952
2025-09-14 15:31:23,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3654.9604), np.float32(3802.639), np.float32(3117.5703), np.float32(3803.0623), np.float32(3562.119), np.float32(3644.8428), np.float32(3522.2542), np.float32(3466.1687), np.float32(3772.6028), np.float32(3380.4912)]
2025-09-14 15:31:23,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:31:23,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 6 minutes, 44 seconds)
2025-09-14 15:33:41,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:33:49,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3575.14844 ± 139.526
2025-09-14 15:33:49,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3523.4863), np.float32(3740.8074), np.float32(3567.1775), np.float32(3576.8108), np.float32(3693.1338), np.float32(3758.1318), np.float32(3242.5293), np.float32(3518.4058), np.float32(3526.4282), np.float32(3604.573)]
2025-09-14 15:33:49,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:33:49,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 4 minutes, 44 seconds)
2025-09-14 15:36:08,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:36:16,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3533.23755 ± 71.845
2025-09-14 15:36:16,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3533.8396), np.float32(3556.5679), np.float32(3467.9766), np.float32(3628.208), np.float32(3601.314), np.float32(3438.6023), np.float32(3579.552), np.float32(3393.2053), np.float32(3577.255), np.float32(3555.8557)]
2025-09-14 15:36:16,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:36:16,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 1 minute)
2025-09-14 15:38:34,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:38:42,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3555.10107 ± 123.920
2025-09-14 15:38:42,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3366.0127), np.float32(3540.3474), np.float32(3544.254), np.float32(3743.4695), np.float32(3690.2722), np.float32(3448.815), np.float32(3621.6223), np.float32(3466.0366), np.float32(3706.2732), np.float32(3423.9092)]
2025-09-14 15:38:42,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:38:42,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 56 minutes, 43 seconds)
2025-09-14 15:41:00,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:41:08,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3502.78271 ± 188.693
2025-09-14 15:41:08,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3634.8086), np.float32(3716.7212), np.float32(3766.654), np.float32(3303.9124), np.float32(3331.8875), np.float32(3227.038), np.float32(3542.5603), np.float32(3410.639), np.float32(3367.2393), np.float32(3726.3655)]
2025-09-14 15:41:08,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:41:08,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 54 minutes, 25 seconds)
2025-09-14 15:43:26,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:43:34,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3492.25830 ± 81.495
2025-09-14 15:43:34,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3552.7017), np.float32(3571.1885), np.float32(3399.2117), np.float32(3638.9832), np.float32(3412.4897), np.float32(3434.175), np.float32(3418.95), np.float32(3536.4377), np.float32(3546.0916), np.float32(3412.354)]
2025-09-14 15:43:34,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:43:34,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 52 minutes, 4 seconds)
2025-09-14 15:45:52,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:46:00,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3567.45117 ± 119.655
2025-09-14 15:46:00,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3389.0947), np.float32(3574.9446), np.float32(3577.4592), np.float32(3420.5737), np.float32(3381.3813), np.float32(3734.7063), np.float32(3657.363), np.float32(3638.3115), np.float32(3640.547), np.float32(3660.1277)]
2025-09-14 15:46:00,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:46:00,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 49 minutes, 42 seconds)
2025-09-14 15:48:18,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:48:26,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3601.95239 ± 149.162
2025-09-14 15:48:26,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3565.2173), np.float32(3594.5889), np.float32(3461.462), np.float32(3599.148), np.float32(3654.4548), np.float32(3746.6646), np.float32(3419.7324), np.float32(3360.766), np.float32(3789.4426), np.float32(3828.0454)]
2025-09-14 15:48:26,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:48:26,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 47 minutes, 7 seconds)
2025-09-14 15:50:44,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:50:52,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3578.81519 ± 204.766
2025-09-14 15:50:52,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3692.9314), np.float32(3273.3032), np.float32(3477.0752), np.float32(3678.0864), np.float32(3477.434), np.float32(3922.5947), np.float32(3247.4993), np.float32(3722.7314), np.float32(3758.1096), np.float32(3538.3853)]
2025-09-14 15:50:52,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:50:52,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 44 minutes, 38 seconds)
2025-09-14 15:53:10,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:53:18,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3653.97412 ± 181.816
2025-09-14 15:53:18,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3567.1504), np.float32(4048.919), np.float32(3689.8716), np.float32(3509.1091), np.float32(3692.403), np.float32(3617.5544), np.float32(3611.5872), np.float32(3743.431), np.float32(3304.7314), np.float32(3754.985)]
2025-09-14 15:53:18,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:53:18,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 42 minutes, 8 seconds)
2025-09-14 15:55:35,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:55:43,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3589.61719 ± 161.111
2025-09-14 15:55:43,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3503.627), np.float32(3489.465), np.float32(3423.818), np.float32(3610.1194), np.float32(3425.5046), np.float32(3512.2107), np.float32(3546.501), np.float32(3609.9307), np.float32(3915.532), np.float32(3859.4634)]
2025-09-14 15:55:43,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:55:43,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 39 minutes, 35 seconds)
2025-09-14 15:58:01,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:58:09,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3651.66016 ± 141.894
2025-09-14 15:58:09,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3391.8918), np.float32(3721.5789), np.float32(3864.5012), np.float32(3603.7278), np.float32(3511.7485), np.float32(3763.222), np.float32(3830.4), np.float32(3654.1406), np.float32(3655.569), np.float32(3519.821)]
2025-09-14 15:58:09,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:58:09,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 37 minutes, 7 seconds)
2025-09-14 16:00:27,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:00:35,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3602.57031 ± 179.818
2025-09-14 16:00:35,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3569.8596), np.float32(3657.474), np.float32(3667.395), np.float32(3664.9695), np.float32(3836.1394), np.float32(3430.5073), np.float32(3683.5117), np.float32(3627.926), np.float32(3152.8462), np.float32(3735.075)]
2025-09-14 16:00:35,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:00:35,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 34 minutes, 43 seconds)
2025-09-14 16:02:52,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:03:00,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3583.72729 ± 156.912
2025-09-14 16:03:00,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3777.6328), np.float32(3343.9863), np.float32(3839.429), np.float32(3450.9285), np.float32(3355.1868), np.float32(3649.3057), np.float32(3629.7368), np.float32(3662.7556), np.float32(3599.2356), np.float32(3529.0762)]
2025-09-14 16:03:00,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:03:00,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 32 minutes, 17 seconds)
2025-09-14 16:05:18,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:05:27,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3670.99731 ± 167.158
2025-09-14 16:05:27,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3800.976), np.float32(3785.0046), np.float32(3705.2324), np.float32(3538.2246), np.float32(3634.7402), np.float32(3357.0505), np.float32(3889.8542), np.float32(3656.6914), np.float32(3874.5032), np.float32(3467.6948)]
2025-09-14 16:05:27,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:05:27,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 29 minutes, 53 seconds)
2025-09-14 16:07:45,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:07:53,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3546.14771 ± 186.549
2025-09-14 16:07:53,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3242.2275), np.float32(3474.0703), np.float32(3797.086), np.float32(3478.188), np.float32(3230.4375), np.float32(3732.5469), np.float32(3668.6387), np.float32(3510.0178), np.float32(3617.8428), np.float32(3710.4211)]
2025-09-14 16:07:53,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:07:53,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 27 minutes, 33 seconds)
2025-09-14 16:10:10,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:10:18,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3638.04053 ± 124.462
2025-09-14 16:10:18,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3599.037), np.float32(3431.19), np.float32(3508.13), np.float32(3717.7205), np.float32(3697.0022), np.float32(3484.9858), np.float32(3765.221), np.float32(3795.1565), np.float32(3607.297), np.float32(3774.6648)]
2025-09-14 16:10:18,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:10:18,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 25 minutes, 7 seconds)
2025-09-14 16:12:36,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:12:45,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3589.25903 ± 158.823
2025-09-14 16:12:45,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3488.3645), np.float32(3538.7065), np.float32(3781.9238), np.float32(3525.7998), np.float32(3816.7168), np.float32(3420.526), np.float32(3603.1147), np.float32(3325.1406), np.float32(3808.7488), np.float32(3583.5474)]
2025-09-14 16:12:45,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:12:45,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 22 minutes, 41 seconds)
2025-09-14 16:15:02,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:15:10,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3669.72974 ± 159.234
2025-09-14 16:15:10,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3936.1746), np.float32(3518.8455), np.float32(3602.0723), np.float32(3539.3518), np.float32(3761.303), np.float32(3582.8132), np.float32(3697.7803), np.float32(3447.1619), np.float32(3673.8457), np.float32(3937.9473)]
2025-09-14 16:15:10,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:15:10,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 20 minutes, 17 seconds)
2025-09-14 16:17:28,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:17:36,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3528.55737 ± 81.738
2025-09-14 16:17:36,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3572.0261), np.float32(3444.3193), np.float32(3590.6848), np.float32(3545.6675), np.float32(3461.2332), np.float32(3645.3765), np.float32(3467.9788), np.float32(3381.885), np.float32(3628.178), np.float32(3548.223)]
2025-09-14 16:17:36,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:17:36,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 17 minutes, 47 seconds)
2025-09-14 16:19:53,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:20:01,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3590.21997 ± 193.540
2025-09-14 16:20:01,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3154.2798), np.float32(3736.2773), np.float32(3455.216), np.float32(3807.4226), np.float32(3553.2183), np.float32(3727.9219), np.float32(3626.312), np.float32(3579.901), np.float32(3821.5583), np.float32(3440.091)]
2025-09-14 16:20:01,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:20:01,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 15 minutes, 18 seconds)
2025-09-14 16:22:19,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:22:27,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3407.66284 ± 145.821
2025-09-14 16:22:27,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3163.8174), np.float32(3316.1887), np.float32(3219.585), np.float32(3410.4592), np.float32(3396.161), np.float32(3517.1829), np.float32(3571.064), np.float32(3351.4827), np.float32(3659.47), np.float32(3471.2183)]
2025-09-14 16:22:27,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:22:27,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 12 minutes, 51 seconds)
2025-09-14 16:24:44,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:24:52,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3566.58643 ± 113.208
2025-09-14 16:24:52,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3458.662), np.float32(3663.3672), np.float32(3740.0498), np.float32(3364.4944), np.float32(3466.8293), np.float32(3675.2944), np.float32(3594.3936), np.float32(3502.1975), np.float32(3540.0503), np.float32(3660.5225)]
2025-09-14 16:24:52,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:24:52,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 10 minutes, 21 seconds)
2025-09-14 16:27:10,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:27:18,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3648.18555 ± 90.801
2025-09-14 16:27:18,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3626.6553), np.float32(3540.31), np.float32(3756.8696), np.float32(3747.2573), np.float32(3594.6423), np.float32(3638.625), np.float32(3468.469), np.float32(3711.1404), np.float32(3745.573), np.float32(3652.312)]
2025-09-14 16:27:18,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:27:18,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 7 minutes, 52 seconds)
2025-09-14 16:29:35,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:29:43,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3544.22192 ± 141.041
2025-09-14 16:29:43,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3457.0593), np.float32(3394.7214), np.float32(3591.7698), np.float32(3415.1082), np.float32(3519.8508), np.float32(3467.6575), np.float32(3771.716), np.float32(3716.3953), np.float32(3377.9001), np.float32(3730.0422)]
2025-09-14 16:29:43,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:29:43,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 5 minutes, 26 seconds)
2025-09-14 16:32:01,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:32:09,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3420.65967 ± 111.466
2025-09-14 16:32:09,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3399.2964), np.float32(3275.9294), np.float32(3381.5515), np.float32(3288.23), np.float32(3577.3813), np.float32(3625.0273), np.float32(3314.9397), np.float32(3399.1868), np.float32(3457.877), np.float32(3487.1785)]
2025-09-14 16:32:09,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:32:09,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 3 minutes, 2 seconds)
2025-09-14 16:34:26,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:34:34,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3657.11841 ± 218.480
2025-09-14 16:34:34,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3921.0168), np.float32(3703.122), np.float32(3677.9048), np.float32(3315.6023), np.float32(3694.3008), np.float32(3484.3423), np.float32(3265.4072), np.float32(3882.2153), np.float32(3767.9531), np.float32(3859.3208)]
2025-09-14 16:34:34,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:34:34,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 37 seconds)
2025-09-14 16:36:52,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:37:00,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3514.35278 ± 141.256
2025-09-14 16:37:00,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3310.0637), np.float32(3515.56), np.float32(3794.9917), np.float32(3283.884), np.float32(3623.3562), np.float32(3543.9822), np.float32(3580.7402), np.float32(3448.882), np.float32(3567.8254), np.float32(3474.2395)]
2025-09-14 16:37:00,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:37:00,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 58 minutes, 11 seconds)
2025-09-14 16:39:18,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:39:26,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3634.69971 ± 154.559
2025-09-14 16:39:26,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3503.05), np.float32(3729.4824), np.float32(3820.4187), np.float32(3558.7446), np.float32(3599.0547), np.float32(3901.1606), np.float32(3761.7766), np.float32(3451.4766), np.float32(3411.74), np.float32(3610.0933)]
2025-09-14 16:39:26,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:39:26,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 55 minutes, 49 seconds)
2025-09-14 16:41:44,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:41:52,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3680.48975 ± 117.196
2025-09-14 16:41:52,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3706.9978), np.float32(3667.7515), np.float32(3886.0647), np.float32(3806.6775), np.float32(3604.5745), np.float32(3765.8196), np.float32(3467.508), np.float32(3698.7646), np.float32(3658.3508), np.float32(3542.3923)]
2025-09-14 16:41:52,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:41:52,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 53 minutes, 27 seconds)
2025-09-14 16:44:10,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:44:18,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3597.22192 ± 162.927
2025-09-14 16:44:18,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3666.3296), np.float32(3898.1997), np.float32(3676.2175), np.float32(3542.5928), np.float32(3638.0264), np.float32(3431.122), np.float32(3658.8225), np.float32(3626.9893), np.float32(3240.975), np.float32(3592.9448)]
2025-09-14 16:44:18,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:44:18,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 51 minutes, 1 second)
2025-09-14 16:46:35,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:46:43,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3699.55005 ± 175.130
2025-09-14 16:46:43,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3814.6428), np.float32(3842.7615), np.float32(3484.906), np.float32(3504.7375), np.float32(3567.9392), np.float32(3771.254), np.float32(3594.919), np.float32(4010.1355), np.float32(3530.289), np.float32(3873.9126)]
2025-09-14 16:46:43,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:46:43,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (3699.55) for latency 21
2025-09-14 16:46:43,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 48 minutes, 34 seconds)
2025-09-14 16:49:01,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:49:09,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3622.48975 ± 109.683
2025-09-14 16:49:09,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3425.0286), np.float32(3546.1304), np.float32(3667.25), np.float32(3627.4004), np.float32(3596.049), np.float32(3587.074), np.float32(3799.9817), np.float32(3779.9856), np.float32(3681.6436), np.float32(3514.3555)]
2025-09-14 16:49:09,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:49:09,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 46 minutes, 9 seconds)
2025-09-14 16:51:26,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:51:34,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3575.29541 ± 193.071
2025-09-14 16:51:34,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3698.618), np.float32(3734.493), np.float32(3877.3613), np.float32(3627.5312), np.float32(3767.4194), np.float32(3436.8699), np.float32(3278.721), np.float32(3522.663), np.float32(3275.816), np.float32(3533.4612)]
2025-09-14 16:51:34,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:51:34,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 43 minutes, 42 seconds)
2025-09-14 16:53:52,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:54:00,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3495.23193 ± 171.455
2025-09-14 16:54:00,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3547.8123), np.float32(3481.188), np.float32(3454.7087), np.float32(3567.4111), np.float32(3362.3318), np.float32(3730.8008), np.float32(3334.8022), np.float32(3842.2224), np.float32(3303.9597), np.float32(3327.0828)]
2025-09-14 16:54:00,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:54:00,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 41 minutes, 16 seconds)
2025-09-14 16:56:18,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:56:27,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3511.96924 ± 130.702
2025-09-14 16:56:27,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3630.1208), np.float32(3705.2332), np.float32(3576.636), np.float32(3501.4868), np.float32(3541.3264), np.float32(3439.2075), np.float32(3375.6663), np.float32(3259.1394), np.float32(3435.14), np.float32(3655.7336)]
2025-09-14 16:56:27,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:56:27,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 38 minutes, 52 seconds)
2025-09-14 16:58:44,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:58:52,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3636.94385 ± 153.839
2025-09-14 16:58:52,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3718.874), np.float32(3716.797), np.float32(3891.2751), np.float32(3802.6145), np.float32(3522.528), np.float32(3465.8577), np.float32(3626.0115), np.float32(3451.979), np.float32(3429.0908), np.float32(3744.4114)]
2025-09-14 16:58:52,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:58:52,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 36 minutes, 27 seconds)
2025-09-14 17:01:10,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:01:18,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3631.54688 ± 113.857
2025-09-14 17:01:18,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3688.5557), np.float32(3502.916), np.float32(3491.3562), np.float32(3791.37), np.float32(3807.522), np.float32(3751.3347), np.float32(3520.509), np.float32(3590.6553), np.float32(3609.1433), np.float32(3562.109)]
2025-09-14 17:01:18,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:01:18,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 34 minutes, 2 seconds)
2025-09-14 17:03:35,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:03:43,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3553.66602 ± 150.468
2025-09-14 17:03:43,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3479.3176), np.float32(3334.2349), np.float32(3675.9395), np.float32(3254.5198), np.float32(3538.682), np.float32(3651.4897), np.float32(3687.7673), np.float32(3748.5757), np.float32(3602.9736), np.float32(3563.1616)]
2025-09-14 17:03:43,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:03:43,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 31 minutes, 35 seconds)
2025-09-14 17:06:01,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:06:09,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3740.27979 ± 122.898
2025-09-14 17:06:09,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3949.008), np.float32(3698.4973), np.float32(3875.1316), np.float32(3766.1292), np.float32(3617.0547), np.float32(3651.107), np.float32(3814.462), np.float32(3674.7717), np.float32(3526.2253), np.float32(3830.4094)]
2025-09-14 17:06:09,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:06:09,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (3740.28) for latency 21
2025-09-14 17:06:09,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 29 minutes, 8 seconds)
2025-09-14 17:08:27,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:08:35,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3715.83325 ± 133.609
2025-09-14 17:08:35,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3722.1746), np.float32(3511.6025), np.float32(3744.9167), np.float32(3753.807), np.float32(3507.2537), np.float32(3786.4006), np.float32(3847.41), np.float32(3607.3196), np.float32(3723.185), np.float32(3954.26)]
2025-09-14 17:08:35,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:08:35,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 26 minutes, 42 seconds)
2025-09-14 17:10:52,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:11:00,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3524.58862 ± 170.803
2025-09-14 17:11:00,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3696.3042), np.float32(3705.484), np.float32(3145.5547), np.float32(3549.6096), np.float32(3584.1008), np.float32(3685.2214), np.float32(3392.3203), np.float32(3624.442), np.float32(3505.8037), np.float32(3357.0474)]
2025-09-14 17:11:00,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:11:00,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 24 minutes, 16 seconds)
2025-09-14 17:13:18,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:13:26,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3636.85742 ± 195.091
2025-09-14 17:13:26,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3470.8726), np.float32(3544.0198), np.float32(3837.1128), np.float32(3435.2056), np.float32(3295.001), np.float32(3912.3418), np.float32(3606.396), np.float32(3692.6948), np.float32(3891.1516), np.float32(3683.779)]
2025-09-14 17:13:26,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:13:26,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 21 minutes, 51 seconds)
2025-09-14 17:15:44,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:15:52,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3524.46753 ± 132.568
2025-09-14 17:15:52,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3470.1501), np.float32(3437.7378), np.float32(3460.938), np.float32(3433.8484), np.float32(3695.0544), np.float32(3614.3748), np.float32(3659.6353), np.float32(3366.5962), np.float32(3364.7595), np.float32(3741.5833)]
2025-09-14 17:15:52,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:15:52,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 19 minutes, 25 seconds)
2025-09-14 17:18:10,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:18:18,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3559.90552 ± 304.149
2025-09-14 17:18:18,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3836.6199), np.float32(3557.9146), np.float32(2842.1968), np.float32(3794.0452), np.float32(3806.1062), np.float32(3499.7283), np.float32(3632.036), np.float32(3379.5012), np.float32(3915.2666), np.float32(3335.6387)]
2025-09-14 17:18:18,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:18:18,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 17 minutes)
2025-09-14 17:20:36,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:20:44,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3740.86670 ± 127.228
2025-09-14 17:20:44,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3776.239), np.float32(3604.9617), np.float32(3721.9048), np.float32(3750.657), np.float32(3457.241), np.float32(3874.4922), np.float32(3715.9849), np.float32(3771.0762), np.float32(3937.008), np.float32(3799.1028)]
2025-09-14 17:20:44,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:20:44,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (3740.87) for latency 21
2025-09-14 17:20:44,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 14 minutes, 34 seconds)
2025-09-14 17:23:02,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:23:10,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3720.79810 ± 162.997
2025-09-14 17:23:10,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3452.8098), np.float32(3769.2449), np.float32(3786.3228), np.float32(3694.798), np.float32(3440.1438), np.float32(3871.5132), np.float32(3853.7969), np.float32(3666.6853), np.float32(3973.7227), np.float32(3698.9402)]
2025-09-14 17:23:10,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:23:10,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 12 minutes, 9 seconds)
2025-09-14 17:25:29,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:25:37,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3577.98315 ± 120.753
2025-09-14 17:25:37,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3310.3855), np.float32(3446.036), np.float32(3631.67), np.float32(3658.7217), np.float32(3602.6416), np.float32(3711.2378), np.float32(3708.4001), np.float32(3648.3108), np.float32(3494.4849), np.float32(3567.947)]
2025-09-14 17:25:37,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:25:37,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 9 minutes, 44 seconds)
2025-09-14 17:27:58,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:28:06,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3504.90381 ± 179.017
2025-09-14 17:28:06,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3144.1401), np.float32(3605.6333), np.float32(3449.9746), np.float32(3850.0771), np.float32(3534.0396), np.float32(3422.2383), np.float32(3603.78), np.float32(3393.089), np.float32(3640.0686), np.float32(3405.9973)]
2025-09-14 17:28:06,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:28:06,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 7 minutes, 20 seconds)
2025-09-14 17:30:29,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:30:37,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3669.00781 ± 133.618
2025-09-14 17:30:37,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3543.9514), np.float32(3657.6736), np.float32(4010.2676), np.float32(3628.9978), np.float32(3611.8088), np.float32(3689.7795), np.float32(3712.3594), np.float32(3710.4702), np.float32(3648.0376), np.float32(3476.732)]
2025-09-14 17:30:37,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:30:37,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 55 seconds)
2025-09-14 17:32:57,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:33:05,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3592.90771 ± 179.479
2025-09-14 17:33:05,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3480.5027), np.float32(3820.0825), np.float32(3702.0464), np.float32(3638.3672), np.float32(3515.1265), np.float32(3975.9075), np.float32(3389.9634), np.float32(3464.8228), np.float32(3515.3833), np.float32(3426.8765)]
2025-09-14 17:33:05,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:33:05,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 28 seconds)
2025-09-14 17:35:20,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:35:28,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3588.64844 ± 147.986
2025-09-14 17:35:28,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3727.4832), np.float32(3448.8142), np.float32(3321.7031), np.float32(3710.8303), np.float32(3412.6235), np.float32(3805.0667), np.float32(3561.3623), np.float32(3630.737), np.float32(3564.8567), np.float32(3703.0044)]
2025-09-14 17:35:28,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:35:28,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1251 [DEBUG]: Training session finished
