2025-09-14 15:59:58,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1108 [DEBUG]: logdir: _logs/noise-eval-v2/halfcheetah/bpql-noise_0.100-delay_24
2025-09-14 15:59:58,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1109 [DEBUG]: trainer_prefix: noise-eval-v2/halfcheetah/bpql-noise_0.100-delay_24
2025-09-14 15:59:58,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'24': <latency_env.delayed_mdp.ConstantDelay object at 0x7fe7e6b9aab0>}
2025-09-14 15:59:58,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1111 [DEBUG]: using device: cpu
2025-09-14 15:59:58,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-14 15:59:58,725 baseline-bpql-noisepromille100-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=161, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-14 15:59:58,725 baseline-bpql-noisepromille100-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-14 16:00:00,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-14 16:00:00,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-14 16:02:52,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:03:02,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: -246.05209 ± 38.829
2025-09-14 16:03:02,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [-238.44392, -241.19556, -202.59239, -265.1505, -259.51443, -269.00223, -275.2661, -285.9552, -271.8179, -151.5828]
2025-09-14 16:03:02,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 16:03:02,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (-246.05) for latency 24
2025-09-14 16:03:02,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 59 minutes, 57 seconds)
2025-09-14 16:05:47,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:05:56,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: -237.17722 ± 36.137
2025-09-14 16:05:56,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [-242.14203, -192.87636, -258.60388, -230.33922, -197.28455, -246.89388, -323.65836, -201.69579, -230.94112, -247.33708]
2025-09-14 16:05:56,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 16:05:56,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (-237.18) for latency 24
2025-09-14 16:05:56,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 50 minutes, 51 seconds)
2025-09-14 16:08:41,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:08:51,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: -145.41516 ± 69.515
2025-09-14 16:08:51,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [-208.54121, -229.25102, -99.90878, -139.167, -136.41673, -133.58469, 6.635217, -153.96881, -248.52325, -111.42532]
2025-09-14 16:08:51,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 16:08:51,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (-145.42) for latency 24
2025-09-14 16:08:51,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 46 minutes, 4 seconds)
2025-09-14 16:11:46,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:11:56,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: -96.60929 ± 110.006
2025-09-14 16:11:56,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [-128.60901, -95.79306, -160.63048, -202.73402, -47.06569, -29.735825, 164.7978, -189.73102, -45.507442, -231.08414]
2025-09-14 16:11:56,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 16:11:56,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (-96.61) for latency 24
2025-09-14 16:11:56,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 46 minutes, 20 seconds)
2025-09-14 16:14:49,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:14:58,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 2.04666 ± 72.524
2025-09-14 16:14:58,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [-137.03796, 72.061775, 79.541374, -26.548107, 44.949883, -30.467659, -70.760925, -35.116837, 13.051705, 110.79334]
2025-09-14 16:14:58,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 16:14:58,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (2.05) for latency 24
2025-09-14 16:14:58,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 44 minutes, 26 seconds)
2025-09-14 16:17:50,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:18:00,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 39.65852 ± 161.682
2025-09-14 16:18:00,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [-41.878586, 216.21112, -395.19528, 78.80941, 191.64838, 66.24452, 32.236454, 46.779846, 128.7387, 72.99068]
2025-09-14 16:18:00,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 16:18:00,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (39.66) for latency 24
2025-09-14 16:18:00,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 41 minutes, 27 seconds)
2025-09-14 16:21:00,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:21:11,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 274.12869 ± 98.478
2025-09-14 16:21:11,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [268.9565, 321.60477, 422.59048, 230.7203, 77.61557, 247.65631, 415.29324, 284.87558, 301.18704, 170.7871]
2025-09-14 16:21:11,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 16:21:11,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (274.13) for latency 24
2025-09-14 16:21:11,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 43 minutes, 38 seconds)
2025-09-14 16:24:26,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:24:37,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 339.34451 ± 144.075
2025-09-14 16:24:37,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [569.16766, 391.93384, 389.24936, 385.01233, 478.18643, 264.09918, 163.81532, 71.97055, 440.44412, 239.56613]
2025-09-14 16:24:37,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 16:24:37,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (339.34) for latency 24
2025-09-14 16:24:37,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 50 minutes)
2025-09-14 16:27:44,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:27:55,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 571.37219 ± 150.657
2025-09-14 16:27:55,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [637.3485, 620.19464, 479.63815, 743.47375, 522.273, 718.1039, 620.5613, 589.6484, 604.09576, 178.38382]
2025-09-14 16:27:55,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 16:27:55,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (571.37) for latency 24
2025-09-14 16:27:55,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 50 minutes, 56 seconds)
2025-09-14 16:31:01,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:31:12,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 619.91333 ± 155.270
2025-09-14 16:31:12,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [514.8969, 900.3821, 595.4496, 851.8561, 483.74933, 498.7334, 697.32544, 636.8386, 380.8926, 639.0091]
2025-09-14 16:31:12,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 16:31:12,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (619.91) for latency 24
2025-09-14 16:31:12,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 51 minutes, 56 seconds)
2025-09-14 16:34:18,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:34:28,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 855.42676 ± 85.363
2025-09-14 16:34:28,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [980.7725, 972.05457, 825.18585, 679.7648, 836.8103, 916.53204, 847.29486, 888.91254, 827.0982, 779.8419]
2025-09-14 16:34:28,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 16:34:28,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (855.43) for latency 24
2025-09-14 16:34:28,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 53 minutes, 3 seconds)
2025-09-14 16:37:36,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:37:48,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 885.87341 ± 77.998
2025-09-14 16:37:48,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [789.9684, 756.35767, 971.80005, 890.70905, 785.80786, 895.15753, 923.8154, 953.11975, 900.35425, 991.6441]
2025-09-14 16:37:48,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 16:37:48,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (885.87) for latency 24
2025-09-14 16:37:48,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 52 minutes, 22 seconds)
2025-09-14 16:41:03,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:41:15,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 616.18610 ± 460.621
2025-09-14 16:41:15,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [927.4532, 1003.69574, 859.53485, 649.28864, 550.4174, -669.7569, 425.45422, 772.5126, 896.2756, 746.986]
2025-09-14 16:41:15,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 16:41:15,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 49 minutes, 22 seconds)
2025-09-14 16:44:32,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:44:44,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 993.33459 ± 101.704
2025-09-14 16:44:44,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [827.131, 1097.8425, 971.1533, 929.4566, 899.5732, 955.55927, 1068.1666, 980.93475, 1203.8285, 999.69995]
2025-09-14 16:44:44,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 16:44:44,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (993.33) for latency 24
2025-09-14 16:44:44,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 49 minutes, 12 seconds)
2025-09-14 16:47:55,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:48:06,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 957.65820 ± 118.118
2025-09-14 16:48:06,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [863.2607, 1015.89856, 725.05927, 1159.5632, 1070.8264, 1034.925, 983.14496, 848.2048, 946.6582, 929.0412]
2025-09-14 16:48:06,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 16:48:06,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 47 minutes, 34 seconds)
2025-09-14 16:51:11,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:51:22,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 981.78064 ± 106.328
2025-09-14 16:51:22,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [858.73584, 955.796, 1108.7122, 842.16174, 1083.352, 1080.7715, 1095.8463, 814.4005, 975.0844, 1002.946]
2025-09-14 16:51:22,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 16:51:22,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 43 minutes, 58 seconds)
2025-09-14 16:54:27,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:54:39,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1073.03723 ± 65.032
2025-09-14 16:54:39,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [1101.9484, 930.06647, 1164.6904, 1119.7605, 1041.8323, 1020.31793, 1139.972, 1108.5807, 1048.7706, 1054.4323]
2025-09-14 16:54:39,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 16:54:39,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (1073.04) for latency 24
2025-09-14 16:54:39,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 39 minutes, 34 seconds)
2025-09-14 16:57:50,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:58:02,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 993.66602 ± 116.616
2025-09-14 16:58:02,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [1038.6882, 756.89246, 941.20715, 1046.5228, 886.97375, 1159.7079, 1031.6454, 887.0168, 1109.0741, 1078.9321]
2025-09-14 16:58:02,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 16:58:02,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 35 minutes, 22 seconds)
2025-09-14 17:01:16,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:01:26,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1063.18091 ± 101.241
2025-09-14 17:01:26,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [1053.601, 1112.6808, 964.1729, 1057.5234, 1084.033, 910.8751, 1294.6392, 963.68445, 1117.1359, 1073.4631]
2025-09-14 17:01:26,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 17:01:26,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 30 minutes, 35 seconds)
2025-09-14 17:04:37,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:04:49,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1245.58472 ± 206.950
2025-09-14 17:04:49,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [1636.878, 1315.1122, 1275.3125, 1103.0754, 1072.8038, 988.91547, 1481.4055, 1168.7087, 1417.6351, 996.00085]
2025-09-14 17:04:49,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 17:04:49,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (1245.58) for latency 24
2025-09-14 17:04:49,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 4 hours, 27 minutes, 15 seconds)
2025-09-14 17:08:01,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:08:11,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1111.23413 ± 145.994
2025-09-14 17:08:11,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [871.4244, 1095.739, 1135.5878, 1109.9286, 1007.9993, 1295.9783, 1031.392, 1259.6758, 954.53705, 1350.0803]
2025-09-14 17:08:11,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 17:08:11,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 4 hours, 25 minutes, 46 seconds)
2025-09-14 17:11:23,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:11:34,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1201.06458 ± 211.062
2025-09-14 17:11:34,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [1584.8002, 1090.1545, 1073.0038, 1050.7722, 1639.3812, 1113.273, 1173.597, 1136.1262, 1146.286, 1003.2518]
2025-09-14 17:11:34,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 17:11:34,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 4 hours, 24 minutes, 3 seconds)
2025-09-14 17:14:54,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:15:04,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1139.28088 ± 121.404
2025-09-14 17:15:04,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [1246.4741, 1369.8962, 1021.3578, 1105.0934, 1265.8943, 933.50397, 1121.8622, 1149.3257, 1045.2819, 1134.1195]
2025-09-14 17:15:04,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 17:15:04,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 4 hours, 22 minutes, 25 seconds)
2025-09-14 17:18:33,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:18:45,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1281.46985 ± 190.025
2025-09-14 17:18:45,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1458.1532), np.float32(1359.4292), np.float32(1160.2705), np.float32(1240.6174), np.float32(1014.6445), np.float32(1064.2726), np.float32(1207.8191), np.float32(1678.2734), np.float32(1206.3798), np.float32(1424.8379)]
2025-09-14 17:18:45,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:18:45,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (1281.47) for latency 24
2025-09-14 17:18:45,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 4 hours, 23 minutes, 8 seconds)
2025-09-14 17:22:05,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:22:18,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1329.13013 ± 219.756
2025-09-14 17:22:18,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1734.0406), np.float32(1176.0004), np.float32(1119.0209), np.float32(1219.6279), np.float32(1098.3019), np.float32(1214.872), np.float32(1191.9403), np.float32(1429.7373), np.float32(1700.6552), np.float32(1407.1046)]
2025-09-14 17:22:18,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:22:18,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (1329.13) for latency 24
2025-09-14 17:22:18,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 22 minutes, 17 seconds)
2025-09-14 17:25:39,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:25:51,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1315.86987 ± 254.982
2025-09-14 17:25:51,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1520.9619), np.float32(1266.7753), np.float32(1085.1796), np.float32(1190.4105), np.float32(1173.3774), np.float32(1723.9878), np.float32(1643.0161), np.float32(994.79126), np.float32(1542.0023), np.float32(1018.19745)]
2025-09-14 17:25:51,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:25:51,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 4 hours, 21 minutes, 26 seconds)
2025-09-14 17:29:10,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:29:22,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1283.12280 ± 186.516
2025-09-14 17:29:22,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1336.7852), np.float32(1254.3898), np.float32(1055.3464), np.float32(1394.5986), np.float32(1289.4401), np.float32(1661.0923), np.float32(991.6089), np.float32(1104.1311), np.float32(1417.6383), np.float32(1326.1974)]
2025-09-14 17:29:22,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:29:22,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 4 hours, 19 minutes, 50 seconds)
2025-09-14 17:32:35,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:32:47,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1343.06714 ± 200.001
2025-09-14 17:32:47,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1682.1107), np.float32(1425.079), np.float32(1297.8107), np.float32(1238.1315), np.float32(1131.9811), np.float32(1509.3059), np.float32(1221.1622), np.float32(1125.4165), np.float32(1150.4413), np.float32(1649.2329)]
2025-09-14 17:32:47,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:32:47,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (1343.07) for latency 24
2025-09-14 17:32:47,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 4 hours, 14 minutes, 59 seconds)
2025-09-14 17:36:02,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:36:14,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1519.15735 ± 339.048
2025-09-14 17:36:14,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1618.7385), np.float32(1140.4574), np.float32(1369.6884), np.float32(1943.0006), np.float32(2210.7566), np.float32(1503.1931), np.float32(1702.2195), np.float32(1306.3988), np.float32(1073.4355), np.float32(1323.6843)]
2025-09-14 17:36:14,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:36:14,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (1519.16) for latency 24
2025-09-14 17:36:14,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 8 minutes, 21 seconds)
2025-09-14 17:39:47,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:40:00,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1331.20740 ± 161.560
2025-09-14 17:40:00,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1608.4652), np.float32(1089.2445), np.float32(1450.3894), np.float32(1153.1156), np.float32(1224.0587), np.float32(1323.1201), np.float32(1515.6969), np.float32(1199.7189), np.float32(1455.1323), np.float32(1293.1332)]
2025-09-14 17:40:00,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:40:00,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 4 hours, 7 minutes, 54 seconds)
2025-09-14 17:43:27,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:43:40,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1514.91431 ± 313.437
2025-09-14 17:43:40,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1002.9459), np.float32(1890.2797), np.float32(1557.5342), np.float32(1463.6829), np.float32(1786.1083), np.float32(1203.4592), np.float32(1088.2178), np.float32(1971.2258), np.float32(1561.334), np.float32(1624.3563)]
2025-09-14 17:43:40,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:43:40,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 4 hours, 5 minutes, 53 seconds)
2025-09-14 17:47:10,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:47:22,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1468.47058 ± 357.874
2025-09-14 17:47:22,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1648.2861), np.float32(1237.23), np.float32(1215.0685), np.float32(1484.1575), np.float32(1171.3265), np.float32(1114.4556), np.float32(2140.5615), np.float32(1205.5796), np.float32(2091.0918), np.float32(1376.9481)]
2025-09-14 17:47:22,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:47:22,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 4 hours, 4 minutes, 53 seconds)
2025-09-14 17:50:51,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:51:04,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1559.32043 ± 460.712
2025-09-14 17:51:04,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1529.763), np.float32(2801.8914), np.float32(1134.1196), np.float32(1376.3599), np.float32(1150.1455), np.float32(1549.7368), np.float32(1622.0254), np.float32(1683.5824), np.float32(1141.5413), np.float32(1604.0392)]
2025-09-14 17:51:04,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:51:04,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (1559.32) for latency 24
2025-09-14 17:51:04,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 4 hours, 4 minutes, 54 seconds)
2025-09-14 17:54:32,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:54:45,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1567.90125 ± 398.841
2025-09-14 17:54:45,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1135.5227), np.float32(1130.3859), np.float32(2025.0902), np.float32(1390.77), np.float32(1381.3572), np.float32(1270.5908), np.float32(2239.3882), np.float32(1529.7057), np.float32(2175.0457), np.float32(1401.1562)]
2025-09-14 17:54:45,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:54:45,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (1567.90) for latency 24
2025-09-14 17:54:45,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 4 hours, 4 minutes, 15 seconds)
2025-09-14 17:58:14,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:58:26,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1372.39087 ± 367.907
2025-09-14 17:58:26,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1341.2767), np.float32(1294.6976), np.float32(1341.9055), np.float32(1620.1473), np.float32(1246.3297), np.float32(1145.1232), np.float32(1085.7214), np.float32(1124.0654), np.float32(1142.3346), np.float32(2382.3064)]
2025-09-14 17:58:26,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:58:26,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 59 minutes, 37 seconds)
2025-09-14 18:01:56,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:02:09,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1649.67944 ± 487.482
2025-09-14 18:02:09,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1588.9275), np.float32(2377.419), np.float32(2088.5786), np.float32(2382.0247), np.float32(1073.0275), np.float32(1666.4714), np.float32(1121.0168), np.float32(1210.8733), np.float32(1133.013), np.float32(1855.442)]
2025-09-14 18:02:09,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:02:09,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (1649.68) for latency 24
2025-09-14 18:02:09,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 56 minutes, 35 seconds)
2025-09-14 18:05:46,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:05:59,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1666.28101 ± 288.786
2025-09-14 18:05:59,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1256.1382), np.float32(1684.3777), np.float32(1927.8654), np.float32(1689.3502), np.float32(1898.2067), np.float32(2179.7095), np.float32(1344.4718), np.float32(1441.1505), np.float32(1866.4181), np.float32(1375.1229)]
2025-09-14 18:05:59,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:05:59,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (1666.28) for latency 24
2025-09-14 18:05:59,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 54 minutes, 29 seconds)
2025-09-14 18:09:24,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:09:37,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1565.07666 ± 472.492
2025-09-14 18:09:37,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1412.9777), np.float32(1759.7896), np.float32(2414.179), np.float32(1310.8729), np.float32(1063.0311), np.float32(1365.688), np.float32(1231.4916), np.float32(1426.6346), np.float32(2473.3535), np.float32(1192.7489)]
2025-09-14 18:09:37,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:09:37,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 50 minutes, 12 seconds)
2025-09-14 18:13:07,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:13:20,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1335.74561 ± 361.237
2025-09-14 18:13:20,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1126.9484), np.float32(1345.3892), np.float32(2232.3489), np.float32(1064.7101), np.float32(900.90137), np.float32(1116.9259), np.float32(1614.1292), np.float32(1335.5656), np.float32(1494.8036), np.float32(1125.7345)]
2025-09-14 18:13:20,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:13:20,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 46 minutes, 43 seconds)
2025-09-14 18:16:47,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:17:00,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1777.42871 ± 431.825
2025-09-14 18:17:00,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2382.6729), np.float32(1518.9425), np.float32(1908.6589), np.float32(1602.6796), np.float32(2494.2964), np.float32(1947.3754), np.float32(1119.3116), np.float32(2043.7601), np.float32(1474.1776), np.float32(1282.4114)]
2025-09-14 18:17:00,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:17:00,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (1777.43) for latency 24
2025-09-14 18:17:00,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 42 minutes, 37 seconds)
2025-09-14 18:20:27,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:20:40,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1681.01038 ± 560.694
2025-09-14 18:20:40,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2651.7769), np.float32(2213.752), np.float32(2535.7725), np.float32(1397.3296), np.float32(1468.6022), np.float32(1247.1569), np.float32(1166.4658), np.float32(1184.9634), np.float32(1842.9966), np.float32(1101.2872)]
2025-09-14 18:20:40,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:20:40,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 38 minutes, 21 seconds)
2025-09-14 18:24:07,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:24:19,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1606.98328 ± 449.704
2025-09-14 18:24:19,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1408.4927), np.float32(1200.7012), np.float32(1577.1284), np.float32(1120.6307), np.float32(1774.0399), np.float32(1148.888), np.float32(2685.9626), np.float32(2007.4331), np.float32(1484.2297), np.float32(1662.3273)]
2025-09-14 18:24:19,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:24:19,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 32 minutes, 45 seconds)
2025-09-14 18:27:52,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:28:05,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1802.29224 ± 580.779
2025-09-14 18:28:05,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1141.0298), np.float32(2095.9634), np.float32(1747.5781), np.float32(1136.8959), np.float32(2760.751), np.float32(1555.8105), np.float32(2139.8123), np.float32(1065.4073), np.float32(1706.5748), np.float32(2673.0986)]
2025-09-14 18:28:05,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:28:05,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (1802.29) for latency 24
2025-09-14 18:28:05,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 30 minutes, 27 seconds)
2025-09-14 18:31:42,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:31:55,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1718.96655 ± 743.694
2025-09-14 18:31:55,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2656.8044), np.float32(1297.1836), np.float32(3057.3745), np.float32(2693.0544), np.float32(1124.1997), np.float32(1185.2428), np.float32(1090.913), np.float32(1258.0016), np.float32(1801.355), np.float32(1025.5369)]
2025-09-14 18:31:55,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:31:55,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 28 minutes, 10 seconds)
2025-09-14 18:35:18,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:35:32,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1609.69080 ± 284.298
2025-09-14 18:35:32,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1932.6685), np.float32(1445.6609), np.float32(1964.5168), np.float32(1403.3058), np.float32(1794.145), np.float32(1715.1647), np.float32(1153.0837), np.float32(1159.8726), np.float32(1813.1791), np.float32(1715.3114)]
2025-09-14 18:35:32,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:35:32,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 23 minutes, 51 seconds)
2025-09-14 18:38:59,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:39:11,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1628.33936 ± 483.915
2025-09-14 18:39:11,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1409.7887), np.float32(1116.3723), np.float32(2061.4631), np.float32(1360.4042), np.float32(2795.4548), np.float32(1359.85), np.float32(1177.6123), np.float32(1671.2958), np.float32(1920.9803), np.float32(1410.1719)]
2025-09-14 18:39:11,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:39:11,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 20 minutes, 1 second)
2025-09-14 18:42:40,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:42:53,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1687.02307 ± 511.336
2025-09-14 18:42:53,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1143.4163), np.float32(1915.5414), np.float32(1166.1829), np.float32(2620.6262), np.float32(1280.4244), np.float32(2388.148), np.float32(1465.4729), np.float32(1118.2155), np.float32(1798.8026), np.float32(1973.3997)]
2025-09-14 18:42:53,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:42:53,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 16 minutes, 40 seconds)
2025-09-14 18:46:20,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:46:33,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1737.98047 ± 483.269
2025-09-14 18:46:33,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1669.138), np.float32(1176.678), np.float32(1896.0806), np.float32(2378.7417), np.float32(2268.345), np.float32(1397.236), np.float32(2402.1174), np.float32(1149.9557), np.float32(1132.488), np.float32(1909.0242)]
2025-09-14 18:46:33,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:46:33,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 11 minutes, 56 seconds)
2025-09-14 18:49:59,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:50:10,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1689.98169 ± 295.675
2025-09-14 18:50:10,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2134.5173), np.float32(1631.4816), np.float32(1999.6416), np.float32(1550.0424), np.float32(1657.4634), np.float32(1883.8325), np.float32(1645.6338), np.float32(1196.9563), np.float32(1241.0234), np.float32(1959.2251)]
2025-09-14 18:50:10,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:50:10,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 6 minutes, 13 seconds)
2025-09-14 18:53:41,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:53:54,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1836.18591 ± 405.253
2025-09-14 18:53:54,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1699.3392), np.float32(2565.8413), np.float32(1641.4727), np.float32(1688.7504), np.float32(2514.7), np.float32(1891.4216), np.float32(1879.2976), np.float32(1839.9938), np.float32(1451.6763), np.float32(1189.3674)]
2025-09-14 18:53:54,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:53:54,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (1836.19) for latency 24
2025-09-14 18:53:54,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 3 minutes, 41 seconds)
2025-09-14 18:57:22,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:57:34,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1621.51929 ± 726.432
2025-09-14 18:57:34,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1438.021), np.float32(1173.4774), np.float32(2373.214), np.float32(2810.1445), np.float32(533.55286), np.float32(1408.6252), np.float32(2742.9492), np.float32(1082.8005), np.float32(1087.9718), np.float32(1564.4358)]
2025-09-14 18:57:34,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:57:34,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 7 seconds)
2025-09-14 19:00:46,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:00:58,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1826.67444 ± 557.820
2025-09-14 19:00:58,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1544.4558), np.float32(1733.9199), np.float32(1580.7728), np.float32(1438.371), np.float32(1290.7173), np.float32(1465.231), np.float32(3060.9414), np.float32(2735.1052), np.float32(1802.6919), np.float32(1614.538)]
2025-09-14 19:00:58,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:00:58,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 53 minutes, 36 seconds)
2025-09-14 19:04:03,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:04:14,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1655.95349 ± 494.078
2025-09-14 19:04:14,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2552.3613), np.float32(2236.8838), np.float32(1183.7211), np.float32(1993.7333), np.float32(1817.5726), np.float32(1570.5507), np.float32(1137.7904), np.float32(1865.7495), np.float32(1090.0236), np.float32(1111.1493)]
2025-09-14 19:04:14,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:04:14,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 46 minutes, 18 seconds)
2025-09-14 19:07:19,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:07:30,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1405.17969 ± 198.763
2025-09-14 19:07:30,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1618.5867), np.float32(1353.1737), np.float32(1561.3942), np.float32(1127.324), np.float32(1660.447), np.float32(1163.3972), np.float32(1493.4272), np.float32(1321.971), np.float32(1605.3665), np.float32(1146.7103)]
2025-09-14 19:07:30,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:07:30,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 39 minutes, 27 seconds)
2025-09-14 19:10:35,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:10:47,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1789.08423 ± 433.431
2025-09-14 19:10:47,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2502.8179), np.float32(1545.3014), np.float32(1508.6056), np.float32(1311.1078), np.float32(2722.0557), np.float32(1707.8115), np.float32(1758.5172), np.float32(1630.5001), np.float32(1723.654), np.float32(1480.4705)]
2025-09-14 19:10:47,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:10:47,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 31 minutes, 57 seconds)
2025-09-14 19:13:58,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:14:09,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1681.39331 ± 643.245
2025-09-14 19:14:09,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1171.226), np.float32(1506.3512), np.float32(1512.1245), np.float32(1375.8881), np.float32(1532.6744), np.float32(3032.5403), np.float32(1097.9791), np.float32(2790.356), np.float32(1127.622), np.float32(1667.1713)]
2025-09-14 19:14:09,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:14:09,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 25 minutes, 59 seconds)
2025-09-14 19:17:24,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:17:35,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1541.98120 ± 435.478
2025-09-14 19:17:35,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1184.7194), np.float32(1995.4357), np.float32(1565.892), np.float32(1290.1572), np.float32(1708.7738), np.float32(1389.7377), np.float32(1203.9033), np.float32(1082.1952), np.float32(2592.7202), np.float32(1406.276)]
2025-09-14 19:17:35,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:17:35,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 22 minutes, 55 seconds)
2025-09-14 19:20:49,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:20:59,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1835.57007 ± 612.542
2025-09-14 19:20:59,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2676.5098), np.float32(1080.9106), np.float32(2125.1147), np.float32(1350.5752), np.float32(1266.6368), np.float32(1130.3601), np.float32(1432.8157), np.float32(2195.9233), np.float32(2432.3992), np.float32(2664.4556)]
2025-09-14 19:20:59,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:20:59,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 20 minutes, 41 seconds)
2025-09-14 19:24:16,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:24:28,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1636.15271 ± 442.471
2025-09-14 19:24:28,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1481.4872), np.float32(2382.3008), np.float32(1474.9246), np.float32(1753.7917), np.float32(1161.0402), np.float32(2338.0562), np.float32(1371.0195), np.float32(1239.8542), np.float32(2018.3623), np.float32(1140.6904)]
2025-09-14 19:24:28,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:24:28,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 19 minutes, 5 seconds)
2025-09-14 19:27:42,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:27:54,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1462.72827 ± 379.317
2025-09-14 19:27:54,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1268.7456), np.float32(2404.9539), np.float32(1654.6559), np.float32(1536.8779), np.float32(1699.8656), np.float32(1456.5063), np.float32(1259.6616), np.float32(1190.2449), np.float32(1117.2136), np.float32(1038.5571)]
2025-09-14 19:27:54,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:27:54,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 16 minutes, 54 seconds)
2025-09-14 19:31:06,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:31:18,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1514.31519 ± 328.062
2025-09-14 19:31:18,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1376.5812), np.float32(1840.6406), np.float32(1409.378), np.float32(1195.4196), np.float32(1476.9559), np.float32(2315.6821), np.float32(1274.6396), np.float32(1264.039), np.float32(1686.817), np.float32(1302.9979)]
2025-09-14 19:31:18,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:31:18,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 13 minutes, 44 seconds)
2025-09-14 19:34:36,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:34:48,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1238.06885 ± 289.140
2025-09-14 19:34:48,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1254.9392), np.float32(790.99243), np.float32(1289.6606), np.float32(1133.169), np.float32(1205.2798), np.float32(2000.239), np.float32(1147.6102), np.float32(1126.4332), np.float32(1312.648), np.float32(1119.7158)]
2025-09-14 19:34:48,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:34:48,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 10 minutes, 49 seconds)
2025-09-14 19:38:01,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:38:12,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 2039.10522 ± 740.902
2025-09-14 19:38:12,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3127.0278), np.float32(1373.7942), np.float32(1276.7617), np.float32(2507.937), np.float32(1998.0504), np.float32(1368.3096), np.float32(2746.078), np.float32(1478.7692), np.float32(3179.5603), np.float32(1334.7638)]
2025-09-14 19:38:12,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:38:12,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (2039.11) for latency 24
2025-09-14 19:38:12,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 7 minutes, 22 seconds)
2025-09-14 19:41:20,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:41:31,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1707.87537 ± 612.153
2025-09-14 19:41:31,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1241.1178), np.float32(1752.9272), np.float32(1844.2073), np.float32(1231.4921), np.float32(1400.6044), np.float32(2557.4673), np.float32(1473.7352), np.float32(1183.2534), np.float32(1286.7037), np.float32(3107.2454)]
2025-09-14 19:41:31,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:41:31,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 2 minutes, 48 seconds)
2025-09-14 19:44:37,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:44:48,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1613.02551 ± 572.367
2025-09-14 19:44:48,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1119.5381), np.float32(1394.3489), np.float32(1933.2291), np.float32(2820.823), np.float32(1163.7689), np.float32(1601.9127), np.float32(1204.1726), np.float32(2450.4841), np.float32(1371.8899), np.float32(1070.0878)]
2025-09-14 19:44:48,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:44:48,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 58 minutes, 23 seconds)
2025-09-14 19:47:53,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:48:05,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1314.52917 ± 273.037
2025-09-14 19:48:05,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1418.4412), np.float32(1405.6534), np.float32(1616.7545), np.float32(1444.974), np.float32(1284.2635), np.float32(1219.2345), np.float32(1048.9269), np.float32(686.46405), np.float32(1328.1052), np.float32(1692.4747)]
2025-09-14 19:48:05,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:48:05,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 54 minutes, 5 seconds)
2025-09-14 19:51:10,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:51:22,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 2108.20142 ± 778.842
2025-09-14 19:51:22,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3305.4143), np.float32(1949.1625), np.float32(1842.9095), np.float32(1406.7506), np.float32(3146.9688), np.float32(1376.1448), np.float32(3350.5881), np.float32(1504.0571), np.float32(1649.7048), np.float32(1550.3118)]
2025-09-14 19:51:22,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:51:22,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (2108.20) for latency 24
2025-09-14 19:51:22,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 49 minutes, 24 seconds)
2025-09-14 19:54:37,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:54:49,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1625.33533 ± 572.356
2025-09-14 19:54:49,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1818.8447), np.float32(1171.7603), np.float32(1758.2947), np.float32(1393.4965), np.float32(3180.379), np.float32(1303.9805), np.float32(1221.1682), np.float32(1746.7051), np.float32(1537.4513), np.float32(1121.2737)]
2025-09-14 19:54:49,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:54:49,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 46 minutes, 20 seconds)
2025-09-14 19:58:01,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:58:12,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1727.88062 ± 589.871
2025-09-14 19:58:12,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1410.7108), np.float32(2678.6519), np.float32(1960.4008), np.float32(1854.9319), np.float32(1140.6171), np.float32(1619.162), np.float32(1282.2856), np.float32(1058.942), np.float32(1405.8154), np.float32(2867.2896)]
2025-09-14 19:58:12,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:58:12,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 43 minutes, 25 seconds)
2025-09-14 20:01:21,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:01:33,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1475.77319 ± 527.655
2025-09-14 20:01:33,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1258.1405), np.float32(1086.5007), np.float32(1304.9319), np.float32(1238.7239), np.float32(2906.5283), np.float32(1359.2236), np.float32(1081.8394), np.float32(1269.3601), np.float32(1939.4556), np.float32(1313.0292)]
2025-09-14 20:01:33,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:01:33,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 40 minutes, 25 seconds)
2025-09-14 20:04:37,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:04:49,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1656.32593 ± 589.163
2025-09-14 20:04:49,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1205.4835), np.float32(1292.148), np.float32(1066.9941), np.float32(1765.7926), np.float32(1689.8213), np.float32(1678.8273), np.float32(1337.8588), np.float32(1494.0332), np.float32(3280.3398), np.float32(1751.9603)]
2025-09-14 20:04:49,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:04:49,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 37 minutes, 3 seconds)
2025-09-14 20:07:55,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:08:07,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1412.28723 ± 455.363
2025-09-14 20:08:07,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1203.6401), np.float32(2691.8357), np.float32(1578.6144), np.float32(1037.5043), np.float32(1256.656), np.float32(1322.2334), np.float32(1407.8741), np.float32(1054.228), np.float32(1159.1277), np.float32(1411.1578)]
2025-09-14 20:08:07,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:08:07,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 33 minutes, 45 seconds)
2025-09-14 20:11:16,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:11:28,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1819.77368 ± 661.301
2025-09-14 20:11:28,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3288.5085), np.float32(1224.8931), np.float32(2153.1123), np.float32(1365.8776), np.float32(2620.414), np.float32(1054.8071), np.float32(1697.886), np.float32(1588.5227), np.float32(1880.674), np.float32(1323.0408)]
2025-09-14 20:11:28,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:11:28,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 29 minutes, 53 seconds)
2025-09-14 20:14:50,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:15:02,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1773.11792 ± 638.033
2025-09-14 20:15:02,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1933.8367), np.float32(1571.2124), np.float32(1216.9675), np.float32(2445.4631), np.float32(1158.3843), np.float32(1230.7657), np.float32(2129.8818), np.float32(1518.6976), np.float32(3237.3887), np.float32(1288.5824)]
2025-09-14 20:15:02,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:15:02,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 27 minutes, 31 seconds)
2025-09-14 20:18:23,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:18:35,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1594.64307 ± 653.257
2025-09-14 20:18:35,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2089.9307), np.float32(1649.0637), np.float32(1865.3218), np.float32(1171.2273), np.float32(1167.0746), np.float32(1161.8014), np.float32(1107.1835), np.float32(3290.1677), np.float32(1163.9718), np.float32(1280.6888)]
2025-09-14 20:18:35,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:18:35,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 25 minutes, 11 seconds)
2025-09-14 20:21:50,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:22:01,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1675.34204 ± 786.650
2025-09-14 20:22:01,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1980.8265), np.float32(2101.4497), np.float32(1131.3934), np.float32(1313.0753), np.float32(1983.6604), np.float32(1594.3833), np.float32(199.01787), np.float32(1608.2029), np.float32(3442.8098), np.float32(1398.6008)]
2025-09-14 20:22:01,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:22:01,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 22 minutes, 35 seconds)
2025-09-14 20:25:12,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:25:22,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 2047.49927 ± 708.230
2025-09-14 20:25:22,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1477.6888), np.float32(1520.0564), np.float32(2917.9077), np.float32(1553.9508), np.float32(1725.7888), np.float32(1521.5728), np.float32(1145.712), np.float32(3000.4185), np.float32(3088.7485), np.float32(2523.1477)]
2025-09-14 20:25:22,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:25:22,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 19 minutes, 21 seconds)
2025-09-14 20:28:31,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:28:41,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 2126.13037 ± 774.594
2025-09-14 20:28:41,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3276.873), np.float32(2170.3025), np.float32(1772.4609), np.float32(1754.7827), np.float32(3213.1648), np.float32(3164.3364), np.float32(1255.8201), np.float32(1919.9852), np.float32(1681.5659), np.float32(1052.0099)]
2025-09-14 20:28:41,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:28:41,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (2126.13) for latency 24
2025-09-14 20:28:41,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 15 minutes, 44 seconds)
2025-09-14 20:31:51,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:32:01,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1918.65259 ± 622.181
2025-09-14 20:32:01,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2698.9365), np.float32(3272.069), np.float32(1307.4296), np.float32(1535.936), np.float32(1607.33), np.float32(2003.7098), np.float32(2202.2708), np.float32(1130.9366), np.float32(1610.1556), np.float32(1817.7515)]
2025-09-14 20:32:01,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:32:01,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 11 minutes, 19 seconds)
2025-09-14 20:35:11,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:35:22,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1869.75122 ± 648.862
2025-09-14 20:35:22,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2356.6226), np.float32(1249.7716), np.float32(1471.8198), np.float32(2609.706), np.float32(3076.3308), np.float32(1084.426), np.float32(1192.6915), np.float32(2158.238), np.float32(1402.8838), np.float32(2095.0222)]
2025-09-14 20:35:22,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:35:22,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 7 minutes, 10 seconds)
2025-09-14 20:38:38,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:38:50,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1355.73926 ± 607.798
2025-09-14 20:38:50,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1745.4747), np.float32(1279.0732), np.float32(269.8404), np.float32(1089.1102), np.float32(1111.5486), np.float32(1137.619), np.float32(2765.288), np.float32(1698.9639), np.float32(1376.531), np.float32(1083.9431)]
2025-09-14 20:38:50,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:38:50,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 3 minutes, 55 seconds)
2025-09-14 20:42:08,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:42:20,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 2077.92432 ± 750.432
2025-09-14 20:42:20,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1173.1528), np.float32(1503.0233), np.float32(1302.4187), np.float32(1926.0822), np.float32(1854.923), np.float32(3150.9778), np.float32(1697.6093), np.float32(1808.6245), np.float32(3231.5396), np.float32(3130.8906)]
2025-09-14 20:42:20,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:42:20,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 1 minute, 5 seconds)
2025-09-14 20:45:27,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:45:38,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1656.95508 ± 610.514
2025-09-14 20:45:38,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1161.5997), np.float32(1395.9176), np.float32(1689.4048), np.float32(2211.1858), np.float32(1220.9714), np.float32(3269.478), np.float32(1466.186), np.float32(1243.5845), np.float32(1571.5345), np.float32(1339.689)]
2025-09-14 20:45:38,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:45:38,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 57 minutes, 38 seconds)
2025-09-14 20:48:42,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:48:54,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1994.02563 ± 595.174
2025-09-14 20:48:54,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3179.6855), np.float32(2101.876), np.float32(1395.2924), np.float32(1504.666), np.float32(1109.6592), np.float32(1564.7466), np.float32(2267.871), np.float32(2184.9866), np.float32(1967.8093), np.float32(2663.6636)]
2025-09-14 20:48:54,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:48:54,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 54 minutes)
2025-09-14 20:51:59,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:52:10,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 2230.79736 ± 908.344
2025-09-14 20:52:10,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2790.3013), np.float32(2679.857), np.float32(3283.1746), np.float32(1241.3734), np.float32(1299.5367), np.float32(3538.0652), np.float32(1310.7506), np.float32(1915.7242), np.float32(3150.5645), np.float32(1098.6265)]
2025-09-14 20:52:10,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:52:10,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (2230.80) for latency 24
2025-09-14 20:52:10,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 50 minutes, 23 seconds)
2025-09-14 20:55:17,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:55:28,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 3340.07935 ± 936.382
2025-09-14 20:55:28,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3448.2725), np.float32(2056.2996), np.float32(3896.6357), np.float32(3704.125), np.float32(3643.8806), np.float32(3868.664), np.float32(3994.0342), np.float32(3789.986), np.float32(1041.548), np.float32(3957.344)]
2025-09-14 20:55:28,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:55:28,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (3340.08) for latency 24
2025-09-14 20:55:28,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 46 minutes, 33 seconds)
2025-09-14 20:58:42,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:58:54,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 3368.77612 ± 649.891
2025-09-14 20:58:54,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3437.6238), np.float32(3724.207), np.float32(3460.866), np.float32(3832.9219), np.float32(3914.052), np.float32(2214.4517), np.float32(2259.8372), np.float32(4145.307), np.float32(2899.0789), np.float32(3799.4175)]
2025-09-14 20:58:54,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:58:54,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (3368.78) for latency 24
2025-09-14 20:58:54,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 43 minutes, 3 seconds)
2025-09-14 21:02:04,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 21:02:15,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 2913.89404 ± 909.036
2025-09-14 21:02:15,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3986.7954), np.float32(3505.123), np.float32(1962.694), np.float32(3079.248), np.float32(3777.9226), np.float32(2488.49), np.float32(1756.4049), np.float32(3331.6233), np.float32(1356.0042), np.float32(3894.6343)]
2025-09-14 21:02:15,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:02:16,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 39 minutes, 54 seconds)
2025-09-14 21:05:23,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 21:05:34,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 3641.14111 ± 851.814
2025-09-14 21:05:34,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4196.284), np.float32(1600.1422), np.float32(4314.3696), np.float32(4124.7437), np.float32(2449.7583), np.float32(3562.644), np.float32(4099.8325), np.float32(3978.3318), np.float32(3935.8306), np.float32(4149.4756)]
2025-09-14 21:05:34,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:05:34,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (3641.14) for latency 24
2025-09-14 21:05:34,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 36 minutes, 40 seconds)
2025-09-14 21:08:40,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 21:08:51,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 2792.04565 ± 1068.248
2025-09-14 21:08:51,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3373.2314), np.float32(1077.5977), np.float32(1151.2878), np.float32(1744.2487), np.float32(3736.3062), np.float32(3593.0579), np.float32(3815.3904), np.float32(3638.5654), np.float32(2179.1357), np.float32(3611.6375)]
2025-09-14 21:08:51,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:08:51,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 33 minutes, 21 seconds)
2025-09-14 21:12:00,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 21:12:11,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 2836.99292 ± 1151.810
2025-09-14 21:12:11,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3933.9836), np.float32(4289.687), np.float32(4212.276), np.float32(1077.6307), np.float32(2300.9348), np.float32(3158.3005), np.float32(1483.0083), np.float32(1449.1351), np.float32(3734.1558), np.float32(2730.8162)]
2025-09-14 21:12:11,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:12:11,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 30 minutes, 5 seconds)
2025-09-14 21:15:17,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 21:15:28,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 2963.05420 ± 1105.727
2025-09-14 21:15:28,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4100.9272), np.float32(3883.056), np.float32(3743.5325), np.float32(1810.5396), np.float32(1702.0492), np.float32(1769.3344), np.float32(3672.623), np.float32(3445.6313), np.float32(4218.8506), np.float32(1283.9984)]
2025-09-14 21:15:28,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:15:28,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 26 minutes, 30 seconds)
2025-09-14 21:18:41,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 21:18:52,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 3048.16113 ± 1018.102
2025-09-14 21:18:52,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3691.0251), np.float32(3878.1409), np.float32(4006.4731), np.float32(3662.8447), np.float32(3161.323), np.float32(1478.6847), np.float32(3946.042), np.float32(3522.452), np.float32(1375.0266), np.float32(1759.5983)]
2025-09-14 21:18:52,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:18:52,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 23 minutes, 15 seconds)
2025-09-14 21:22:06,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 21:22:17,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 3157.04932 ± 966.682
2025-09-14 21:22:17,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2948.6875), np.float32(3854.5586), np.float32(2881.7869), np.float32(1688.9021), np.float32(2977.6887), np.float32(4400.262), np.float32(3772.879), np.float32(1279.507), np.float32(4036.5732), np.float32(3729.6465)]
2025-09-14 21:22:17,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:22:18,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 20 minutes, 4 seconds)
2025-09-14 21:25:27,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 21:25:38,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 2807.60669 ± 1066.676
2025-09-14 21:25:38,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1279.2448), np.float32(3656.105), np.float32(1098.9846), np.float32(1781.8844), np.float32(3475.9014), np.float32(3264.457), np.float32(3582.858), np.float32(2065.7676), np.float32(3936.2224), np.float32(3934.6404)]
2025-09-14 21:25:38,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:25:38,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 16 minutes, 47 seconds)
2025-09-14 21:28:44,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 21:28:56,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 2756.46753 ± 1045.095
2025-09-14 21:28:56,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2650.2844), np.float32(3994.9082), np.float32(4107.7905), np.float32(2514.248), np.float32(3701.2498), np.float32(1031.354), np.float32(2317.5547), np.float32(3828.5828), np.float32(1498.5397), np.float32(1920.1637)]
2025-09-14 21:28:56,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:28:56,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 13 minutes, 23 seconds)
2025-09-14 21:32:07,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 21:32:19,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 2706.54688 ± 1328.325
2025-09-14 21:32:19,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1239.7582), np.float32(1107.7227), np.float32(4064.7861), np.float32(4086.2249), np.float32(4111.8623), np.float32(1097.6298), np.float32(3831.3423), np.float32(2584.513), np.float32(1199.2579), np.float32(3742.37)]
2025-09-14 21:32:19,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:32:19,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 10 minutes, 6 seconds)
2025-09-14 21:35:24,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 21:35:35,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 2711.98291 ± 1205.248
2025-09-14 21:35:35,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3993.8691), np.float32(1364.3038), np.float32(3738.067), np.float32(3803.5303), np.float32(3143.4377), np.float32(1160.3463), np.float32(3810.263), np.float32(1217.8339), np.float32(1293.7699), np.float32(3594.4077)]
2025-09-14 21:35:35,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:35:35,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 6 minutes, 41 seconds)
2025-09-14 21:38:46,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 21:38:58,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 2826.58008 ± 906.968
2025-09-14 21:38:58,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3620.9666), np.float32(2969.7168), np.float32(3421.5334), np.float32(3979.2444), np.float32(2669.3796), np.float32(3660.7844), np.float32(3075.8135), np.float32(2335.0852), np.float32(1284.3662), np.float32(1248.9098)]
2025-09-14 21:38:58,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:38:58,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 20 seconds)
2025-09-14 21:42:15,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 21:42:28,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 2370.56006 ± 1355.646
2025-09-14 21:42:28,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2187.491), np.float32(4035.427), np.float32(1919.0391), np.float32(4192.267), np.float32(1169.0751), np.float32(17.143715), np.float32(3359.9495), np.float32(3877.935), np.float32(1151.2875), np.float32(1795.9847)]
2025-09-14 21:42:28,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:42:28,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1251 [DEBUG]: Training session finished
