2025-09-14 14:19:49,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1108 [DEBUG]: logdir: _logs/noise-eval-v2/halfcheetah/bpql-noise_0.150-delay_21
2025-09-14 14:19:49,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1109 [DEBUG]: trainer_prefix: noise-eval-v2/halfcheetah/bpql-noise_0.150-delay_21
2025-09-14 14:19:49,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'21': <latency_env.delayed_mdp.ConstantDelay object at 0x7f50b3bebd40>}
2025-09-14 14:19:49,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1111 [DEBUG]: using device: cpu
2025-09-14 14:19:49,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-14 14:19:49,563 baseline-bpql-noisepromille150-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=143, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-14 14:19:49,564 baseline-bpql-noisepromille150-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
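The input widths in the two network dumps above fit the usual delayed-MDP construction. The policy appears to read an augmented state: the last observed state plus the 21 actions still in flight. The critic appears to read a single state-action pair. This assumes a Gymnasium-style HalfCheetah with a 17-dimensional observation, and the 6-dimensional action visible in `mu_head`. The log does not state this layout. The helper `augmented_state_dim` below is hypothetical and not part of `latency_env`.

```python
# Hypothetical helper (not from latency_env). Assumption: the augmented state is
# the last observed state concatenated with the `delay` pending actions.
def augmented_state_dim(obs_dim: int, act_dim: int, delay: int) -> int:
    return obs_dim + delay * act_dim


OBS_DIM = 17  # assumption: HalfCheetah observation size (root x position excluded)
ACT_DIM = 6   # matches out_features of mu_head / log_std_head
DELAY = 21    # from args.trainer_eval_latencies and the logdir name

policy_in = augmented_state_dim(OBS_DIM, ACT_DIM, DELAY)  # pi: Linear(in_features=143, ...)
critic_in = OBS_DIM + ACT_DIM                             # q:  Linear(in_features=23, ...)
print(policy_in, critic_in)  # → 143 23
```

Under this reading, `NNLayerConcat2` joins a flattened 17-dim state (`init_left`) with a flattened 6-dim action (`init_right`) before the MLP. That the BPQL critic works on non-augmented pairs is inferred from these shapes, not stated in the log.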
2025-09-14 14:19:51,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-14 14:19:51,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-14 14:23:00,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:23:11,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: -460.76544 ± 75.246
2025-09-14 14:23:11,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-536.8724), np.float32(-515.0816), np.float32(-428.72543), np.float32(-465.52472), np.float32(-502.03036), np.float32(-415.84238), np.float32(-485.22235), np.float32(-555.6999), np.float32(-282.25003), np.float32(-420.4054)]
2025-09-14 14:23:11,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:23:11,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (-460.77) for latency 21
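The `Total Reward` line appears to summarize the ten episode returns in `All rewards` as mean ± population standard deviation. Population std divides by n, which is NumPy's `np.std` default (`ddof=0`). Recomputing iteration 1 reproduces the logged `-460.76544 ± 75.246`, whereas the sample std (n - 1) would give roughly 79.3. Here is a standard-library sketch:

```python
import statistics

# Episode returns copied from the iteration-1 "All rewards" line.
rewards = [-536.8724, -515.0816, -428.72543, -465.52472, -502.03036,
           -415.84238, -485.22235, -555.6999, -282.25003, -420.4054]

mean = statistics.fmean(rewards)
std = statistics.pstdev(rewards)  # population std: divide by n, not n - 1
print(f"Total Reward: {mean:.5f} ± {std:.3f}")
```

The float64 mean differs from the logged value only in the last printed digit. That is plausibly float32 accumulation, since the logged rewards are `np.float32`.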
2025-09-14 14:23:11,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 5 hours, 31 minutes, 17 seconds)
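The `estimated time remaining` values fit one specific estimator. It averages the wall-clock durations of the last five iterations (fewer at the start), multiplies by the number of iterations left to run, and truncates to whole seconds. For example, at iteration 12 the five iterations starting at 14:40:10.260 span 1017.821 s. Then 1017.821 / 5 × 89 ≈ 18117 s, logged as 5 hours, 1 minute, 57 seconds. This is inferred from the timestamps, not from the `training_loop` source. The window size and the names `estimate_remaining` and `format_eta` are assumptions.

```python
from datetime import datetime


def estimate_remaining(start_times: list[datetime], total_iters: int, window: int = 5) -> float:
    """Seconds left, assuming ETA = mean of the last `window` iteration durations
    times the iterations not yet finished. start_times[k] is when iteration k + 1
    started; the last entry is the iteration about to run."""
    done = len(start_times) - 1                      # iterations already completed
    recent = start_times[-(min(window, done) + 1):]  # window + 1 boundaries
    mean_dur = (recent[-1] - recent[0]).total_seconds() / (len(recent) - 1)
    return mean_dur * (total_iters - done)


def format_eta(seconds: float) -> str:
    """Render like the log, omitting zero components and using singular units for 1."""
    s = int(seconds)  # truncate fractional seconds
    parts = []
    for name, size in (("hour", 3600), ("minute", 60), ("second", 1)):
        n, s = divmod(s, size)
        if n:
            parts.append(f"{n} {name}{'' if n == 1 else 's'}")
    return ", ".join(parts)


# Start times of iterations 1-12, copied from the "Iteration k/100" lines.
fmt = "%H:%M:%S.%f"
starts = [datetime.strptime(t, fmt) for t in (
    "14:19:51.031", "14:23:11.814", "14:26:34.130", "14:29:57.782",
    "14:33:21.770", "14:36:44.728", "14:40:10.260", "14:43:33.391",
    "14:46:55.756", "14:50:18.375", "14:53:44.021", "14:57:08.081")]
print(format_eta(estimate_remaining(starts, total_iters=100)))
```

A cumulative average over all completed iterations matches iterations 2 to 6 but drifts by tens of seconds from iteration 7 on. The five-iteration window fits every value checked.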
2025-09-14 14:26:23,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:26:34,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: -285.38434 ± 25.354
2025-09-14 14:26:34,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-294.14282), np.float32(-305.2252), np.float32(-284.33218), np.float32(-265.2151), np.float32(-248.20332), np.float32(-318.95374), np.float32(-323.95428), np.float32(-296.51193), np.float32(-258.3794), np.float32(-258.92545)]
2025-09-14 14:26:34,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:26:34,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (-285.38) for latency 21
2025-09-14 14:26:34,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 5 hours, 29 minutes, 11 seconds)
2025-09-14 14:29:46,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:29:57,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: -181.72917 ± 46.476
2025-09-14 14:29:57,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-201.69174), np.float32(-227.06625), np.float32(-147.4451), np.float32(-227.65575), np.float32(-223.86479), np.float32(-184.21721), np.float32(-161.52045), np.float32(-187.07072), np.float32(-65.33082), np.float32(-191.42896)]
2025-09-14 14:29:57,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:29:57,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (-181.73) for latency 21
2025-09-14 14:29:57,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 5 hours, 26 minutes, 58 seconds)
2025-09-14 14:33:10,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:33:21,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: -162.41142 ± 51.874
2025-09-14 14:33:21,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-205.62056), np.float32(-173.33415), np.float32(-140.22914), np.float32(-202.51164), np.float32(-152.88345), np.float32(-140.21327), np.float32(-133.19989), np.float32(-83.548615), np.float32(-116.06502), np.float32(-276.50845)]
2025-09-14 14:33:21,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:33:21,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (-162.41) for latency 21
2025-09-14 14:33:21,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 5 hours, 24 minutes, 17 seconds)
2025-09-14 14:36:33,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:36:44,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: -90.00352 ± 58.716
2025-09-14 14:36:44,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-67.47423), np.float32(-66.12689), np.float32(-58.307053), np.float32(-77.43107), np.float32(-31.154911), np.float32(-110.28537), np.float32(-174.87567), np.float32(-8.608081), np.float32(-210.43677), np.float32(-95.33527)]
2025-09-14 14:36:44,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:36:44,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (-90.00) for latency 21
2025-09-14 14:36:44,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 5 hours, 21 minutes)
2025-09-14 14:39:59,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:40:10,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 96.81985 ± 71.643
2025-09-14 14:40:10,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(197.44682), np.float32(81.663574), np.float32(75.34669), np.float32(102.64333), np.float32(160.15213), np.float32(-75.503716), np.float32(124.708374), np.float32(163.15733), np.float32(63.098877), np.float32(75.48504)]
2025-09-14 14:40:10,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:40:10,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (96.82) for latency 21
2025-09-14 14:40:10,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 5 hours, 19 minutes, 6 seconds)
2025-09-14 14:43:22,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:43:33,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 293.40021 ± 117.170
2025-09-14 14:43:33,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(308.58585), np.float32(-38.63785), np.float32(348.60104), np.float32(274.55716), np.float32(402.70972), np.float32(342.4021), np.float32(279.70316), np.float32(374.51285), np.float32(340.65634), np.float32(300.91177)]
2025-09-14 14:43:33,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:43:33,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (293.40) for latency 21
2025-09-14 14:43:33,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 5 hours, 15 minutes, 58 seconds)
2025-09-14 14:46:44,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:46:55,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 406.60532 ± 298.547
2025-09-14 14:46:55,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(632.9683), np.float32(489.69016), np.float32(614.4546), np.float32(-412.59335), np.float32(243.2204), np.float32(641.8623), np.float32(439.88187), np.float32(371.62985), np.float32(584.3239), np.float32(460.61505)]
2025-09-14 14:46:55,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:46:55,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (406.61) for latency 21
2025-09-14 14:46:55,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 5 hours, 12 minutes, 10 seconds)
2025-09-14 14:50:07,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:50:18,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 629.28204 ± 153.649
2025-09-14 14:50:18,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(798.4256), np.float32(481.15216), np.float32(700.61743), np.float32(767.7645), np.float32(767.1781), np.float32(603.238), np.float32(528.43164), np.float32(288.80518), np.float32(605.7777), np.float32(751.42993)]
2025-09-14 14:50:18,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:50:18,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (629.28) for latency 21
2025-09-14 14:50:18,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 5 hours, 8 minutes, 22 seconds)
2025-09-14 14:53:32,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:53:44,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 751.00494 ± 132.277
2025-09-14 14:53:44,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(907.209), np.float32(777.6968), np.float32(812.2419), np.float32(445.49445), np.float32(866.06366), np.float32(697.8888), np.float32(586.3641), np.float32(795.62036), np.float32(795.2118), np.float32(826.2585)]
2025-09-14 14:53:44,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:53:44,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (751.00) for latency 21
2025-09-14 14:53:44,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 5 hours, 5 minutes, 47 seconds)
2025-09-14 14:56:57,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:57:08,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 707.45978 ± 144.524
2025-09-14 14:57:08,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(871.3717), np.float32(885.1121), np.float32(458.84753), np.float32(668.64404), np.float32(875.2937), np.float32(688.8177), np.float32(608.5729), np.float32(625.23834), np.float32(840.96655), np.float32(551.7325)]
2025-09-14 14:57:08,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:57:08,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 5 hours, 1 minute, 57 seconds)
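Iteration 11 is the first evaluation without a `New best` line: its mean, 707.46, is below the 751.00 from iteration 10. Iterations 15, 19, 21 and 23 follow the same pattern. This fits a running maximum of the mean return per latency, logged only on improvement. Below is a minimal sketch of that bookkeeping. `update_best` is a hypothetical helper, and whether the trainer also checkpoints on a new best is not visible in this log.

```python
import math


def update_best(best: dict[str, float], latency: str, mean_reward: float) -> bool:
    """Store mean_reward if it beats the previous best for this latency.
    Returns True when a 'New best' line would be logged."""
    if mean_reward > best.get(latency, -math.inf):
        best[latency] = mean_reward
        return True
    return False


# Mean returns for latency 21, iterations 1-12, copied from the log.
means = [-460.76544, -285.38434, -181.72917, -162.41142, -90.00352, 96.81985,
         293.40021, 406.60532, 629.28204, 751.00494, 707.45978, 911.93768]
best: dict[str, float] = {}
improved = []
for it, m in enumerate(means, start=1):
    if update_best(best, "21", m):
        improved.append(it)
        print(f"New best ({m:.2f}) for latency 21")
```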
2025-09-14 15:00:19,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:00:30,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 911.93768 ± 117.826
2025-09-14 15:00:30,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(886.36487), np.float32(625.5282), np.float32(896.1882), np.float32(1052.4117), np.float32(979.0629), np.float32(980.1712), np.float32(874.66003), np.float32(1062.1414), np.float32(905.8084), np.float32(857.0398)]
2025-09-14 15:00:30,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:00:30,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (911.94) for latency 21
2025-09-14 15:00:30,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 58 minutes, 12 seconds)
2025-09-14 15:03:43,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:03:54,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1046.85645 ± 185.690
2025-09-14 15:03:54,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(950.4506), np.float32(958.0), np.float32(911.13525), np.float32(970.70337), np.float32(1196.2701), np.float32(1294.0472), np.float32(938.0567), np.float32(889.49994), np.float32(1452.056), np.float32(908.3456)]
2025-09-14 15:03:54,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:03:54,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1046.86) for latency 21
2025-09-14 15:03:54,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 55 minutes, 21 seconds)
2025-09-14 15:07:08,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:07:19,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1090.65710 ± 166.949
2025-09-14 15:07:19,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(869.6687), np.float32(1090.7429), np.float32(1331.0859), np.float32(944.7455), np.float32(939.8707), np.float32(1078.6012), np.float32(1063.7216), np.float32(966.2321), np.float32(1225.3955), np.float32(1396.5066)]
2025-09-14 15:07:19,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:07:19,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1090.66) for latency 21
2025-09-14 15:07:19,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 52 minutes, 49 seconds)
2025-09-14 15:10:35,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:10:46,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1071.14734 ± 212.977
2025-09-14 15:10:46,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1103.5625), np.float32(1323.5574), np.float32(1445.7498), np.float32(837.9988), np.float32(1332.6117), np.float32(920.15784), np.float32(901.77216), np.float32(1049.5269), np.float32(982.0922), np.float32(814.44525)]
2025-09-14 15:10:46,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:10:46,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 49 minutes, 40 seconds)
2025-09-14 15:13:57,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:14:08,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1125.10278 ± 169.448
2025-09-14 15:14:08,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1304.5845), np.float32(1385.9897), np.float32(945.1218), np.float32(970.61237), np.float32(973.59814), np.float32(1059.9287), np.float32(1204.0729), np.float32(1042.8429), np.float32(977.34515), np.float32(1386.9323)]
2025-09-14 15:14:08,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:14:08,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1125.10) for latency 21
2025-09-14 15:14:08,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 45 minutes, 45 seconds)
2025-09-14 15:17:20,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:17:31,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1133.43787 ± 186.824
2025-09-14 15:17:31,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1258.4531), np.float32(1050.7792), np.float32(1627.0886), np.float32(1076.4785), np.float32(1041.9991), np.float32(1005.9933), np.float32(1077.0868), np.float32(964.26715), np.float32(1009.72406), np.float32(1222.5085)]
2025-09-14 15:17:31,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:17:31,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1133.44) for latency 21
2025-09-14 15:17:31,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 42 minutes, 38 seconds)
2025-09-14 15:20:44,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:20:56,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1331.39465 ± 211.343
2025-09-14 15:20:56,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1036.8989), np.float32(1457.7976), np.float32(1450.609), np.float32(954.83295), np.float32(1441.3129), np.float32(1584.2805), np.float32(1431.5894), np.float32(1409.877), np.float32(1484.2993), np.float32(1062.4486)]
2025-09-14 15:20:56,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:20:56,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1331.39) for latency 21
2025-09-14 15:20:56,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 39 minutes, 18 seconds)
2025-09-14 15:24:10,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:24:21,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1280.54895 ± 269.285
2025-09-14 15:24:21,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1264.7141), np.float32(1092.7517), np.float32(1215.1691), np.float32(1933.7844), np.float32(995.48737), np.float32(1503.3412), np.float32(1112.9115), np.float32(1002.292), np.float32(1427.7415), np.float32(1257.2965)]
2025-09-14 15:24:21,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:24:21,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 35 minutes, 53 seconds)
2025-09-14 15:27:33,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:27:44,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1360.87671 ± 236.006
2025-09-14 15:27:44,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1583.2169), np.float32(1768.0444), np.float32(1318.0718), np.float32(1568.9109), np.float32(1156.6757), np.float32(1458.0426), np.float32(1441.1017), np.float32(1110.6029), np.float32(1247.7548), np.float32(956.34436)]
2025-09-14 15:27:44,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:27:44,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1360.88) for latency 21
2025-09-14 15:27:44,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 4 hours, 31 minutes, 27 seconds)
2025-09-14 15:30:56,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:31:07,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1334.58154 ± 790.188
2025-09-14 15:31:07,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2582.6204), np.float32(1265.2782), np.float32(928.88336), np.float32(939.44653), np.float32(1058.4692), np.float32(1417.2365), np.float32(1264.1619), np.float32(2884.1877), np.float32(8.9567), np.float32(996.5755)]
2025-09-14 15:31:07,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:31:07,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 4 hours, 28 minutes, 17 seconds)
2025-09-14 15:34:19,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:34:30,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1566.64685 ± 438.129
2025-09-14 15:34:30,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1580.3179), np.float32(1125.6256), np.float32(1316.7919), np.float32(1631.3843), np.float32(2588.6455), np.float32(1206.9171), np.float32(1548.4026), np.float32(1921.088), np.float32(1750.4224), np.float32(996.8727)]
2025-09-14 15:34:30,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:34:30,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1566.65) for latency 21
2025-09-14 15:34:30,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 4 hours, 24 minutes, 59 seconds)
2025-09-14 15:37:44,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:37:55,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1316.00818 ± 488.589
2025-09-14 15:37:55,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(987.9884), np.float32(1017.5893), np.float32(1093.8052), np.float32(1041.1641), np.float32(1234.7798), np.float32(1506.1743), np.float32(967.3929), np.float32(2694.6555), np.float32(1291.7383), np.float32(1324.795)]
2025-09-14 15:37:55,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:37:55,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 4 hours, 21 minutes, 40 seconds)
2025-09-14 15:41:07,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:41:17,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1578.69312 ± 603.540
2025-09-14 15:41:17,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1345.689), np.float32(2799.7253), np.float32(934.8104), np.float32(1026.4231), np.float32(2329.4712), np.float32(990.5108), np.float32(2139.3289), np.float32(1362.4136), np.float32(1605.9698), np.float32(1252.5897)]
2025-09-14 15:41:17,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:41:17,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1578.69) for latency 21
2025-09-14 15:41:17,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 4 hours, 17 minutes, 27 seconds)
2025-09-14 15:44:31,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:44:43,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1420.68726 ± 431.800
2025-09-14 15:44:43,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1032.7324), np.float32(1200.0656), np.float32(958.21875), np.float32(1774.6619), np.float32(2052.9097), np.float32(1014.61945), np.float32(1435.6876), np.float32(907.07324), np.float32(1978.4967), np.float32(1852.4073)]
2025-09-14 15:44:43,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:44:43,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 14 minutes, 40 seconds)
2025-09-14 15:47:55,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:48:06,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1499.52954 ± 392.578
2025-09-14 15:48:06,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1103.3917), np.float32(1442.7104), np.float32(1591.7205), np.float32(1507.5703), np.float32(1359.9962), np.float32(1815.7189), np.float32(1129.3127), np.float32(1894.2417), np.float32(2264.034), np.float32(886.599)]
2025-09-14 15:48:06,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:48:06,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 4 hours, 11 minutes, 24 seconds)
2025-09-14 15:51:19,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:51:30,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1411.84717 ± 465.306
2025-09-14 15:51:30,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1695.8755), np.float32(1293.1647), np.float32(1967.1172), np.float32(1076.9303), np.float32(967.084), np.float32(1442.559), np.float32(2445.758), np.float32(1249.9854), np.float32(990.20044), np.float32(989.7971)]
2025-09-14 15:51:30,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:51:30,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 4 hours, 8 minutes, 9 seconds)
2025-09-14 15:54:42,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:54:52,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1497.45935 ± 402.627
2025-09-14 15:54:52,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1794.2008), np.float32(1141.2169), np.float32(1463.3804), np.float32(1561.9313), np.float32(1920.8574), np.float32(1826.2856), np.float32(976.945), np.float32(1164.7283), np.float32(2163.8975), np.float32(961.1508)]
2025-09-14 15:54:52,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:54:52,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 4 hours, 4 minutes, 7 seconds)
2025-09-14 15:58:07,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:58:19,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1612.85522 ± 422.331
2025-09-14 15:58:19,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1220.2369), np.float32(1530.2004), np.float32(1241.009), np.float32(1698.9467), np.float32(2725.4094), np.float32(1297.5723), np.float32(1652.5211), np.float32(1302.0067), np.float32(1646.6865), np.float32(1813.9639)]
2025-09-14 15:58:19,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:58:19,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1612.86) for latency 21
2025-09-14 15:58:19,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 1 minute, 41 seconds)
2025-09-14 16:01:34,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:01:45,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1443.29260 ± 462.888
2025-09-14 16:01:45,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1752.7239), np.float32(1002.33606), np.float32(2104.7466), np.float32(1791.4265), np.float32(1242.6205), np.float32(978.4475), np.float32(958.9152), np.float32(1589.5242), np.float32(884.201), np.float32(2127.984)]
2025-09-14 16:01:45,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:01:45,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 58 minutes, 35 seconds)
2025-09-14 16:04:59,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:05:10,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1192.10706 ± 256.203
2025-09-14 16:05:10,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(889.2606), np.float32(1120.5419), np.float32(959.97833), np.float32(1663.6423), np.float32(1488.6694), np.float32(973.3786), np.float32(910.7772), np.float32(1446.8186), np.float32(1232.3812), np.float32(1235.6235)]
2025-09-14 16:05:10,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:05:10,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 55 minutes, 32 seconds)
2025-09-14 16:08:21,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:08:32,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1279.74939 ± 245.574
2025-09-14 16:08:32,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(986.2117), np.float32(1372.0835), np.float32(1207.8693), np.float32(1699.6217), np.float32(1053.3685), np.float32(1574.7433), np.float32(1326.3901), np.float32(882.1182), np.float32(1445.4495), np.float32(1249.6389)]
2025-09-14 16:08:32,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:08:32,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 51 minutes, 38 seconds)
2025-09-14 16:11:47,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:11:58,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1448.95508 ± 434.486
2025-09-14 16:11:58,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1555.1704), np.float32(1245.6909), np.float32(1018.75867), np.float32(1216.5038), np.float32(1464.761), np.float32(2335.645), np.float32(964.7999), np.float32(1948.6282), np.float32(1765.397), np.float32(974.1956)]
2025-09-14 16:11:58,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:11:58,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 49 minutes, 7 seconds)
2025-09-14 16:15:12,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:15:23,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1124.24194 ± 219.697
2025-09-14 16:15:23,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(924.42896), np.float32(1151.6664), np.float32(1101.2396), np.float32(917.0231), np.float32(1077.2217), np.float32(1692.5265), np.float32(1295.4036), np.float32(1130.9238), np.float32(970.1146), np.float32(981.87146)]
2025-09-14 16:15:23,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:15:23,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 45 minutes, 26 seconds)
2025-09-14 16:18:36,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:18:48,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1359.14282 ± 302.045
2025-09-14 16:18:48,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1432.6307), np.float32(1412.8481), np.float32(1963.5778), np.float32(994.3312), np.float32(1034.1721), np.float32(1144.807), np.float32(1688.6925), np.float32(1104.646), np.float32(1602.7084), np.float32(1213.0146)]
2025-09-14 16:18:48,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:18:48,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 41 minutes, 32 seconds)
2025-09-14 16:21:58,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:22:10,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1578.30566 ± 422.083
2025-09-14 16:22:10,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1325.9513), np.float32(961.84064), np.float32(2135.9626), np.float32(1766.4005), np.float32(2269.3015), np.float32(1751.2509), np.float32(1305.2471), np.float32(1840.573), np.float32(1020.614), np.float32(1405.9146)]
2025-09-14 16:22:10,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:22:10,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 37 minutes, 26 seconds)
2025-09-14 16:25:22,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:25:33,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1352.19580 ± 583.728
2025-09-14 16:25:33,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1944.0798), np.float32(1687.9805), np.float32(2157.7537), np.float32(1036.4897), np.float32(1729.6938), np.float32(938.11694), np.float32(-12.54452), np.float32(1234.445), np.float32(1392.7883), np.float32(1413.1552)]
2025-09-14 16:25:33,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:25:33,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 34 minutes, 27 seconds)
2025-09-14 16:28:48,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:28:59,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1285.96704 ± 322.460
2025-09-14 16:28:59,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1110.2905), np.float32(1019.61365), np.float32(1121.5128), np.float32(1544.1285), np.float32(1411.7931), np.float32(952.7335), np.float32(1822.1276), np.float32(876.85333), np.float32(1208.1577), np.float32(1792.4594)]
2025-09-14 16:28:59,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:28:59,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 30 minutes, 54 seconds)
2025-09-14 16:32:11,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:32:22,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1684.98865 ± 464.467
2025-09-14 16:32:22,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1275.7172), np.float32(2660.0005), np.float32(1561.0032), np.float32(2332.3464), np.float32(1360.4011), np.float32(1873.8021), np.float32(1278.1978), np.float32(1143.1381), np.float32(1730.5819), np.float32(1634.6985)]
2025-09-14 16:32:22,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:32:22,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1684.99) for latency 21
2025-09-14 16:32:22,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 27 minutes, 3 seconds)
2025-09-14 16:35:36,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:35:47,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1766.80566 ± 672.142
2025-09-14 16:35:47,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1529.0736), np.float32(2825.2432), np.float32(2712.7747), np.float32(1522.8783), np.float32(1264.6929), np.float32(942.144), np.float32(1484.9557), np.float32(1455.2815), np.float32(2739.762), np.float32(1191.2494)]
2025-09-14 16:35:47,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:35:47,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1766.81) for latency 21
2025-09-14 16:35:47,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 23 minutes, 53 seconds)
2025-09-14 16:39:00,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:39:11,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1195.14050 ± 451.307
2025-09-14 16:39:11,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(185.00204), np.float32(997.8478), np.float32(1502.0366), np.float32(1088.411), np.float32(1460.0603), np.float32(1055.9711), np.float32(1073.2058), np.float32(1787.3198), np.float32(993.1999), np.float32(1808.3502)]
2025-09-14 16:39:11,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:39:11,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 20 minutes, 51 seconds)
2025-09-14 16:42:23,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:42:34,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1492.57263 ± 517.049
2025-09-14 16:42:34,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1873.8068), np.float32(1666.5537), np.float32(1012.72205), np.float32(1077.8849), np.float32(1363.7786), np.float32(2435.5242), np.float32(1045.6018), np.float32(960.36993), np.float32(2283.0833), np.float32(1206.4009)]
2025-09-14 16:42:34,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:42:34,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 17 minutes, 20 seconds)
2025-09-14 16:45:46,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:45:57,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1383.85144 ± 338.409
2025-09-14 16:45:57,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1539.2458), np.float32(1128.1674), np.float32(1152.2693), np.float32(1824.4918), np.float32(983.0315), np.float32(1604.8071), np.float32(1836.623), np.float32(1094.432), np.float32(946.9609), np.float32(1728.4851)]
2025-09-14 16:45:57,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:45:57,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 13 minutes, 29 seconds)
2025-09-14 16:49:11,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:49:23,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1391.15320 ± 401.959
2025-09-14 16:49:23,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(975.18567), np.float32(1441.2708), np.float32(1277.6636), np.float32(884.54285), np.float32(2015.1543), np.float32(901.90424), np.float32(1650.0265), np.float32(1157.059), np.float32(1606.1279), np.float32(2002.598)]
2025-09-14 16:49:23,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:49:23,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 10 minutes, 32 seconds)
2025-09-14 16:52:36,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:52:47,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1158.25659 ± 301.968
2025-09-14 16:52:47,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1591.818), np.float32(974.3132), np.float32(1393.7256), np.float32(1379.1041), np.float32(981.99915), np.float32(955.04126), np.float32(1319.6611), np.float32(544.32104), np.float32(994.8651), np.float32(1447.7172)]
2025-09-14 16:52:47,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:52:47,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 7 minutes, 3 seconds)
2025-09-14 16:55:59,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:56:10,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1533.90173 ± 561.483
2025-09-14 16:56:10,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1112.7184), np.float32(1713.8113), np.float32(1605.5752), np.float32(737.7225), np.float32(997.7067), np.float32(2654.8108), np.float32(1952.6727), np.float32(2122.6729), np.float32(1364.9325), np.float32(1076.3944)]
2025-09-14 16:56:10,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:56:10,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 3 minutes, 27 seconds)
2025-09-14 16:59:22,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:59:33,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1624.99463 ± 466.190
2025-09-14 16:59:33,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2487.921), np.float32(1816.2096), np.float32(886.6358), np.float32(1072.3392), np.float32(1740.7783), np.float32(1392.1447), np.float32(1224.1758), np.float32(2023.8312), np.float32(2016.0621), np.float32(1589.8484)]
2025-09-14 16:59:33,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:59:33,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 59 minutes, 59 seconds)
2025-09-14 17:02:46,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:02:57,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1560.20947 ± 380.040
2025-09-14 17:02:57,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1512.0309), np.float32(1297.6019), np.float32(1080.5618), np.float32(1725.9844), np.float32(2241.0315), np.float32(1009.6964), np.float32(2040.5052), np.float32(1423.7185), np.float32(1425.4282), np.float32(1845.5358)]
2025-09-14 17:02:57,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:02:57,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 56 minutes, 52 seconds)
2025-09-14 17:06:12,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:06:23,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1725.18262 ± 664.841
2025-09-14 17:06:23,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2320.8484), np.float32(1338.4255), np.float32(1063.8125), np.float32(2812.9473), np.float32(2461.7993), np.float32(1042.4573), np.float32(927.1813), np.float32(1753.183), np.float32(1197.4656), np.float32(2333.7063)]
2025-09-14 17:06:23,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:06:23,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 53 minutes, 25 seconds)
2025-09-14 17:09:34,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:09:45,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1383.52307 ± 432.162
2025-09-14 17:09:45,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1320.3306), np.float32(1021.6427), np.float32(920.6455), np.float32(1340.0037), np.float32(1247.2196), np.float32(1449.8784), np.float32(1131.2589), np.float32(1381.6036), np.float32(1444.9086), np.float32(2577.7393)]
2025-09-14 17:09:45,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:09:45,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 49 minutes, 34 seconds)
2025-09-14 17:12:56,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:13:07,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1516.59009 ± 532.974
2025-09-14 17:13:07,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1073.7352), np.float32(1726.1378), np.float32(1002.795), np.float32(1359.6462), np.float32(2420.6482), np.float32(1203.4615), np.float32(1109.3821), np.float32(2590.842), np.float32(1255.5914), np.float32(1423.6599)]
2025-09-14 17:13:07,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:13:07,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 46 minutes, 10 seconds)
2025-09-14 17:16:20,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:16:31,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1651.41284 ± 430.912
2025-09-14 17:16:31,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2624.89), np.float32(1558.6943), np.float32(1362.7206), np.float32(1970.2261), np.float32(1046.8656), np.float32(1996.9615), np.float32(1272.7155), np.float32(1358.6038), np.float32(1634.8613), np.float32(1687.5903)]
2025-09-14 17:16:31,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:16:31,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 42 minutes, 51 seconds)
2025-09-14 17:19:44,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:19:55,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1487.17554 ± 590.359
2025-09-14 17:19:55,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1347.8115), np.float32(2093.0867), np.float32(1893.0424), np.float32(1700.2782), np.float32(1066.9984), np.float32(2801.3696), np.float32(956.4502), np.float32(1039.2819), np.float32(957.6183), np.float32(1015.81824)]
2025-09-14 17:19:55,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:19:55,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 39 minutes, 25 seconds)
2025-09-14 17:23:06,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:23:18,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1856.53259 ± 476.928
2025-09-14 17:23:18,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1407.4147), np.float32(2354.991), np.float32(1494.6077), np.float32(1463.17), np.float32(1834.1583), np.float32(2326.5107), np.float32(1813.6844), np.float32(2406.1064), np.float32(2433.2742), np.float32(1031.4081)]
2025-09-14 17:23:18,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:23:18,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1856.53) for latency 21
2025-09-14 17:23:18,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 35 minutes, 36 seconds)
2025-09-14 17:26:33,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:26:44,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1245.98718 ± 328.163
2025-09-14 17:26:44,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1066.7031), np.float32(1394.429), np.float32(1057.6047), np.float32(2160.0146), np.float32(1062.6504), np.float32(1017.9138), np.float32(1110.804), np.float32(1092.8193), np.float32(1137.9645), np.float32(1358.9677)]
2025-09-14 17:26:44,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:26:44,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 32 minutes, 53 seconds)
2025-09-14 17:29:57,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:30:08,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1670.70374 ± 639.473
2025-09-14 17:30:08,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1238.8807), np.float32(947.8698), np.float32(1460.2511), np.float32(1147.3059), np.float32(1601.6475), np.float32(1053.2115), np.float32(1444.2784), np.float32(2625.1792), np.float32(2531.0295), np.float32(2657.3845)]
2025-09-14 17:30:08,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:30:08,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 29 minutes, 41 seconds)
2025-09-14 17:33:20,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:33:31,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1320.20935 ± 310.236
2025-09-14 17:33:31,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1122.2529), np.float32(1014.57135), np.float32(1552.3054), np.float32(1365.251), np.float32(1131.599), np.float32(937.41345), np.float32(999.62195), np.float32(1747.5375), np.float32(1481.8606), np.float32(1849.6812)]
2025-09-14 17:33:31,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:33:31,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 26 minutes, 13 seconds)
2025-09-14 17:36:43,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:36:54,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1705.90002 ± 623.502
2025-09-14 17:36:54,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2807.4773), np.float32(2719.1506), np.float32(1061.5326), np.float32(1063.168), np.float32(1443.0331), np.float32(2261.3203), np.float32(1632.0951), np.float32(1556.4059), np.float32(1332.9456), np.float32(1181.8716)]
2025-09-14 17:36:54,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:36:54,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 22 minutes, 39 seconds)
2025-09-14 17:40:08,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:40:19,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1462.92151 ± 492.098
2025-09-14 17:40:19,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2487.7485), np.float32(1495.7175), np.float32(1071.0541), np.float32(953.1036), np.float32(2206.0383), np.float32(1137.9354), np.float32(1112.3511), np.float32(1705.951), np.float32(1165.7802), np.float32(1293.5345)]
2025-09-14 17:40:19,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:40:19,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 19 minutes, 38 seconds)
2025-09-14 17:43:33,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:43:44,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2051.91870 ± 702.177
2025-09-14 17:43:44,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2733.6775), np.float32(2081.3608), np.float32(2994.662), np.float32(938.58374), np.float32(2936.398), np.float32(1821.5236), np.float32(1421.6621), np.float32(2653.259), np.float32(1291.762), np.float32(1646.2979)]
2025-09-14 17:43:44,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:43:44,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (2051.92) for latency 21
2025-09-14 17:43:44,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 16 minutes, 1 second)
2025-09-14 17:46:55,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:47:06,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1486.76367 ± 383.478
2025-09-14 17:47:06,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1085.2861), np.float32(2098.611), np.float32(1118.7422), np.float32(1008.6836), np.float32(1594.5154), np.float32(1406.7384), np.float32(1430.3823), np.float32(1733.9557), np.float32(1247.2029), np.float32(2143.5183)]
2025-09-14 17:47:06,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:47:06,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 12 minutes, 23 seconds)
2025-09-14 17:50:18,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:50:29,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1607.49939 ± 476.209
2025-09-14 17:50:29,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2214.3035), np.float32(1074.8381), np.float32(1089.4597), np.float32(1773.7886), np.float32(2466.5728), np.float32(1228.8384), np.float32(1433.989), np.float32(2034.1168), np.float32(1626.8971), np.float32(1132.19)]
2025-09-14 17:50:29,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:50:29,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 8 minutes, 59 seconds)
2025-09-14 17:53:44,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:53:55,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1509.21448 ± 487.086
2025-09-14 17:53:55,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1361.415), np.float32(1884.2496), np.float32(1203.3967), np.float32(1227.9731), np.float32(925.6231), np.float32(2655.8403), np.float32(1935.7893), np.float32(1425.8453), np.float32(1087.8934), np.float32(1384.1183)]
2025-09-14 17:53:55,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:53:55,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 5 minutes, 53 seconds)
2025-09-14 17:57:09,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:57:21,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1347.24487 ± 623.325
2025-09-14 17:57:21,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1473.5869), np.float32(1392.2262), np.float32(1072.444), np.float32(2367.543), np.float32(1816.2136), np.float32(-127.70186), np.float32(1238.9282), np.float32(949.70404), np.float32(1579.03), np.float32(1710.4747)]
2025-09-14 17:57:21,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:57:21,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 2 minutes, 34 seconds)
2025-09-14 18:00:33,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:00:44,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1480.65161 ± 426.899
2025-09-14 18:00:44,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1264.3611), np.float32(1535.9757), np.float32(1207.4558), np.float32(1029.0538), np.float32(934.26086), np.float32(2092.0337), np.float32(1084.7916), np.float32(1868.3168), np.float32(1602.6422), np.float32(2187.6243)]
2025-09-14 18:00:44,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:00:44,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 58 minutes, 55 seconds)
2025-09-14 18:03:55,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:04:06,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1235.63794 ± 270.300
2025-09-14 18:04:06,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(997.5741), np.float32(1388.5577), np.float32(1369.7589), np.float32(911.85767), np.float32(1044.3936), np.float32(1731.676), np.float32(1621.0719), np.float32(1185.3507), np.float32(1170.0358), np.float32(936.10376)]
2025-09-14 18:04:06,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:04:06,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 55 minutes, 35 seconds)
2025-09-14 18:07:20,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:07:31,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1534.11450 ± 605.120
2025-09-14 18:07:31,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1150.0829), np.float32(1493.0406), np.float32(1551.244), np.float32(646.80334), np.float32(2758.401), np.float32(1588.2354), np.float32(955.19836), np.float32(1857.5267), np.float32(2281.159), np.float32(1059.453)]
2025-09-14 18:07:31,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:07:31,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 52 minutes, 24 seconds)
2025-09-14 18:10:45,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:10:56,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2073.06616 ± 641.896
2025-09-14 18:10:56,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2892.1587), np.float32(1271.8517), np.float32(2381.6462), np.float32(2295.413), np.float32(2645.42), np.float32(1623.2905), np.float32(1425.3633), np.float32(1726.91), np.float32(3112.2466), np.float32(1356.361)]
2025-09-14 18:10:56,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:10:56,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (2073.07) for latency 21
2025-09-14 18:10:56,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 48 minutes, 56 seconds)
2025-09-14 18:14:09,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:14:20,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1568.13013 ± 457.140
2025-09-14 18:14:20,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1432.3634), np.float32(1915.9232), np.float32(1271.4579), np.float32(1036.8374), np.float32(1245.9089), np.float32(2490.112), np.float32(1134.6519), np.float32(2206.637), np.float32(1553.6572), np.float32(1393.7526)]
2025-09-14 18:14:20,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:14:20,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 45 minutes, 20 seconds)
2025-09-14 18:17:33,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:17:44,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1689.91663 ± 496.690
2025-09-14 18:17:44,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1595.0043), np.float32(1770.9343), np.float32(2984.7915), np.float32(1509.4526), np.float32(1604.1351), np.float32(1663.1583), np.float32(1150.9445), np.float32(1471.0055), np.float32(1145.2161), np.float32(2004.5242)]
2025-09-14 18:17:44,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:17:44,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 42 minutes, 3 seconds)
2025-09-14 18:20:57,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:21:09,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1478.28320 ± 491.686
2025-09-14 18:21:09,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1007.794), np.float32(1114.0723), np.float32(1107.0303), np.float32(1324.5713), np.float32(986.42053), np.float32(1557.0952), np.float32(2680.7876), np.float32(1554.4017), np.float32(1525.2395), np.float32(1925.4199)]
2025-09-14 18:21:09,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:21:09,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 38 minutes, 48 seconds)
2025-09-14 18:24:22,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:24:33,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1733.89197 ± 570.228
2025-09-14 18:24:33,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2025.8947), np.float32(2705.793), np.float32(1018.37646), np.float32(2050.2002), np.float32(1902.7388), np.float32(1488.8541), np.float32(1023.2405), np.float32(1699.5354), np.float32(2424.1936), np.float32(1000.09393)]
2025-09-14 18:24:33,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:24:33,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 35 minutes, 24 seconds)
2025-09-14 18:27:47,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:27:58,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1693.96155 ± 428.414
2025-09-14 18:27:58,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1524.0769), np.float32(1407.7174), np.float32(1809.9163), np.float32(1380.0049), np.float32(1386.4486), np.float32(1150.5833), np.float32(2249.8855), np.float32(1623.5083), np.float32(2644.7554), np.float32(1762.7183)]
2025-09-14 18:27:58,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:27:58,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 32 minutes)
2025-09-14 18:31:13,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:31:24,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1528.46521 ± 615.057
2025-09-14 18:31:24,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1063.4412), np.float32(1523.0398), np.float32(1089.7305), np.float32(1114.3744), np.float32(2545.139), np.float32(1014.92053), np.float32(1224.6125), np.float32(1661.3844), np.float32(1216.4641), np.float32(2831.5461)]
2025-09-14 18:31:24,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:31:24,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 28 minutes, 43 seconds)
2025-09-14 18:34:37,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:34:47,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1621.52563 ± 636.299
2025-09-14 18:34:47,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1983.1418), np.float32(2381.9895), np.float32(1792.8967), np.float32(974.0556), np.float32(1029.473), np.float32(1052.8671), np.float32(1923.9369), np.float32(1160.4558), np.float32(1039.9502), np.float32(2876.4893)]
2025-09-14 18:34:47,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:34:47,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 25 minutes, 14 seconds)
2025-09-14 18:37:59,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:38:09,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1799.02319 ± 725.157
2025-09-14 18:38:09,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1052.7389), np.float32(1313.028), np.float32(1528.1082), np.float32(3209.3542), np.float32(3133.2595), np.float32(1725.8376), np.float32(1559.4991), np.float32(1222.9795), np.float32(1910.8558), np.float32(1334.572)]
2025-09-14 18:38:09,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:38:09,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 21 minutes, 39 seconds)
2025-09-14 18:41:18,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:41:29,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1318.71326 ± 199.875
2025-09-14 18:41:29,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1537.9156), np.float32(1326.7352), np.float32(1373.2782), np.float32(1221.9688), np.float32(1083.348), np.float32(1569.2095), np.float32(1137.8632), np.float32(1533.1449), np.float32(962.15393), np.float32(1441.5161)]
2025-09-14 18:41:29,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:41:29,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 17 minutes, 50 seconds)
2025-09-14 18:44:39,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:44:50,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1662.02026 ± 895.772
2025-09-14 18:44:50,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1717.951), np.float32(2847.9856), np.float32(972.025), np.float32(3119.9897), np.float32(1251.0782), np.float32(941.9566), np.float32(2735.2944), np.float32(1708.7452), np.float32(957.095), np.float32(368.08234)]
2025-09-14 18:44:50,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:44:50,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 14 minutes, 11 seconds)
2025-09-14 18:48:00,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:48:10,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1891.61938 ± 703.380
2025-09-14 18:48:10,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2623.483), np.float32(2714.8955), np.float32(1466.9218), np.float32(1323.7806), np.float32(3329.075), np.float32(1217.572), np.float32(1233.1951), np.float32(1927.2654), np.float32(1659.4595), np.float32(1420.5455)]
2025-09-14 18:48:10,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:48:10,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 10 minutes, 27 seconds)
2025-09-14 18:51:19,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:51:29,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1649.85217 ± 670.898
2025-09-14 18:51:29,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1267.0701), np.float32(3024.7395), np.float32(2704.4417), np.float32(1848.0248), np.float32(1283.9045), np.float32(1283.0101), np.float32(1466.9847), np.float32(1002.37366), np.float32(1732.7994), np.float32(885.173)]
2025-09-14 18:51:29,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:51:29,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 6 minutes, 47 seconds)
2025-09-14 18:54:38,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:54:48,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1980.98083 ± 682.946
2025-09-14 18:54:48,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2624.5918), np.float32(1867.3494), np.float32(1417.1222), np.float32(1295.2222), np.float32(3148.9248), np.float32(2271.8071), np.float32(2879.8975), np.float32(1757.028), np.float32(1540.477), np.float32(1007.3888)]
2025-09-14 18:54:48,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:54:48,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 3 minutes, 17 seconds)
2025-09-14 18:57:59,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:58:10,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1350.27441 ± 801.368
2025-09-14 18:58:10,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2233.0408), np.float32(1545.1193), np.float32(2121.244), np.float32(1128.7079), np.float32(1755.3243), np.float32(1452.7091), np.float32(-72.37285), np.float32(2133.6672), np.float32(-115.37261), np.float32(1320.6772)]
2025-09-14 18:58:10,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:58:10,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 5 seconds)
2025-09-14 19:01:24,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:01:36,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1831.62964 ± 789.358
2025-09-14 19:01:36,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2765.9656), np.float32(2964.633), np.float32(1035.5701), np.float32(1471.1414), np.float32(1014.44556), np.float32(1027.1447), np.float32(1107.8263), np.float32(1883.3014), np.float32(2016.8677), np.float32(3029.3994)]
2025-09-14 19:01:36,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:01:36,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 56 minutes, 58 seconds)
2025-09-14 19:04:47,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:04:58,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1308.08582 ± 326.940
2025-09-14 19:04:58,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1185.0275), np.float32(933.29034), np.float32(923.58795), np.float32(1470.6083), np.float32(1901.6119), np.float32(1345.6107), np.float32(1128.5378), np.float32(1794.0562), np.float32(1415.2075), np.float32(983.3211)]
2025-09-14 19:04:58,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:04:58,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 53 minutes, 43 seconds)
2025-09-14 19:08:10,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:08:21,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1573.38245 ± 514.116
2025-09-14 19:08:21,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1304.2644), np.float32(2190.1704), np.float32(1424.2548), np.float32(1177.328), np.float32(1307.3315), np.float32(1303.3209), np.float32(1679.537), np.float32(1415.1813), np.float32(1093.7068), np.float32(2838.729)]
2025-09-14 19:08:21,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:08:21,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 50 minutes, 35 seconds)
2025-09-14 19:11:34,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:11:45,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1569.14673 ± 562.456
2025-09-14 19:11:45,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1038.8645), np.float32(1853.6033), np.float32(1569.5281), np.float32(2591.4429), np.float32(1136.4254), np.float32(1100.4167), np.float32(1095.576), np.float32(2407.9285), np.float32(1005.7645), np.float32(1891.9177)]
2025-09-14 19:11:45,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:11:45,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 47 minutes, 27 seconds)
2025-09-14 19:15:00,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:15:11,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1506.52466 ± 423.926
2025-09-14 19:15:11,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1036.0835), np.float32(1536.5057), np.float32(1620.0951), np.float32(2224.8533), np.float32(2307.0913), np.float32(1092.7681), np.float32(1101.914), np.float32(1491.3499), np.float32(1273.754), np.float32(1380.8328)]
2025-09-14 19:15:11,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:15:11,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 44 minutes, 13 seconds)
2025-09-14 19:18:22,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:18:33,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1382.58716 ± 343.840
2025-09-14 19:18:33,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1126.8971), np.float32(1816.5564), np.float32(1084.681), np.float32(1108.6575), np.float32(2123.6028), np.float32(1027.8408), np.float32(1161.461), np.float32(1493.7026), np.float32(1545.385), np.float32(1337.0881)]
2025-09-14 19:18:33,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:18:33,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 40 minutes, 42 seconds)
2025-09-14 19:21:45,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:21:56,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1487.68481 ± 834.006
2025-09-14 19:21:56,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3098.0415), np.float32(-64.40358), np.float32(2171.0364), np.float32(2433.9624), np.float32(1349.665), np.float32(1223.4603), np.float32(1350.3795), np.float32(1142.1073), np.float32(1230.6277), np.float32(941.9721)]
2025-09-14 19:21:56,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:21:56,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 37 minutes, 20 seconds)
2025-09-14 19:25:08,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:25:19,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1818.69751 ± 550.019
2025-09-14 19:25:19,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2921.675), np.float32(923.1828), np.float32(1825.9985), np.float32(1697.729), np.float32(2523.3357), np.float32(1226.3137), np.float32(1492.6624), np.float32(1987.3939), np.float32(1860.0645), np.float32(1728.6194)]
2025-09-14 19:25:19,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:25:19,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 33 minutes, 56 seconds)
2025-09-14 19:28:33,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:28:44,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1799.71704 ± 716.677
2025-09-14 19:28:44,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1049.5875), np.float32(1527.8507), np.float32(1570.1792), np.float32(3089.9521), np.float32(2447.6199), np.float32(1166.8212), np.float32(1526.9869), np.float32(2888.0342), np.float32(961.7905), np.float32(1768.348)]
2025-09-14 19:28:44,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:28:44,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 30 minutes, 33 seconds)
2025-09-14 19:31:55,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:32:06,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1985.36890 ± 722.180
2025-09-14 19:32:06,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2847.8152), np.float32(3504.4912), np.float32(974.83844), np.float32(2033.0671), np.float32(1431.9747), np.float32(1598.2611), np.float32(1368.0999), np.float32(1598.6267), np.float32(2160.1008), np.float32(2336.4138)]
2025-09-14 19:32:06,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:32:06,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 27 minutes, 4 seconds)
2025-09-14 19:35:19,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:35:30,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2363.05664 ± 544.601
2025-09-14 19:35:30,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2344.2888), np.float32(2575.0876), np.float32(1606.468), np.float32(3156.7256), np.float32(1991.1791), np.float32(2940.6816), np.float32(2812.4065), np.float32(2789.5752), np.float32(1737.537), np.float32(1676.6162)]
2025-09-14 19:35:30,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:35:30,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (2363.06) for latency 21
2025-09-14 19:35:30,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 23 minutes, 43 seconds)
2025-09-14 19:38:42,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:38:52,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1561.89478 ± 375.025
2025-09-14 19:38:52,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1880.6951), np.float32(1361.496), np.float32(1704.8838), np.float32(2506.6277), np.float32(1307.5338), np.float32(1185.9349), np.float32(1599.7833), np.float32(1246.3165), np.float32(1422.0151), np.float32(1403.661)]
2025-09-14 19:38:52,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:38:52,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 20 minutes, 19 seconds)
2025-09-14 19:42:05,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:42:16,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1659.04517 ± 414.175
2025-09-14 19:42:16,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1697.8484), np.float32(1585.1041), np.float32(1250.82), np.float32(1391.81), np.float32(2183.4578), np.float32(1375.1249), np.float32(1211.3674), np.float32(2287.3271), np.float32(2292.518), np.float32(1315.0742)]
2025-09-14 19:42:16,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:42:16,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 16 minutes, 57 seconds)
2025-09-14 19:45:28,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:45:39,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1758.82031 ± 515.210
2025-09-14 19:45:39,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1273.2134), np.float32(2756.366), np.float32(1495.778), np.float32(1890.4646), np.float32(1021.1009), np.float32(2376.6658), np.float32(1346.8284), np.float32(1655.3701), np.float32(2200.9521), np.float32(1571.4633)]
2025-09-14 19:45:39,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:45:39,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 13 minutes, 31 seconds)
2025-09-14 19:48:28,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:48:37,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1551.56360 ± 439.023
2025-09-14 19:48:37,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(995.4856), np.float32(1498.1648), np.float32(1422.6766), np.float32(2549.038), np.float32(1922.1132), np.float32(1132.852), np.float32(1872.1978), np.float32(1535.5381), np.float32(1127.6744), np.float32(1459.8945)]
2025-09-14 19:48:37,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:48:37,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 9 minutes, 54 seconds)
2025-09-14 19:51:19,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:51:27,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2010.07837 ± 794.020
2025-09-14 19:51:27,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2542.2183), np.float32(2425.5618), np.float32(3009.78), np.float32(1176.6102), np.float32(2779.0896), np.float32(1200.4043), np.float32(1528.958), np.float32(1251.6426), np.float32(1071.6643), np.float32(3114.8528)]
2025-09-14 19:51:27,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:51:27,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 6 minutes, 22 seconds)
2025-09-14 19:54:02,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:54:10,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1350.01514 ± 399.893
2025-09-14 19:54:10,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2074.5002), np.float32(1233.399), np.float32(1055.5118), np.float32(1222.9683), np.float32(2055.1697), np.float32(1054.0302), np.float32(1652.944), np.float32(1038.9993), np.float32(979.09955), np.float32(1133.5297)]
2025-09-14 19:54:10,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:54:10,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 3 seconds)
2025-09-14 19:56:47,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:56:56,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1674.88940 ± 691.400
2025-09-14 19:56:56,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1601.3291), np.float32(2058.654), np.float32(2618.3394), np.float32(953.27466), np.float32(1397.4357), np.float32(1070.1805), np.float32(1578.3555), np.float32(3161.2656), np.float32(1068.3414), np.float32(1241.7188)]
2025-09-14 19:56:56,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:56:56,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1251 [DEBUG]: Training session finished
