2025-09-14 13:26:51,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1108 [DEBUG]: logdir: _logs/noise-eval-v2/halfcheetah/bpql-noise_0.075-delay_18
2025-09-14 13:26:51,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1109 [DEBUG]: trainer_prefix: noise-eval-v2/halfcheetah/bpql-noise_0.075-delay_18
2025-09-14 13:26:51,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'18': <latency_env.delayed_mdp.ConstantDelay object at 0x7fd49d2e3da0>}
2025-09-14 13:26:51,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1111 [DEBUG]: using device: cpu
2025-09-14 13:26:51,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-14 13:26:51,955 baseline-bpql-noisepromille75-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=125, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-14 13:26:51,955 baseline-bpql-noisepromille75-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-14 13:26:53,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-14 13:26:53,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-14 13:29:44,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 13:29:54,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: -347.78531 ± 63.771
2025-09-14 13:29:54,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-230.90007), np.float32(-419.89572), np.float32(-395.4085), np.float32(-361.22504), np.float32(-332.66852), np.float32(-428.30457), np.float32(-306.5498), np.float32(-257.22577), np.float32(-400.12143), np.float32(-345.5537)]
2025-09-14 13:29:54,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:29:54,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (-347.79) for latency 18
2025-09-14 13:29:54,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 58 minutes, 1 second)
2025-09-14 13:33:02,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 13:33:11,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: -266.54202 ± 33.461
2025-09-14 13:33:11,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-238.98193), np.float32(-258.6425), np.float32(-291.57816), np.float32(-196.88333), np.float32(-232.49184), np.float32(-278.18982), np.float32(-277.22598), np.float32(-277.20724), np.float32(-300.74075), np.float32(-313.47876)]
2025-09-14 13:33:11,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:33:11,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (-266.54) for latency 18
2025-09-14 13:33:11,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 5 hours, 8 minutes, 25 seconds)
2025-09-14 13:36:20,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 13:36:30,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: -177.61363 ± 39.841
2025-09-14 13:36:30,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-120.152145), np.float32(-165.01166), np.float32(-262.74564), np.float32(-157.90274), np.float32(-160.02115), np.float32(-191.96591), np.float32(-150.7324), np.float32(-212.1871), np.float32(-211.93379), np.float32(-143.4838)]
2025-09-14 13:36:30,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:36:30,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (-177.61) for latency 18
2025-09-14 13:36:30,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 5 hours, 10 minutes, 55 seconds)
2025-09-14 13:39:27,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 13:39:37,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: -49.84095 ± 72.303
2025-09-14 13:39:37,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-21.011772), np.float32(-91.40708), np.float32(-155.5511), np.float32(27.692808), np.float32(-37.765633), np.float32(-2.0616577), np.float32(-84.698074), np.float32(-123.1204), np.float32(93.600876), np.float32(-104.087494)]
2025-09-14 13:39:37,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:39:37,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (-49.84) for latency 18
2025-09-14 13:39:37,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 5 hours, 5 minutes, 32 seconds)
2025-09-14 13:42:32,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 13:42:42,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 85.77434 ± 180.456
2025-09-14 13:42:42,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(101.49947), np.float32(298.7122), np.float32(81.38529), np.float32(422.04562), np.float32(-59.2589), np.float32(-106.84972), np.float32(-213.92274), np.float32(52.00124), np.float32(64.43736), np.float32(217.69354)]
2025-09-14 13:42:42,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:42:42,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (85.77) for latency 18
2025-09-14 13:42:42,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 5 hours, 28 seconds)
2025-09-14 13:45:37,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 13:45:47,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 234.74287 ± 113.382
2025-09-14 13:45:47,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(154.22145), np.float32(320.70135), np.float32(411.5813), np.float32(446.0632), np.float32(130.94331), np.float32(164.41776), np.float32(163.94214), np.float32(172.17155), np.float32(263.79974), np.float32(119.586624)]
2025-09-14 13:45:47,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:45:47,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (234.74) for latency 18
2025-09-14 13:45:47,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 58 minutes, 33 seconds)
2025-09-14 13:48:42,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 13:48:52,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 698.07739 ± 192.937
2025-09-14 13:48:52,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(629.53174), np.float32(678.8284), np.float32(872.82764), np.float32(610.396), np.float32(673.78467), np.float32(1033.3761), np.float32(353.11746), np.float32(510.6226), np.float32(945.39655), np.float32(672.893)]
2025-09-14 13:48:52,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:48:52,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (698.08) for latency 18
2025-09-14 13:48:52,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 51 minutes, 40 seconds)
2025-09-14 13:51:47,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 13:51:57,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 926.76123 ± 104.573
2025-09-14 13:51:57,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1022.1388), np.float32(857.73694), np.float32(955.63257), np.float32(955.04224), np.float32(989.556), np.float32(755.90735), np.float32(850.93353), np.float32(810.8665), np.float32(1130.1787), np.float32(939.6197)]
2025-09-14 13:51:57,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:51:57,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (926.76) for latency 18
2025-09-14 13:51:57,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 44 minutes, 16 seconds)
2025-09-14 13:55:08,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 13:55:20,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1090.86548 ± 138.853
2025-09-14 13:55:20,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(759.6862), np.float32(1189.6307), np.float32(1058.0414), np.float32(1161.9703), np.float32(1132.9844), np.float32(1313.1721), np.float32(1149.4207), np.float32(1113.439), np.float32(1033.0251), np.float32(997.28516)]
2025-09-14 13:55:20,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:55:20,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (1090.87) for latency 18
2025-09-14 13:55:20,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 45 minutes, 52 seconds)
2025-09-14 13:58:39,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 13:58:50,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1005.78943 ± 463.809
2025-09-14 13:58:50,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1028.6292), np.float32(1104.0764), np.float32(1078.3914), np.float32(1253.5625), np.float32(1193.2155), np.float32(1076.2987), np.float32(990.3651), np.float32(1162.3505), np.float32(-324.50076), np.float32(1495.5063)]
2025-09-14 13:58:50,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:58:50,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 50 minutes, 15 seconds)
2025-09-14 14:01:57,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 14:02:07,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1262.52380 ± 144.562
2025-09-14 14:02:07,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1197.516), np.float32(1394.9994), np.float32(1372.1185), np.float32(1412.1741), np.float32(1032.1318), np.float32(1303.4971), np.float32(1376.2136), np.float32(1295.6997), np.float32(1268.1624), np.float32(972.72644)]
2025-09-14 14:02:07,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:02:07,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (1262.52) for latency 18
2025-09-14 14:02:07,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 50 minutes, 45 seconds)
2025-09-14 14:05:04,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 14:05:13,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1186.30005 ± 199.858
2025-09-14 14:05:13,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1048.8901), np.float32(1433.2142), np.float32(1132.9707), np.float32(1002.0397), np.float32(1122.2056), np.float32(1592.9783), np.float32(1270.74), np.float32(1308.5135), np.float32(940.0872), np.float32(1011.3599)]
2025-09-14 14:05:13,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:05:13,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 47 minutes, 57 seconds)
2025-09-14 14:08:08,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 14:08:18,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1238.68372 ± 184.719
2025-09-14 14:08:18,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1115.4249), np.float32(1187.8202), np.float32(1039.7047), np.float32(1311.15), np.float32(1272.8462), np.float32(1159.7944), np.float32(1677.893), np.float32(1353.6976), np.float32(1281.3136), np.float32(987.19226)]
2025-09-14 14:08:18,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:08:18,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 44 minutes, 29 seconds)
2025-09-14 14:11:14,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 14:11:24,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1254.82581 ± 260.169
2025-09-14 14:11:24,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1337.4099), np.float32(1506.5765), np.float32(654.64026), np.float32(1232.4756), np.float32(1320.194), np.float32(1145.6362), np.float32(1156.3945), np.float32(1250.245), np.float32(1221.9198), np.float32(1722.7661)]
2025-09-14 14:11:24,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:11:24,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 36 minutes, 23 seconds)
2025-09-14 14:14:19,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 14:14:29,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1588.80945 ± 412.701
2025-09-14 14:14:29,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2355.095), np.float32(1980.3789), np.float32(2005.7804), np.float32(1085.9492), np.float32(1356.7692), np.float32(1453.776), np.float32(1133.4865), np.float32(1837.052), np.float32(1528.3667), np.float32(1151.4407)]
2025-09-14 14:14:29,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:14:29,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (1588.81) for latency 18
2025-09-14 14:14:29,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 26 minutes, 9 seconds)
2025-09-14 14:17:22,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 14:17:32,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1262.24487 ± 141.846
2025-09-14 14:17:32,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1293.247), np.float32(1149.1145), np.float32(1119.3326), np.float32(1291.0724), np.float32(1196.6556), np.float32(1224.5994), np.float32(1142.0131), np.float32(1246.387), np.float32(1639.5687), np.float32(1320.459)]
2025-09-14 14:17:32,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:17:32,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 19 minutes, 11 seconds)
2025-09-14 14:20:53,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 14:21:04,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1275.55347 ± 357.318
2025-09-14 14:21:04,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1291.2272), np.float32(1286.2488), np.float32(1228.2919), np.float32(2321.5918), np.float32(1068.2534), np.float32(1142.6548), np.float32(1131.6641), np.float32(1068.9296), np.float32(1105.233), np.float32(1111.4408)]
2025-09-14 14:21:04,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:21:04,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 22 minutes, 53 seconds)
2025-09-14 14:24:23,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 14:24:34,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1368.80261 ± 316.554
2025-09-14 14:24:34,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1349.4989), np.float32(1127.51), np.float32(1120.614), np.float32(1967.5016), np.float32(1150.7456), np.float32(2004.7478), np.float32(1234.0981), np.float32(1295.2278), np.float32(1180.066), np.float32(1258.0165)]
2025-09-14 14:24:34,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:24:34,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 26 minutes, 43 seconds)
2025-09-14 14:27:35,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 14:27:45,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1397.68982 ± 287.246
2025-09-14 14:27:45,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1052.3711), np.float32(1103.011), np.float32(1188.086), np.float32(1543.6935), np.float32(2117.6096), np.float32(1395.0093), np.float32(1308.5664), np.float32(1293.0332), np.float32(1474.3203), np.float32(1501.1976)]
2025-09-14 14:27:45,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:27:45,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 24 minutes, 58 seconds)
2025-09-14 14:30:42,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 14:30:52,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1643.39917 ± 387.943
2025-09-14 14:30:52,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1399.4368), np.float32(1281.8582), np.float32(1392.9799), np.float32(1249.5422), np.float32(1502.4341), np.float32(1722.4249), np.float32(2210.095), np.float32(2353.6821), np.float32(2010.4586), np.float32(1311.0792)]
2025-09-14 14:30:52,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:30:52,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (1643.40) for latency 18
2025-09-14 14:30:52,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 4 hours, 22 minutes, 2 seconds)
2025-09-14 14:33:47,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 14:33:57,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1415.43872 ± 173.611
2025-09-14 14:33:57,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1165.6226), np.float32(1556.1559), np.float32(1537.4968), np.float32(1347.6549), np.float32(1504.7214), np.float32(1295.1714), np.float32(1138.6895), np.float32(1395.9707), np.float32(1723.346), np.float32(1489.5596)]
2025-09-14 14:33:57,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:33:57,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 4 hours, 19 minutes, 21 seconds)
2025-09-14 14:36:54,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 14:37:03,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1346.05176 ± 145.120
2025-09-14 14:37:03,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1484.4741), np.float32(1421.6527), np.float32(1178.1292), np.float32(1229.6245), np.float32(1379.2119), np.float32(1262.3152), np.float32(1123.7103), np.float32(1318.9276), np.float32(1623.7013), np.float32(1438.7701)]
2025-09-14 14:37:03,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:37:03,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 4 hours, 9 minutes, 20 seconds)
2025-09-14 14:39:57,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 14:40:07,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1403.08081 ± 215.942
2025-09-14 14:40:07,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1167.8407), np.float32(1409.925), np.float32(1658.9861), np.float32(1586.0336), np.float32(1257.7875), np.float32(1837.1193), np.float32(1164.6217), np.float32(1291.2341), np.float32(1433.5343), np.float32(1223.7267)]
2025-09-14 14:40:07,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:40:07,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 3 hours, 59 minutes, 22 seconds)
2025-09-14 14:43:03,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 14:43:13,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1581.55798 ± 343.442
2025-09-14 14:43:13,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1788.9581), np.float32(1721.2905), np.float32(1163.4266), np.float32(1948.4932), np.float32(1267.3575), np.float32(1153.4738), np.float32(1938.6526), np.float32(1468.0116), np.float32(2109.8972), np.float32(1256.0182)]
2025-09-14 14:43:13,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:43:13,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 54 minutes, 59 seconds)
2025-09-14 14:46:18,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 14:46:29,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1810.30017 ± 363.635
2025-09-14 14:46:29,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1682.9984), np.float32(1654.7843), np.float32(1924.418), np.float32(1254.948), np.float32(1735.6685), np.float32(1750.9524), np.float32(1486.1486), np.float32(2536.5776), np.float32(1710.4532), np.float32(2366.053)]
2025-09-14 14:46:29,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:46:29,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (1810.30) for latency 18
2025-09-14 14:46:29,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 54 minutes, 18 seconds)
2025-09-14 14:49:48,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 14:49:59,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1449.19104 ± 224.757
2025-09-14 14:49:59,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1427.3077), np.float32(1421.8516), np.float32(1929.2819), np.float32(1450.8038), np.float32(1382.1241), np.float32(1566.4454), np.float32(1173.2318), np.float32(1214.5516), np.float32(1214.5664), np.float32(1711.7458)]
2025-09-14 14:49:59,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:49:59,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 57 minutes, 10 seconds)
2025-09-14 14:53:12,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 14:53:22,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1617.64746 ± 320.697
2025-09-14 14:53:22,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1917.9048), np.float32(1198.3431), np.float32(1620.0746), np.float32(1590.5801), np.float32(1337.9636), np.float32(1982.6704), np.float32(1661.4327), np.float32(1496.239), np.float32(1173.5823), np.float32(2197.6836)]
2025-09-14 14:53:22,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:53:22,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 58 minutes, 17 seconds)
2025-09-14 14:56:23,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 14:56:34,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1527.50781 ± 352.726
2025-09-14 14:56:34,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1151.2301), np.float32(1303.0597), np.float32(1239.0748), np.float32(1369.5583), np.float32(1391.8933), np.float32(1918.136), np.float32(1282.4948), np.float32(2196.6848), np.float32(2025.7621), np.float32(1397.1843)]
2025-09-14 14:56:34,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:56:34,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 56 minutes, 52 seconds)
2025-09-14 14:59:30,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 14:59:39,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1730.60181 ± 334.772
2025-09-14 14:59:39,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1778.7272), np.float32(1292.4797), np.float32(2395.9553), np.float32(1356.3613), np.float32(1627.4358), np.float32(1716.8802), np.float32(1466.7532), np.float32(1694.418), np.float32(2242.394), np.float32(1734.6138)]
2025-09-14 14:59:39,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:59:39,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 53 minutes, 26 seconds)
2025-09-14 15:02:33,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 15:02:43,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1661.56409 ± 616.143
2025-09-14 15:02:43,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1385.1438), np.float32(1692.4564), np.float32(1543.8779), np.float32(2298.55), np.float32(418.79218), np.float32(2912.0374), np.float32(1498.231), np.float32(1352.1608), np.float32(1916.0886), np.float32(1598.301)]
2025-09-14 15:02:43,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:02:43,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 47 minutes, 20 seconds)
2025-09-14 15:05:39,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 15:05:49,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1408.74048 ± 221.915
2025-09-14 15:05:49,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1544.4685), np.float32(1147.5369), np.float32(1183.312), np.float32(1493.04), np.float32(1359.0531), np.float32(1223.3898), np.float32(1953.1619), np.float32(1340.2174), np.float32(1352.172), np.float32(1491.0526)]
2025-09-14 15:05:49,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:05:49,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 38 minutes, 34 seconds)
2025-09-14 15:08:46,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 15:08:55,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1636.97534 ± 559.560
2025-09-14 15:08:55,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2839.237), np.float32(1662.7172), np.float32(2601.8574), np.float32(1423.5692), np.float32(1318.6603), np.float32(1431.8337), np.float32(1146.8954), np.float32(1219.8636), np.float32(1373.9287), np.float32(1351.1904)]
2025-09-14 15:08:55,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:08:55,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 31 minutes, 35 seconds)
2025-09-14 15:11:52,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 15:12:02,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1499.86108 ± 604.079
2025-09-14 15:12:02,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1194.72), np.float32(1127.0939), np.float32(651.4534), np.float32(2100.8406), np.float32(1384.2953), np.float32(1082.1108), np.float32(2515.3074), np.float32(2482.6436), np.float32(1283.1798), np.float32(1176.9656)]
2025-09-14 15:12:02,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:12:02,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 27 minutes, 20 seconds)
2025-09-14 15:15:15,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 15:15:26,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1533.49194 ± 224.928
2025-09-14 15:15:26,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1512.9434), np.float32(1164.3049), np.float32(1310.9869), np.float32(1699.5736), np.float32(1599.0171), np.float32(1934.6696), np.float32(1769.0846), np.float32(1294.3541), np.float32(1608.7759), np.float32(1441.2112)]
2025-09-14 15:15:26,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:15:26,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 28 minutes, 24 seconds)
2025-09-14 15:18:45,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 15:18:56,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1810.24634 ± 506.749
2025-09-14 15:18:56,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1430.1697), np.float32(2106.4421), np.float32(1294.5657), np.float32(1508.1426), np.float32(1687.4563), np.float32(1876.8746), np.float32(2339.8757), np.float32(1479.8378), np.float32(2990.0564), np.float32(1389.0413)]
2025-09-14 15:18:56,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:18:56,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 30 minutes, 44 seconds)
2025-09-14 15:22:02,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 15:22:12,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1817.72681 ± 525.271
2025-09-14 15:22:12,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2010.6337), np.float32(1149.3829), np.float32(1476.7727), np.float32(1492.8207), np.float32(1416.6238), np.float32(3014.266), np.float32(1700.66), np.float32(1482.1018), np.float32(2128.667), np.float32(2305.3384)]
2025-09-14 15:22:12,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:22:12,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (1817.73) for latency 18
2025-09-14 15:22:12,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 29 minutes, 35 seconds)
2025-09-14 15:25:12,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 15:25:21,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1586.48853 ± 230.370
2025-09-14 15:25:21,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1437.3601), np.float32(1246.9829), np.float32(1902.7561), np.float32(1256.1426), np.float32(1474.7936), np.float32(1477.808), np.float32(1679.2859), np.float32(1871.4446), np.float32(1687.3135), np.float32(1830.9971)]
2025-09-14 15:25:21,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:25:21,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 27 minutes, 1 second)
2025-09-14 15:28:15,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 15:28:24,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1876.08533 ± 477.919
2025-09-14 15:28:24,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2481.261), np.float32(1569.2213), np.float32(1293.203), np.float32(1954.922), np.float32(2894.212), np.float32(1299.3187), np.float32(1864.5712), np.float32(1847.1572), np.float32(1568.2125), np.float32(1988.7742)]
2025-09-14 15:28:24,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:28:24,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (1876.09) for latency 18
2025-09-14 15:28:24,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 22 minutes, 56 seconds)
2025-09-14 15:31:20,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 15:31:30,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1725.60388 ± 420.099
2025-09-14 15:31:30,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1864.6176), np.float32(1247.9971), np.float32(1299.2924), np.float32(2033.834), np.float32(2385.0386), np.float32(1491.3649), np.float32(1460.7213), np.float32(2436.4397), np.float32(1306.7289), np.float32(1730.0049)]
2025-09-14 15:31:30,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:31:30,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 15 minutes, 51 seconds)
2025-09-14 15:34:25,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 15:34:35,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1996.12463 ± 559.546
2025-09-14 15:34:35,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1447.3895), np.float32(2549.253), np.float32(1791.4407), np.float32(2578.7573), np.float32(2981.4397), np.float32(2083.9536), np.float32(2187.3044), np.float32(1233.9027), np.float32(1832.5466), np.float32(1275.2582)]
2025-09-14 15:34:35,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:34:35,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (1996.12) for latency 18
2025-09-14 15:34:35,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 7 minutes, 51 seconds)
2025-09-14 15:37:31,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 15:37:41,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1568.64966 ± 275.486
2025-09-14 15:37:41,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1355.9397), np.float32(1341.3306), np.float32(1632.4241), np.float32(1637.0), np.float32(1772.3334), np.float32(1176.1461), np.float32(2026.7084), np.float32(1352.4102), np.float32(1986.4933), np.float32(1405.7107)]
2025-09-14 15:37:41,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:37:41,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 2 minutes, 48 seconds)
2025-09-14 15:40:37,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 15:40:46,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1706.75903 ± 303.156
2025-09-14 15:40:46,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2269.4233), np.float32(1438.369), np.float32(1844.9078), np.float32(1570.4988), np.float32(2027.2721), np.float32(1317.2455), np.float32(1875.8097), np.float32(1584.423), np.float32(1850.2306), np.float32(1289.4098)]
2025-09-14 15:40:46,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:40:46,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 58 minutes, 47 seconds)
2025-09-14 15:44:01,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 15:44:12,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1610.06018 ± 242.202
2025-09-14 15:44:12,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1471.8499), np.float32(1530.3053), np.float32(1493.1726), np.float32(1632.1226), np.float32(1229.2601), np.float32(1779.7007), np.float32(1739.5298), np.float32(1531.6581), np.float32(1501.1752), np.float32(2191.8271)]
2025-09-14 15:44:12,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:44:12,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 4 seconds)
2025-09-14 15:47:30,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 15:47:41,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1499.41675 ± 238.914
2025-09-14 15:47:41,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1220.9656), np.float32(1699.101), np.float32(1535.8538), np.float32(1541.215), np.float32(1573.7021), np.float32(1935.1571), np.float32(1747.528), np.float32(1283.5394), np.float32(1224.6058), np.float32(1232.4998)]
2025-09-14 15:47:41,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:47:41,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 1 minute, 23 seconds)
2025-09-14 15:50:45,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 15:50:55,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1446.25452 ± 553.597
2025-09-14 15:50:55,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1903.0895), np.float32(1599.7854), np.float32(-13.7983465), np.float32(1620.0452), np.float32(1836.5529), np.float32(1399.9114), np.float32(2093.0305), np.float32(1238.1528), np.float32(1236.894), np.float32(1548.8806)]
2025-09-14 15:50:55,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:50:55,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 59 minutes, 40 seconds)
2025-09-14 15:53:55,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 15:54:04,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2371.08862 ± 580.806
2025-09-14 15:54:04,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2377.2363), np.float32(1817.1158), np.float32(2870.6672), np.float32(1903.5302), np.float32(2985.9277), np.float32(2757.247), np.float32(2729.0847), np.float32(1654.9374), np.float32(1464.3644), np.float32(3150.776)]
2025-09-14 15:54:04,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:54:04,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2371.09) for latency 18
2025-09-14 15:54:04,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 56 minutes, 53 seconds)
2025-09-14 15:56:57,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 15:57:07,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1987.96094 ± 453.639
2025-09-14 15:57:07,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1681.5392), np.float32(1797.0791), np.float32(2356.9717), np.float32(1855.0879), np.float32(1640.5858), np.float32(3016.5632), np.float32(1989.553), np.float32(2319.2546), np.float32(1929.1423), np.float32(1293.8314)]
2025-09-14 15:57:07,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:57:07,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 53 minutes, 17 seconds)
2025-09-14 16:00:03,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 16:00:13,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1589.49390 ± 225.609
2025-09-14 16:00:13,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1577.8538), np.float32(1682.3586), np.float32(1546.7415), np.float32(1816.4978), np.float32(1397.7714), np.float32(1474.5272), np.float32(1705.481), np.float32(1328.6467), np.float32(1296.4618), np.float32(2068.5986)]
2025-09-14 16:00:13,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:00:13,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 46 minutes, 37 seconds)
2025-09-14 16:03:10,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 16:03:20,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1573.96069 ± 247.457
2025-09-14 16:03:20,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1442.2826), np.float32(1722.5375), np.float32(1990.4147), np.float32(1262.7471), np.float32(1910.3132), np.float32(1643.0983), np.float32(1486.8137), np.float32(1429.1595), np.float32(1188.992), np.float32(1663.248)]
2025-09-14 16:03:20,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:03:20,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 39 minutes, 33 seconds)
2025-09-14 16:06:16,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 16:06:25,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1923.66528 ± 502.542
2025-09-14 16:06:25,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1395.5327), np.float32(1794.1576), np.float32(1338.9312), np.float32(1602.4906), np.float32(1442.8389), np.float32(2375.089), np.float32(2812.3928), np.float32(2084.0894), np.float32(2635.06), np.float32(1756.0704)]
2025-09-14 16:06:25,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:06:25,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 35 minutes)
2025-09-14 16:09:20,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 16:09:29,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1757.19214 ± 462.322
2025-09-14 16:09:29,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1231.9122), np.float32(1621.3047), np.float32(2832.76), np.float32(1412.4618), np.float32(1916.6466), np.float32(1865.756), np.float32(2154.3086), np.float32(1820.3475), np.float32(1516.4445), np.float32(1199.9797)]
2025-09-14 16:09:29,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:09:29,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 31 minutes, 4 seconds)
2025-09-14 16:12:45,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 16:12:57,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1684.32654 ± 755.195
2025-09-14 16:12:57,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1531.0807), np.float32(3222.254), np.float32(212.07222), np.float32(1374.4294), np.float32(1525.29), np.float32(1546.0934), np.float32(1352.281), np.float32(1709.3179), np.float32(2628.9797), np.float32(1741.4684)]
2025-09-14 16:12:57,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:12:57,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 31 minutes, 57 seconds)
2025-09-14 16:16:15,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 16:16:26,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1623.14539 ± 267.533
2025-09-14 16:16:26,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1894.9598), np.float32(1588.7063), np.float32(1285.8112), np.float32(1362.8143), np.float32(2097.9878), np.float32(1687.5446), np.float32(1500.4502), np.float32(1364.7806), np.float32(1471.9623), np.float32(1976.4379)]
2025-09-14 16:16:26,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:16:26,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 32 minutes, 26 seconds)
2025-09-14 16:19:29,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 16:19:39,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1738.59839 ± 581.215
2025-09-14 16:19:39,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2261.8752), np.float32(1411.4708), np.float32(1337.4779), np.float32(1677.2125), np.float32(1584.8177), np.float32(1715.6547), np.float32(1529.9828), np.float32(3277.459), np.float32(1254.4097), np.float32(1335.6229)]
2025-09-14 16:19:39,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:19:39,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 30 minutes, 7 seconds)
2025-09-14 16:22:36,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 16:22:45,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2089.31689 ± 558.292
2025-09-14 16:22:45,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2766.785), np.float32(1577.1644), np.float32(1365.4546), np.float32(2140.5112), np.float32(2983.6128), np.float32(1382.2328), np.float32(2403.7117), np.float32(1752.4459), np.float32(1874.4352), np.float32(2646.8142)]
2025-09-14 16:22:45,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:22:45,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 26 minutes, 59 seconds)
2025-09-14 16:25:40,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 16:25:50,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1659.89587 ± 389.573
2025-09-14 16:25:50,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2253.9888), np.float32(1393.612), np.float32(1464.5725), np.float32(1473.18), np.float32(2480.2173), np.float32(1611.8685), np.float32(1318.1854), np.float32(1437.0731), np.float32(1293.8475), np.float32(1872.414)]
2025-09-14 16:25:50,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:25:50,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 23 minutes, 55 seconds)
2025-09-14 16:28:46,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 16:28:55,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1673.95740 ± 260.817
2025-09-14 16:28:55,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1525.1722), np.float32(1539.3832), np.float32(1551.441), np.float32(1849.7793), np.float32(1236.3369), np.float32(1697.681), np.float32(1806.5872), np.float32(1509.0548), np.float32(1758.5103), np.float32(2265.6284)]
2025-09-14 16:28:55,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:28:55,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 17 minutes, 24 seconds)
2025-09-14 16:31:51,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 16:32:01,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2253.11475 ± 513.834
2025-09-14 16:32:01,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1305.3315), np.float32(1713.0826), np.float32(2813.6445), np.float32(2986.6953), np.float32(2807.3274), np.float32(2058.5498), np.float32(2297.9788), np.float32(1788.1141), np.float32(2415.1318), np.float32(2345.292)]
2025-09-14 16:32:01,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:32:01,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 10 minutes, 49 seconds)
2025-09-14 16:34:56,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 16:35:05,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1890.47290 ± 423.978
2025-09-14 16:35:05,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1551.7515), np.float32(1274.342), np.float32(2549.5168), np.float32(1662.0925), np.float32(1613.5776), np.float32(1594.0702), np.float32(2662.649), np.float32(1929.9653), np.float32(2115.666), np.float32(1951.0985)]
2025-09-14 16:35:05,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:35:05,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 6 minutes, 35 seconds)
2025-09-14 16:37:59,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 16:38:09,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1879.31763 ± 637.083
2025-09-14 16:38:09,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(824.6714), np.float32(1960.6442), np.float32(1406.3127), np.float32(1403.9204), np.float32(2062.2405), np.float32(2286.843), np.float32(1682.5461), np.float32(2061.2014), np.float32(3362.5557), np.float32(1742.2404)]
2025-09-14 16:38:09,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:38:09,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 3 minutes, 13 seconds)
2025-09-14 16:41:29,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 16:41:40,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1839.00757 ± 520.298
2025-09-14 16:41:40,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1335.9626), np.float32(2344.869), np.float32(1234.3691), np.float32(1538.6766), np.float32(2164.1648), np.float32(2876.7244), np.float32(1398.293), np.float32(2170.1294), np.float32(1975.1006), np.float32(1351.7872)]
2025-09-14 16:41:40,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:41:40,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 3 minutes, 26 seconds)
2025-09-14 16:44:58,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 16:45:09,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1535.60364 ± 624.460
2025-09-14 16:45:09,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2501.2368), np.float32(1197.9746), np.float32(1713.9495), np.float32(1398.805), np.float32(2571.5894), np.float32(1745.1783), np.float32(317.39893), np.float32(1173.9658), np.float32(1452.6653), np.float32(1283.2737)]
2025-09-14 16:45:09,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:45:09,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 3 minutes, 21 seconds)
2025-09-14 16:48:10,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 16:48:20,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1485.50842 ± 457.241
2025-09-14 16:48:20,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(774.7936), np.float32(1438.8838), np.float32(1420.5969), np.float32(1398.1384), np.float32(1414.4739), np.float32(1427.0364), np.float32(1316.0675), np.float32(1324.135), np.float32(1636.6398), np.float32(2704.3188)]
2025-09-14 16:48:20,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:48:20,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 50 seconds)
2025-09-14 16:51:16,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 16:51:26,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2017.26404 ± 548.925
2025-09-14 16:51:26,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1268.6149), np.float32(1953.301), np.float32(2417.2573), np.float32(2789.964), np.float32(1811.52), np.float32(1658.0221), np.float32(1831.4941), np.float32(3099.6406), np.float32(1477.8899), np.float32(1864.9363)]
2025-09-14 16:51:26,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:51:26,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 57 minutes, 43 seconds)
2025-09-14 16:54:21,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 16:54:31,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2087.84912 ± 765.340
2025-09-14 16:54:31,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1605.1128), np.float32(1698.1967), np.float32(1453.3436), np.float32(1417.0526), np.float32(1452.9738), np.float32(1712.6968), np.float32(2400.3784), np.float32(2143.5044), np.float32(3514.4111), np.float32(3480.8193)]
2025-09-14 16:54:31,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:54:31,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 54 minutes, 28 seconds)
2025-09-14 16:57:25,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 16:57:35,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2135.23706 ± 583.405
2025-09-14 16:57:35,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1565.3585), np.float32(1604.913), np.float32(2921.5676), np.float32(3197.9734), np.float32(1617.5221), np.float32(2168.563), np.float32(1830.0581), np.float32(2197.2795), np.float32(2709.178), np.float32(1539.9597)]
2025-09-14 16:57:35,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:57:35,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 48 minutes, 18 seconds)
2025-09-14 17:00:31,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 17:00:41,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1928.36267 ± 382.683
2025-09-14 17:00:41,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2536.5938), np.float32(1689.3046), np.float32(1979.4391), np.float32(1571.3534), np.float32(1936.6927), np.float32(2084.4272), np.float32(2573.7742), np.float32(2011.9607), np.float32(1420.4829), np.float32(1479.598)]
2025-09-14 17:00:41,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:00:41,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 42 minutes, 27 seconds)
2025-09-14 17:03:35,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 17:03:44,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1749.41479 ± 484.423
2025-09-14 17:03:44,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1324.6256), np.float32(2735.1213), np.float32(1576.8364), np.float32(1336.6276), np.float32(1276.55), np.float32(2403.5247), np.float32(2088.1853), np.float32(1693.0563), np.float32(1276.2701), np.float32(1783.3501)]
2025-09-14 17:03:44,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:03:44,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 38 minutes, 30 seconds)
2025-09-14 17:06:43,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 17:06:54,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1879.01245 ± 493.751
2025-09-14 17:06:54,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2660.2556), np.float32(1776.1704), np.float32(1367.848), np.float32(1476.5212), np.float32(2295.7268), np.float32(1611.3977), np.float32(1714.5753), np.float32(2769.1724), np.float32(1309.7488), np.float32(1808.7085)]
2025-09-14 17:06:54,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:06:54,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 35 minutes, 48 seconds)
2025-09-14 17:10:13,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 17:10:24,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1894.35413 ± 553.711
2025-09-14 17:10:24,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2357.2034), np.float32(1404.7963), np.float32(1728.1304), np.float32(1809.0503), np.float32(1599.7335), np.float32(2559.2344), np.float32(1309.9502), np.float32(1830.7477), np.float32(3052.1624), np.float32(1292.5325)]
2025-09-14 17:10:24,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:10:24,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 35 minutes, 18 seconds)
2025-09-14 17:13:43,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 17:13:52,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2210.56689 ± 695.835
2025-09-14 17:13:52,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2677.9282), np.float32(3063.9949), np.float32(1589.7129), np.float32(1692.6066), np.float32(1521.1714), np.float32(3165.5972), np.float32(1532.5641), np.float32(3165.6787), np.float32(2179.4658), np.float32(1516.9484)]
2025-09-14 17:13:52,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:13:52,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 34 minutes, 27 seconds)
2025-09-14 17:16:52,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 17:17:02,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1912.12085 ± 594.179
2025-09-14 17:17:02,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3461.0515), np.float32(2071.9087), np.float32(2091.9744), np.float32(1474.8342), np.float32(2008.4109), np.float32(1625.9352), np.float32(1346.2963), np.float32(2068.6597), np.float32(1281.9481), np.float32(1690.1907)]
2025-09-14 17:17:02,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:17:02,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 31 minutes, 35 seconds)
2025-09-14 17:19:58,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 17:20:08,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1932.49316 ± 703.332
2025-09-14 17:20:08,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2862.9458), np.float32(1960.1903), np.float32(2054.8318), np.float32(1436.0295), np.float32(1557.559), np.float32(2337.0586), np.float32(1992.3491), np.float32(2739.1628), np.float32(253.04214), np.float32(2131.7617)]
2025-09-14 17:20:08,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:20:08,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 28 minutes, 32 seconds)
2025-09-14 17:23:03,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 17:23:12,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1740.22620 ± 400.761
2025-09-14 17:23:12,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1391.2429), np.float32(1623.6875), np.float32(1837.7759), np.float32(1284.689), np.float32(2343.4158), np.float32(1697.0525), np.float32(2495.3066), np.float32(1251.5043), np.float32(1937.5281), np.float32(1540.0596)]
2025-09-14 17:23:12,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:23:12,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 24 minutes, 46 seconds)
2025-09-14 17:26:06,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 17:26:16,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2113.69971 ± 706.758
2025-09-14 17:26:16,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3000.8176), np.float32(1496.7196), np.float32(1233.6849), np.float32(1443.769), np.float32(3375.0908), np.float32(2027.9729), np.float32(2292.829), np.float32(1468.433), np.float32(1954.6152), np.float32(2843.064)]
2025-09-14 17:26:16,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:26:16,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 19 minutes, 20 seconds)
2025-09-14 17:29:11,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 17:29:21,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2000.07886 ± 487.878
2025-09-14 17:29:21,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1559.0465), np.float32(1983.1523), np.float32(1827.1875), np.float32(2816.9817), np.float32(2692.8694), np.float32(2152.1147), np.float32(1495.4744), np.float32(1492.7931), np.float32(2476.9065), np.float32(1504.2642)]
2025-09-14 17:29:21,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:29:21,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 14 minutes, 17 seconds)
2025-09-14 17:32:17,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 17:32:27,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2575.36499 ± 627.123
2025-09-14 17:32:27,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3019.8997), np.float32(1493.7366), np.float32(3012.286), np.float32(2227.9468), np.float32(3623.0928), np.float32(2022.5686), np.float32(2081.3176), np.float32(2234.7573), np.float32(3238.8235), np.float32(2799.22)]
2025-09-14 17:32:27,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:32:27,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2575.36) for latency 18
2025-09-14 17:32:27,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 10 minutes, 52 seconds)
2025-09-14 17:35:36,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 17:35:47,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2250.31226 ± 662.885
2025-09-14 17:35:47,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2511.0742), np.float32(1353.1239), np.float32(2236.4082), np.float32(1563.5092), np.float32(3255.31), np.float32(2476.1572), np.float32(1484.3091), np.float32(3298.7646), np.float32(1783.5685), np.float32(2540.8977)]
2025-09-14 17:35:47,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:35:47,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 8 minutes, 51 seconds)
2025-09-14 17:39:06,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 17:39:16,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1858.01733 ± 249.550
2025-09-14 17:39:16,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1874.4326), np.float32(2119.7908), np.float32(1702.037), np.float32(1528.3478), np.float32(1782.2218), np.float32(1829.7041), np.float32(1784.9705), np.float32(1903.1064), np.float32(1609.8151), np.float32(2445.7468)]
2025-09-14 17:39:16,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:39:16,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 7 minutes, 31 seconds)
2025-09-14 17:42:27,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 17:42:36,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2148.94336 ± 648.256
2025-09-14 17:42:36,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3216.4011), np.float32(1714.8658), np.float32(2538.0967), np.float32(1969.7201), np.float32(3184.2522), np.float32(1548.9996), np.float32(1475.1079), np.float32(2419.9067), np.float32(2128.2432), np.float32(1293.8402)]
2025-09-14 17:42:36,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:42:36,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 5 minutes, 22 seconds)
2025-09-14 17:45:34,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 17:45:44,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1523.29224 ± 562.143
2025-09-14 17:45:44,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1627.3125), np.float32(77.449715), np.float32(1421.5546), np.float32(1332.7378), np.float32(2378.8447), np.float32(1482.085), np.float32(1871.6338), np.float32(1575.6436), np.float32(1550.4297), np.float32(1915.2306)]
2025-09-14 17:45:44,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:45:44,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 2 minutes, 14 seconds)
2025-09-14 17:48:39,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 17:48:49,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1918.59741 ± 717.384
2025-09-14 17:48:49,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2090.3635), np.float32(1388.5056), np.float32(1369.2947), np.float32(1905.6936), np.float32(2543.4253), np.float32(3205.112), np.float32(536.48236), np.float32(2610.7275), np.float32(1603.9788), np.float32(1932.3904)]
2025-09-14 17:48:49,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:48:49,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 58 minutes, 57 seconds)
2025-09-14 17:51:45,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 17:51:55,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1758.88904 ± 363.371
2025-09-14 17:51:55,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1655.5831), np.float32(1467.245), np.float32(1796.602), np.float32(1425.6362), np.float32(1908.1595), np.float32(2213.65), np.float32(1636.6472), np.float32(2576.028), np.float32(1367.1216), np.float32(1542.2186)]
2025-09-14 17:51:55,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:51:55,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 54 minutes, 51 seconds)
2025-09-14 17:54:50,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 17:55:00,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2107.95850 ± 401.546
2025-09-14 17:55:00,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1891.8104), np.float32(1689.8286), np.float32(2995.8608), np.float32(1873.4891), np.float32(2338.196), np.float32(1879.1282), np.float32(2185.2012), np.float32(2550.122), np.float32(1620.1879), np.float32(2055.7627)]
2025-09-14 17:55:00,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:55:00,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 50 minutes, 19 seconds)
2025-09-14 17:57:55,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 17:58:03,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2156.30737 ± 670.174
2025-09-14 17:58:03,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1807.8556), np.float32(1479.952), np.float32(1785.5815), np.float32(2306.832), np.float32(1461.1091), np.float32(3316.004), np.float32(1855.3153), np.float32(1540.862), np.float32(3091.303), np.float32(2918.2605)]
2025-09-14 17:58:03,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:58:03,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 46 minutes, 20 seconds)
2025-09-14 18:01:01,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 18:01:12,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2190.92090 ± 526.177
2025-09-14 18:01:12,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2410.7197), np.float32(2850.975), np.float32(1992.7201), np.float32(1754.0381), np.float32(2167.223), np.float32(1728.8771), np.float32(1348.9824), np.float32(1852.1984), np.float32(2845.4856), np.float32(2957.9883)]
2025-09-14 18:01:12,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:01:12,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 43 minutes, 18 seconds)
2025-09-14 18:04:32,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 18:04:43,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2014.24194 ± 609.267
2025-09-14 18:04:43,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2910.8667), np.float32(2120.238), np.float32(1798.3666), np.float32(1358.224), np.float32(1957.0104), np.float32(3216.4727), np.float32(1361.7028), np.float32(1673.8518), np.float32(2314.2256), np.float32(1431.4625)]
2025-09-14 18:04:43,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:04:43,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 41 minutes, 18 seconds)
2025-09-14 18:08:01,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 18:08:11,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1772.28906 ± 570.260
2025-09-14 18:08:11,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1954.3438), np.float32(1268.4226), np.float32(1548.8447), np.float32(1300.8741), np.float32(3093.5945), np.float32(1453.9822), np.float32(1595.8223), np.float32(2558.7656), np.float32(1378.9413), np.float32(1569.3)]
2025-09-14 18:08:11,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:08:11,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 39 minutes, 3 seconds)
2025-09-14 18:11:12,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 18:11:22,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1977.03003 ± 730.232
2025-09-14 18:11:22,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1601.9766), np.float32(2272.103), np.float32(3613.864), np.float32(1346.2307), np.float32(1656.0454), np.float32(1327.9833), np.float32(1267.8933), np.float32(2960.4546), np.float32(1919.763), np.float32(1803.9854)]
2025-09-14 18:11:22,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:11:22,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 35 minutes, 59 seconds)
2025-09-14 18:14:18,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 18:14:28,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2219.44897 ± 763.571
2025-09-14 18:14:28,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1893.6355), np.float32(1339.7048), np.float32(1462.6517), np.float32(3532.5464), np.float32(3771.5208), np.float32(1901.5314), np.float32(2093.8137), np.float32(2261.5823), np.float32(1962.2687), np.float32(1975.2318)]
2025-09-14 18:14:28,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:14:28,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 32 minutes, 49 seconds)
2025-09-14 18:17:24,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 18:17:33,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1957.65881 ± 679.215
2025-09-14 18:17:33,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2130.7332), np.float32(1323.4713), np.float32(1330.3909), np.float32(2192.1272), np.float32(1645.6007), np.float32(1416.9515), np.float32(1212.1078), np.float32(2055.0125), np.float32(3085.2903), np.float32(3184.9019)]
2025-09-14 18:17:33,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:17:33,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 29 minutes, 25 seconds)
2025-09-14 18:20:27,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 18:20:37,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2236.65698 ± 869.644
2025-09-14 18:20:37,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2581.9211), np.float32(1310.7213), np.float32(3697.9438), np.float32(1814.0796), np.float32(3427.9893), np.float32(3240.7527), np.float32(1777.633), np.float32(1424.6471), np.float32(1447.4348), np.float32(1643.448)]
2025-09-14 18:20:37,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:20:37,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 25 minutes, 26 seconds)
2025-09-14 18:23:33,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 18:23:43,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2263.44043 ± 790.473
2025-09-14 18:23:43,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1973.7109), np.float32(1509.9375), np.float32(1493.1831), np.float32(3762.745), np.float32(2613.0183), np.float32(2621.5007), np.float32(3270.5078), np.float32(2528.222), np.float32(1561.6294), np.float32(1299.9503)]
2025-09-14 18:23:43,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:23:43,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 21 minutes, 44 seconds)
2025-09-14 18:26:39,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 18:26:49,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1878.37207 ± 509.947
2025-09-14 18:26:49,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1954.1927), np.float32(1695.7183), np.float32(2023.3771), np.float32(1512.3787), np.float32(1479.1427), np.float32(2505.0007), np.float32(1838.7213), np.float32(1368.607), np.float32(1372.1013), np.float32(3034.4797)]
2025-09-14 18:26:49,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:26:49,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 18 minutes, 32 seconds)
2025-09-14 18:29:57,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 18:30:09,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1998.94849 ± 762.721
2025-09-14 18:30:09,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(895.60565), np.float32(1482.8616), np.float32(2997.416), np.float32(1300.4667), np.float32(2195.571), np.float32(1338.8334), np.float32(2173.8508), np.float32(2241.7847), np.float32(3504.0583), np.float32(1859.0347)]
2025-09-14 18:30:09,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:30:09,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 15 minutes, 40 seconds)
2025-09-14 18:33:23,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 18:33:34,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1761.94824 ± 381.117
2025-09-14 18:33:34,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1665.7164), np.float32(2169.5203), np.float32(1860.5316), np.float32(1376.5122), np.float32(2388.8174), np.float32(1388.3513), np.float32(1410.5803), np.float32(1432.9044), np.float32(2333.0806), np.float32(1593.4686)]
2025-09-14 18:33:34,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:33:34,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 12 minutes, 48 seconds)
2025-09-14 18:36:38,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 18:36:47,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2351.23877 ± 741.346
2025-09-14 18:36:47,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2787.958), np.float32(3253.4893), np.float32(1353.802), np.float32(2585.1187), np.float32(3678.1216), np.float32(2053.6216), np.float32(2021.5646), np.float32(1846.767), np.float32(1283.2025), np.float32(2648.7415)]
2025-09-14 18:36:47,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:36:47,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 9 minutes, 42 seconds)
2025-09-14 18:39:42,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 18:39:52,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2014.20581 ± 584.450
2025-09-14 18:39:52,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1859.2415), np.float32(2044.8458), np.float32(1930.8367), np.float32(1372.5508), np.float32(2135.1685), np.float32(1627.6029), np.float32(3413.183), np.float32(1240.3123), np.float32(2002.2092), np.float32(2516.108)]
2025-09-14 18:39:52,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:39:52,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 6 minutes, 27 seconds)
2025-09-14 18:42:42,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 18:42:50,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2435.77832 ± 634.449
2025-09-14 18:42:50,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2926.563), np.float32(1604.6938), np.float32(2056.079), np.float32(2199.9995), np.float32(3335.225), np.float32(1590.6082), np.float32(3064.2185), np.float32(1764.124), np.float32(3105.9988), np.float32(2710.274)]
2025-09-14 18:42:50,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:42:50,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 12 seconds)
2025-09-14 18:45:39,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-14 18:45:48,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1925.07886 ± 536.128
2025-09-14 18:45:48,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1851.0448), np.float32(1854.9037), np.float32(1546.7104), np.float32(1496.4789), np.float32(1764.6787), np.float32(2780.317), np.float32(3124.0725), np.float32(1760.2198), np.float32(1456.6108), np.float32(1615.7529)]
2025-09-14 18:45:48,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:45:48,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1251 [DEBUG]: Training session finished
