2025-09-14 15:10:00,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1108 [DEBUG]: logdir: _logs/noise-eval-v2/halfcheetah/bpql-noise_0.025-delay_24
2025-09-14 15:10:00,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1109 [DEBUG]: trainer_prefix: noise-eval-v2/halfcheetah/bpql-noise_0.025-delay_24
2025-09-14 15:10:00,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'24': <latency_env.delayed_mdp.ConstantDelay object at 0x7fd1242c7c50>}
2025-09-14 15:10:00,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1111 [DEBUG]: using device: cpu
2025-09-14 15:10:00,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-14 15:10:01,065 baseline-bpql-noisepromille25-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=161, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-14 15:10:01,065 baseline-bpql-noisepromille25-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-14 15:10:02,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-14 15:10:02,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-14 15:12:14,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 15:12:23,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: -260.34006 ± 25.322
2025-09-14 15:12:23,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-276.95703), np.float32(-227.20654), np.float32(-299.57513), np.float32(-261.68024), np.float32(-290.1746), np.float32(-266.06174), np.float32(-271.65), np.float32(-230.45035), np.float32(-258.67557), np.float32(-220.96957)]
2025-09-14 15:12:23,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:12:23,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (-260.34) for latency 24
2025-09-14 15:12:23,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 52 minutes, 48 seconds)
2025-09-14 15:14:39,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 15:14:48,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: -299.21158 ± 39.063
2025-09-14 15:14:48,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-280.07834), np.float32(-258.47247), np.float32(-320.38834), np.float32(-283.76294), np.float32(-254.83066), np.float32(-247.47095), np.float32(-360.78833), np.float32(-316.88345), np.float32(-309.48016), np.float32(-359.96008)]
2025-09-14 15:14:48,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:14:48,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 53 minutes, 11 seconds)
2025-09-14 15:17:03,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 15:17:12,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: -73.44876 ± 117.927
2025-09-14 15:17:12,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(111.89526), np.float32(-153.51877), np.float32(-165.55373), np.float32(70.72568), np.float32(-160.89867), np.float32(-147.4992), np.float32(-19.809198), np.float32(-191.20209), np.float32(-175.0154), np.float32(96.38847)]
2025-09-14 15:17:12,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:17:12,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (-73.45) for latency 24
2025-09-14 15:17:12,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 51 minutes, 28 seconds)
2025-09-14 15:19:27,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 15:19:36,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: -13.34562 ± 165.797
2025-09-14 15:19:36,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(14.689222), np.float32(-19.46528), np.float32(225.82993), np.float32(192.78442), np.float32(-134.1801), np.float32(-185.97655), np.float32(163.12306), np.float32(50.83329), np.float32(-154.06381), np.float32(-287.03043)]
2025-09-14 15:19:36,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:19:36,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (-13.35) for latency 24
2025-09-14 15:19:36,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 49 minutes, 19 seconds)
2025-09-14 15:21:52,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 15:22:01,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 67.61111 ± 122.491
2025-09-14 15:22:01,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-48.64073), np.float32(382.20935), np.float32(-17.485899), np.float32(2.9225378), np.float32(75.89062), np.float32(-37.198845), np.float32(139.5578), np.float32(113.38836), np.float32(-19.078373), np.float32(84.54634)]
2025-09-14 15:22:01,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:22:01,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (67.61) for latency 24
2025-09-14 15:22:01,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 47 minutes, 29 seconds)
2025-09-14 15:24:31,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 15:24:41,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 308.91476 ± 120.025
2025-09-14 15:24:41,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(254.02861), np.float32(157.06712), np.float32(351.81012), np.float32(587.18036), np.float32(206.29767), np.float32(183.71416), np.float32(376.34222), np.float32(270.31015), np.float32(389.2851), np.float32(313.11212)]
2025-09-14 15:24:41,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:24:41,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (308.91) for latency 24
2025-09-14 15:24:41,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 51 minutes, 12 seconds)
2025-09-14 15:27:05,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 15:27:14,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 483.15869 ± 111.584
2025-09-14 15:27:14,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(321.69193), np.float32(591.9588), np.float32(375.54608), np.float32(318.89746), np.float32(605.77924), np.float32(467.44193), np.float32(577.80005), np.float32(512.4211), np.float32(626.33026), np.float32(433.7204)]
2025-09-14 15:27:14,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:27:14,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (483.16) for latency 24
2025-09-14 15:27:14,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 51 minutes, 27 seconds)
2025-09-14 15:29:33,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 15:29:42,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 602.57874 ± 57.771
2025-09-14 15:29:42,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(544.64886), np.float32(719.666), np.float32(605.35675), np.float32(537.78455), np.float32(633.5496), np.float32(537.6294), np.float32(567.4872), np.float32(575.10754), np.float32(652.5995), np.float32(651.95776)]
2025-09-14 15:29:42,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:29:42,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (602.58) for latency 24
2025-09-14 15:29:42,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 50 minutes)
2025-09-14 15:32:00,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 15:32:09,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 730.63782 ± 302.947
2025-09-14 15:32:09,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(867.6552), np.float32(324.0989), np.float32(848.426), np.float32(826.35486), np.float32(22.061384), np.float32(975.0158), np.float32(908.2737), np.float32(625.82886), np.float32(1010.3217), np.float32(898.3413)]
2025-09-14 15:32:09,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:32:09,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (730.64) for latency 24
2025-09-14 15:32:09,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 48 minutes, 37 seconds)
2025-09-14 15:34:28,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 15:34:37,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 914.91034 ± 186.550
2025-09-14 15:34:37,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1031.8862), np.float32(970.13763), np.float32(513.82355), np.float32(974.0276), np.float32(1051.6289), np.float32(937.3152), np.float32(596.00903), np.float32(1102.64), np.float32(1004.78723), np.float32(966.84766)]
2025-09-14 15:34:37,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:34:37,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (914.91) for latency 24
2025-09-14 15:34:37,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 3 hours, 46 minutes, 48 seconds)
2025-09-14 15:36:56,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 15:37:05,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1071.17993 ± 96.485
2025-09-14 15:37:05,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1045.0431), np.float32(1088.3119), np.float32(991.35095), np.float32(1084.652), np.float32(977.37726), np.float32(1089.1248), np.float32(971.451), np.float32(1021.9181), np.float32(1318.7683), np.float32(1123.802)]
2025-09-14 15:37:05,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:37:05,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (1071.18) for latency 24
2025-09-14 15:37:05,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 3 hours, 40 minutes, 46 seconds)
2025-09-14 15:39:23,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 15:39:32,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1108.80212 ± 134.468
2025-09-14 15:39:32,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1123.3727), np.float32(1085.7598), np.float32(1309.3773), np.float32(983.0648), np.float32(1042.324), np.float32(1053.1289), np.float32(1063.4143), np.float32(1413.756), np.float32(1025.5524), np.float32(988.272)]
2025-09-14 15:39:32,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:39:32,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (1108.80) for latency 24
2025-09-14 15:39:32,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 3 hours, 36 minutes, 27 seconds)
2025-09-14 15:41:50,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 15:41:59,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1069.53186 ± 114.923
2025-09-14 15:41:59,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(892.1256), np.float32(1012.19995), np.float32(997.0011), np.float32(991.9992), np.float32(1130.7502), np.float32(1040.7411), np.float32(1018.56256), np.float32(1308.4669), np.float32(1216.0071), np.float32(1087.4663)]
2025-09-14 15:41:59,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:41:59,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 3 hours, 33 minutes, 55 seconds)
2025-09-14 15:44:17,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 15:44:26,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1055.61328 ± 84.172
2025-09-14 15:44:26,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1074.1465), np.float32(998.10815), np.float32(1011.0679), np.float32(951.2997), np.float32(1122.0538), np.float32(932.69196), np.float32(1100.9902), np.float32(1091.9875), np.float32(1041.4629), np.float32(1232.324)]
2025-09-14 15:44:26,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:44:26,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 3 hours, 31 minutes, 20 seconds)
2025-09-14 15:46:45,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 15:46:54,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1251.91467 ± 192.540
2025-09-14 15:46:54,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1431.5012), np.float32(1175.4535), np.float32(1722.9805), np.float32(1149.4792), np.float32(1302.8878), np.float32(1150.1), np.float32(1153.2225), np.float32(1281.0905), np.float32(989.82025), np.float32(1162.6115)]
2025-09-14 15:46:54,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:46:54,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (1251.91) for latency 24
2025-09-14 15:46:54,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 3 hours, 28 minutes, 48 seconds)
2025-09-14 15:49:12,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 15:49:21,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1140.76880 ± 401.256
2025-09-14 15:49:21,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1142.5093), np.float32(1702.7849), np.float32(1623.0104), np.float32(1010.2481), np.float32(1381.8278), np.float32(1152.4146), np.float32(954.6998), np.float32(166.40268), np.float32(1170.964), np.float32(1102.8259)]
2025-09-14 15:49:21,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:49:21,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 3 hours, 25 minutes, 51 seconds)
2025-09-14 15:51:39,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 15:51:48,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1325.76489 ± 230.680
2025-09-14 15:51:48,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1157.8925), np.float32(1223.2412), np.float32(1140.9574), np.float32(1709.8145), np.float32(1090.0026), np.float32(1372.1), np.float32(1605.2291), np.float32(1240.7604), np.float32(1647.3994), np.float32(1070.2523)]
2025-09-14 15:51:48,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:51:48,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (1325.76) for latency 24
2025-09-14 15:51:48,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 3 hours, 23 minutes, 28 seconds)
2025-09-14 15:54:06,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 15:54:15,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1296.05676 ± 175.053
2025-09-14 15:54:15,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1616.3844), np.float32(1319.8823), np.float32(1301.0028), np.float32(1168.0992), np.float32(1129.1354), np.float32(1561.6072), np.float32(1073.1682), np.float32(1377.3452), np.float32(1116.6324), np.float32(1297.3099)]
2025-09-14 15:54:15,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:54:15,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 3 hours, 21 minutes, 10 seconds)
2025-09-14 15:56:34,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 15:56:43,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1496.84949 ± 297.352
2025-09-14 15:56:43,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1246.6036), np.float32(1353.7826), np.float32(1881.528), np.float32(1318.9857), np.float32(1212.1982), np.float32(2089.7432), np.float32(1304.8276), np.float32(1293.2936), np.float32(1824.9851), np.float32(1442.5466)]
2025-09-14 15:56:43,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:56:43,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (1496.85) for latency 24
2025-09-14 15:56:43,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 3 hours, 18 minutes, 48 seconds)
2025-09-14 15:59:01,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 15:59:10,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1208.94495 ± 148.270
2025-09-14 15:59:10,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1091.4537), np.float32(1293.5168), np.float32(1199.1866), np.float32(1108.8546), np.float32(1122.3339), np.float32(1165.9558), np.float32(1619.0009), np.float32(1213.7439), np.float32(1161.1621), np.float32(1114.241)]
2025-09-14 15:59:10,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:59:10,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 3 hours, 16 minutes, 22 seconds)
2025-09-14 16:01:28,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:01:37,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1377.83679 ± 190.762
2025-09-14 16:01:37,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1163.0487), np.float32(1512.3519), np.float32(1281.6744), np.float32(1388.8676), np.float32(1275.6478), np.float32(1328.2233), np.float32(1409.081), np.float32(1186.1593), np.float32(1366.0464), np.float32(1867.2687)]
2025-09-14 16:01:37,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:01:37,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 3 hours, 13 minutes, 56 seconds)
2025-09-14 16:03:56,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:04:05,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1404.17151 ± 158.390
2025-09-14 16:04:05,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1288.0215), np.float32(1487.5492), np.float32(1636.0085), np.float32(1405.5707), np.float32(1174.7762), np.float32(1347.7721), np.float32(1393.1907), np.float32(1204.8319), np.float32(1688.5945), np.float32(1415.3992)]
2025-09-14 16:04:05,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:04:05,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 11 minutes, 36 seconds)
2025-09-14 16:06:23,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:06:32,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1213.63550 ± 115.713
2025-09-14 16:06:32,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1135.6008), np.float32(1184.5708), np.float32(1154.8029), np.float32(1314.3716), np.float32(1336.7957), np.float32(1026.0264), np.float32(1169.3202), np.float32(1445.4869), np.float32(1237.5692), np.float32(1131.812)]
2025-09-14 16:06:32,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:06:32,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 3 hours, 9 minutes, 3 seconds)
2025-09-14 16:08:50,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:08:59,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1376.07593 ± 113.806
2025-09-14 16:08:59,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1510.2684), np.float32(1421.1495), np.float32(1280.154), np.float32(1574.7804), np.float32(1279.9509), np.float32(1299.6191), np.float32(1466.0751), np.float32(1299.0004), np.float32(1204.9318), np.float32(1424.8301)]
2025-09-14 16:08:59,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:08:59,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 6 minutes, 37 seconds)
2025-09-14 16:11:18,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:11:27,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1261.84094 ± 145.596
2025-09-14 16:11:27,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1101.2206), np.float32(1361.4952), np.float32(1183.6635), np.float32(1171.5813), np.float32(1264.1923), np.float32(1049.4182), np.float32(1236.3865), np.float32(1338.4824), np.float32(1588.1923), np.float32(1323.7772)]
2025-09-14 16:11:27,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:11:27,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 4 minutes, 8 seconds)
2025-09-14 16:13:45,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:13:54,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1329.52490 ± 145.179
2025-09-14 16:13:54,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1263.1948), np.float32(1078.326), np.float32(1312.8783), np.float32(1310.5372), np.float32(1471.7354), np.float32(1266.889), np.float32(1302.1058), np.float32(1252.7603), np.float32(1660.6842), np.float32(1376.137)]
2025-09-14 16:13:54,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:13:54,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 1 minute, 43 seconds)
2025-09-14 16:16:12,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:16:21,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1541.83521 ± 260.945
2025-09-14 16:16:21,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1178.394), np.float32(1762.93), np.float32(1590.7708), np.float32(1249.8983), np.float32(1251.7888), np.float32(1722.1473), np.float32(1866.0999), np.float32(1317.7457), np.float32(1915.8496), np.float32(1562.7264)]
2025-09-14 16:16:21,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:16:21,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (1541.84) for latency 24
2025-09-14 16:16:21,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 59 minutes, 4 seconds)
2025-09-14 16:18:39,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:18:48,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1237.79626 ± 82.086
2025-09-14 16:18:48,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1296.5795), np.float32(1236.6959), np.float32(1111.0116), np.float32(1196.6238), np.float32(1262.7286), np.float32(1384.1189), np.float32(1222.0461), np.float32(1173.0348), np.float32(1346.2772), np.float32(1148.8452)]
2025-09-14 16:18:48,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:18:48,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 56 minutes, 31 seconds)
2025-09-14 16:21:06,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:21:15,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1623.30396 ± 514.586
2025-09-14 16:21:15,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1967.679), np.float32(482.48456), np.float32(2284.9524), np.float32(1227.6183), np.float32(1477.7054), np.float32(1956.856), np.float32(2302.0317), np.float32(1533.75), np.float32(1531.6935), np.float32(1468.2689)]
2025-09-14 16:21:15,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:21:15,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (1623.30) for latency 24
2025-09-14 16:21:15,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 54 minutes, 7 seconds)
2025-09-14 16:23:34,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:23:43,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1351.86414 ± 72.241
2025-09-14 16:23:43,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1160.4911), np.float32(1299.9841), np.float32(1428.2955), np.float32(1340.7224), np.float32(1364.4657), np.float32(1392.0056), np.float32(1398.5352), np.float32(1364.1141), np.float32(1402.4846), np.float32(1367.5443)]
2025-09-14 16:23:43,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:23:43,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 51 minutes, 45 seconds)
2025-09-14 16:26:01,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:26:10,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1312.80078 ± 260.072
2025-09-14 16:26:10,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1364.0266), np.float32(1643.9641), np.float32(1144.5636), np.float32(1363.956), np.float32(1206.9816), np.float32(692.4124), np.float32(1324.6294), np.float32(1547.7006), np.float32(1591.3934), np.float32(1248.3795)]
2025-09-14 16:26:10,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:26:10,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 49 minutes, 20 seconds)
2025-09-14 16:28:28,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:28:37,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1955.41248 ± 754.209
2025-09-14 16:28:37,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1453.3638), np.float32(1574.2395), np.float32(3365.6797), np.float32(1544.9639), np.float32(3180.2524), np.float32(1176.8427), np.float32(1245.7076), np.float32(2590.26), np.float32(1738.362), np.float32(1684.4524)]
2025-09-14 16:28:37,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:28:37,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (1955.41) for latency 24
2025-09-14 16:28:37,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 46 minutes, 59 seconds)
2025-09-14 16:30:56,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:31:05,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1519.04822 ± 383.955
2025-09-14 16:31:05,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(908.3802), np.float32(1195.4879), np.float32(1451.1809), np.float32(1574.1145), np.float32(2047.7969), np.float32(2074.7664), np.float32(1440.4802), np.float32(2001.0447), np.float32(1159.7113), np.float32(1337.52)]
2025-09-14 16:31:05,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:31:05,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 44 minutes, 37 seconds)
2025-09-14 16:33:23,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:33:32,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1463.79565 ± 138.892
2025-09-14 16:33:32,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1305.6083), np.float32(1456.8077), np.float32(1619.7334), np.float32(1389.214), np.float32(1392.9519), np.float32(1525.256), np.float32(1461.5189), np.float32(1442.5072), np.float32(1769.0472), np.float32(1275.312)]
2025-09-14 16:33:32,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:33:32,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 42 minutes, 6 seconds)
2025-09-14 16:35:50,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:35:59,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1610.45862 ± 523.323
2025-09-14 16:35:59,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1587.7849), np.float32(1391.7771), np.float32(2047.4008), np.float32(1257.3087), np.float32(3019.7983), np.float32(1198.2103), np.float32(1568.346), np.float32(1400.954), np.float32(1313.127), np.float32(1319.8794)]
2025-09-14 16:35:59,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:35:59,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 39 minutes, 37 seconds)
2025-09-14 16:38:18,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:38:27,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1587.67920 ± 358.707
2025-09-14 16:38:27,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1424.2726), np.float32(1884.5205), np.float32(1823.3143), np.float32(1418.6953), np.float32(1389.5535), np.float32(2349.2737), np.float32(910.72736), np.float32(1576.3344), np.float32(1489.6161), np.float32(1610.4846)]
2025-09-14 16:38:27,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:38:27,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 37 minutes, 16 seconds)
2025-09-14 16:40:46,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:40:55,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1866.07483 ± 452.635
2025-09-14 16:40:55,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2716.4666), np.float32(1322.9171), np.float32(2561.5874), np.float32(2113.6382), np.float32(1415.5741), np.float32(1494.2157), np.float32(1574.0939), np.float32(1786.3379), np.float32(1678.0299), np.float32(1997.886)]
2025-09-14 16:40:55,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:40:55,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 34 minutes, 49 seconds)
2025-09-14 16:43:13,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:43:22,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1415.79858 ± 187.641
2025-09-14 16:43:22,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1325.2737), np.float32(1331.5161), np.float32(1345.777), np.float32(1796.9785), np.float32(1441.2708), np.float32(1747.3048), np.float32(1336.5713), np.float32(1265.2366), np.float32(1205.3712), np.float32(1362.686)]
2025-09-14 16:43:22,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:43:22,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 32 minutes, 20 seconds)
2025-09-14 16:45:40,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:45:49,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1454.57104 ± 137.704
2025-09-14 16:45:49,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1326.1847), np.float32(1625.1802), np.float32(1546.8652), np.float32(1692.4347), np.float32(1481.5494), np.float32(1448.733), np.float32(1422.8884), np.float32(1185.1912), np.float32(1395.4927), np.float32(1421.1908)]
2025-09-14 16:45:49,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:45:49,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 29 minutes, 54 seconds)
2025-09-14 16:48:08,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:48:17,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1816.14624 ± 507.187
2025-09-14 16:48:17,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1182.2345), np.float32(1459.3119), np.float32(1362.4274), np.float32(2035.3247), np.float32(2179.8584), np.float32(1662.3147), np.float32(2066.765), np.float32(2857.3708), np.float32(2151.3145), np.float32(1204.5409)]
2025-09-14 16:48:17,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:48:17,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 27 minutes, 27 seconds)
2025-09-14 16:50:37,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:50:46,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1775.47498 ± 525.041
2025-09-14 16:50:46,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1595.307), np.float32(1823.5365), np.float32(2114.8413), np.float32(1732.633), np.float32(1288.7441), np.float32(1999.5415), np.float32(922.84607), np.float32(2227.1646), np.float32(2819.4849), np.float32(1230.65)]
2025-09-14 16:50:46,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:50:46,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 25 minutes, 11 seconds)
2025-09-14 16:53:04,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:53:13,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1757.91284 ± 446.660
2025-09-14 16:53:13,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1441.5416), np.float32(1598.6989), np.float32(1522.1741), np.float32(2678.5896), np.float32(2260.7998), np.float32(1422.7596), np.float32(1802.9485), np.float32(1256.8208), np.float32(2212.686), np.float32(1382.1096)]
2025-09-14 16:53:13,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:53:13,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 22 minutes, 47 seconds)
2025-09-14 16:55:32,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:55:41,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1690.97339 ± 379.438
2025-09-14 16:55:41,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1876.3171), np.float32(1304.6344), np.float32(2355.1763), np.float32(1498.4326), np.float32(1462.7108), np.float32(1717.687), np.float32(1306.608), np.float32(2347.6487), np.float32(1310.7133), np.float32(1729.8075)]
2025-09-14 16:55:41,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:55:41,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 20 minutes, 23 seconds)
2025-09-14 16:57:59,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:58:08,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1585.32422 ± 287.036
2025-09-14 16:58:08,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1206.27), np.float32(1547.7723), np.float32(1249.4722), np.float32(1343.2246), np.float32(2013.4625), np.float32(1921.2229), np.float32(1529.2227), np.float32(1344.2954), np.float32(1922.6841), np.float32(1775.6155)]
2025-09-14 16:58:08,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:58:08,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 17 minutes, 54 seconds)
2025-09-14 17:00:26,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:00:35,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1514.60669 ± 289.111
2025-09-14 17:00:35,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1608.1683), np.float32(1394.2642), np.float32(1261.0798), np.float32(1420.8943), np.float32(1268.2793), np.float32(1446.6238), np.float32(2249.135), np.float32(1749.9901), np.float32(1514.6675), np.float32(1232.9641)]
2025-09-14 17:00:35,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:00:35,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 15 minutes, 23 seconds)
2025-09-14 17:02:53,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:03:02,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1463.20544 ± 240.592
2025-09-14 17:03:02,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1281.18), np.float32(1272.9614), np.float32(1252.6798), np.float32(1651.462), np.float32(2032.4623), np.float32(1219.5029), np.float32(1589.901), np.float32(1305.5957), np.float32(1527.7963), np.float32(1498.514)]
2025-09-14 17:03:02,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:03:02,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 12 minutes, 33 seconds)
2025-09-14 17:05:21,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:05:30,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1499.55847 ± 331.461
2025-09-14 17:05:30,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2149.8484), np.float32(1358.7029), np.float32(957.53754), np.float32(1294.4878), np.float32(1234.2697), np.float32(1582.0365), np.float32(1466.8938), np.float32(1783.7864), np.float32(1854.9456), np.float32(1313.0769)]
2025-09-14 17:05:30,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:05:30,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 10 minutes, 4 seconds)
2025-09-14 17:07:47,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:07:56,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1819.21838 ± 480.776
2025-09-14 17:07:56,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2084.4358), np.float32(1493.6033), np.float32(1538.7947), np.float32(2068.8071), np.float32(1364.5249), np.float32(1940.5326), np.float32(1457.7117), np.float32(2922.847), np.float32(2092.6692), np.float32(1228.257)]
2025-09-14 17:07:56,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:07:56,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 7 minutes, 31 seconds)
2025-09-14 17:10:15,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:10:24,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1646.34497 ± 429.098
2025-09-14 17:10:24,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1440.5332), np.float32(1295.0326), np.float32(1445.0203), np.float32(1290.4655), np.float32(1678.8586), np.float32(1251.4183), np.float32(1787.0499), np.float32(1427.4752), np.float32(2617.0583), np.float32(2230.5364)]
2025-09-14 17:10:24,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:10:24,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 5 minutes, 5 seconds)
2025-09-14 17:12:43,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:12:52,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 2121.35791 ± 684.284
2025-09-14 17:12:52,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1968.9263), np.float32(1398.4419), np.float32(2275.3503), np.float32(2985.855), np.float32(1295.0714), np.float32(3149.5962), np.float32(1460.4781), np.float32(3083.9688), np.float32(1635.2494), np.float32(1960.6414)]
2025-09-14 17:12:52,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:12:52,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (2121.36) for latency 24
2025-09-14 17:12:52,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 2 minutes, 45 seconds)
2025-09-14 17:15:10,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:15:19,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1818.02173 ± 794.615
2025-09-14 17:15:19,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(141.9334), np.float32(2614.141), np.float32(1419.5406), np.float32(1722.7166), np.float32(2000.9669), np.float32(3311.18), np.float32(1571.8888), np.float32(2298.5496), np.float32(1597.8495), np.float32(1501.4486)]
2025-09-14 17:15:19,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:15:19,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 24 seconds)
2025-09-14 17:17:38,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:17:47,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1649.90369 ± 299.374
2025-09-14 17:17:47,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1863.4376), np.float32(1081.13), np.float32(2269.1472), np.float32(1606.8456), np.float32(1608.7987), np.float32(1892.0806), np.float32(1683.913), np.float32(1537.6552), np.float32(1412.2837), np.float32(1543.7452)]
2025-09-14 17:17:47,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:17:47,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 57 minutes, 58 seconds)
2025-09-14 17:20:05,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:20:14,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1556.83667 ± 719.707
2025-09-14 17:20:14,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1744.8833), np.float32(-473.8422), np.float32(1899.6736), np.float32(1774.9004), np.float32(1315.149), np.float32(1676.5621), np.float32(2151.7583), np.float32(1712.6293), np.float32(2192.7869), np.float32(1573.8658)]
2025-09-14 17:20:14,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:20:14,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 55 minutes, 33 seconds)
2025-09-14 17:22:32,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:22:41,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1522.69080 ± 253.176
2025-09-14 17:22:41,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1374.7692), np.float32(1742.312), np.float32(1268.8755), np.float32(1271.4149), np.float32(1867.7908), np.float32(1672.714), np.float32(1167.2565), np.float32(1924.2948), np.float32(1404.4608), np.float32(1533.0195)]
2025-09-14 17:22:41,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:22:41,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 53 minutes)
2025-09-14 17:24:59,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:25:08,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1869.16577 ± 432.189
2025-09-14 17:25:08,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1912.2828), np.float32(1504.7635), np.float32(1169.7612), np.float32(2385.1628), np.float32(1888.5162), np.float32(1600.8347), np.float32(2560.7312), np.float32(1911.0803), np.float32(2340.3496), np.float32(1418.1757)]
2025-09-14 17:25:08,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:25:08,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 50 minutes, 29 seconds)
2025-09-14 17:27:31,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:27:40,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1641.80786 ± 324.698
2025-09-14 17:27:40,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1367.6945), np.float32(1266.7754), np.float32(2262.395), np.float32(2115.5906), np.float32(1324.6211), np.float32(1671.5573), np.float32(1414.0138), np.float32(1867.5272), np.float32(1485.0576), np.float32(1642.8468)]
2025-09-14 17:27:40,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:27:40,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 48 minutes, 37 seconds)
2025-09-14 17:30:02,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:30:11,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1648.05042 ± 318.534
2025-09-14 17:30:11,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1244.8145), np.float32(1297.8936), np.float32(1192.0963), np.float32(1680.8712), np.float32(1909.4084), np.float32(2044.7568), np.float32(1669.5928), np.float32(1563.0316), np.float32(1696.3677), np.float32(2181.6697)]
2025-09-14 17:30:11,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:30:11,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 46 minutes, 40 seconds)
2025-09-14 17:32:33,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:32:42,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 2737.57812 ± 786.323
2025-09-14 17:32:42,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3496.6382), np.float32(1815.5566), np.float32(1427.2544), np.float32(3838.5562), np.float32(1956.3108), np.float32(3582.7183), np.float32(2556.4202), np.float32(2883.4248), np.float32(2467.9556), np.float32(3350.9463)]
2025-09-14 17:32:42,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:32:42,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (2737.58) for latency 24
2025-09-14 17:32:42,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 44 minutes, 42 seconds)
2025-09-14 17:34:57,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:35:06,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1874.50024 ± 466.111
2025-09-14 17:35:06,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1335.5051), np.float32(2022.611), np.float32(1278.6403), np.float32(1800.9326), np.float32(2414.4548), np.float32(2099.2288), np.float32(1292.743), np.float32(1688.0839), np.float32(2075.8015), np.float32(2737.001)]
2025-09-14 17:35:06,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:35:06,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 41 minutes, 50 seconds)
2025-09-14 17:37:28,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:37:38,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 2173.68896 ± 827.269
2025-09-14 17:37:38,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2593.9666), np.float32(1342.4274), np.float32(1166.3601), np.float32(2028.752), np.float32(3504.5476), np.float32(2265.7297), np.float32(1967.6377), np.float32(1479.7811), np.float32(3717.4785), np.float32(1670.209)]
2025-09-14 17:37:38,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:37:38,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 39 minutes, 58 seconds)
2025-09-14 17:40:16,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:40:25,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 2000.29749 ± 570.814
2025-09-14 17:40:25,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1352.6605), np.float32(1418.2162), np.float32(3016.817), np.float32(2057.489), np.float32(1599.3534), np.float32(1616.3987), np.float32(2813.4507), np.float32(1864.8779), np.float32(1662.2042), np.float32(2601.5051)]
2025-09-14 17:40:25,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:40:25,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 39 minutes, 26 seconds)
2025-09-14 17:43:22,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:43:32,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1712.89709 ± 304.998
2025-09-14 17:43:32,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1846.469), np.float32(1563.3578), np.float32(1359.9083), np.float32(1586.2375), np.float32(1553.8008), np.float32(2219.7168), np.float32(1672.9266), np.float32(1908.0989), np.float32(1246.9036), np.float32(2171.5515)]
2025-09-14 17:43:32,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:43:32,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 41 minutes, 25 seconds)
2025-09-14 17:46:35,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:46:45,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1934.85059 ± 424.142
2025-09-14 17:46:45,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1895.6437), np.float32(1431.4984), np.float32(1555.725), np.float32(1568.7803), np.float32(2202.8735), np.float32(2673.38), np.float32(2525.4353), np.float32(2066.6902), np.float32(2018.828), np.float32(1409.652)]
2025-09-14 17:46:45,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:46:45,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 43 minutes, 59 seconds)
2025-09-14 17:49:48,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:49:58,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1882.91797 ± 546.257
2025-09-14 17:49:58,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1579.9219), np.float32(1328.3285), np.float32(2782.2358), np.float32(1573.3235), np.float32(2946.759), np.float32(1259.9725), np.float32(1661.4852), np.float32(1637.5304), np.float32(2092.0623), np.float32(1967.5596)]
2025-09-14 17:49:58,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:49:58,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 47 minutes, 2 seconds)
2025-09-14 17:53:01,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:53:11,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 2125.75415 ± 733.717
2025-09-14 17:53:11,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1377.9652), np.float32(2176.8325), np.float32(1828.0615), np.float32(1604.1593), np.float32(2059.4197), np.float32(1473.8451), np.float32(3701.153), np.float32(2122.1763), np.float32(1641.415), np.float32(3272.513)]
2025-09-14 17:53:11,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:53:11,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 48 minutes, 48 seconds)
2025-09-14 17:56:13,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:56:24,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1829.30469 ± 665.878
2025-09-14 17:56:24,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3614.8052), np.float32(1784.5161), np.float32(1402.8442), np.float32(1686.2072), np.float32(1218.7799), np.float32(1373.7452), np.float32(1653.8491), np.float32(2102.913), np.float32(1318.4293), np.float32(2136.957)]
2025-09-14 17:56:24,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:56:24,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 48 minutes, 39 seconds)
2025-09-14 17:59:27,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:59:37,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1778.41443 ± 445.919
2025-09-14 17:59:37,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1593.104), np.float32(2936.1406), np.float32(1582.2622), np.float32(1326.4398), np.float32(1472.5553), np.float32(1906.75), np.float32(1744.2172), np.float32(1562.1475), np.float32(1508.0272), np.float32(2152.501)]
2025-09-14 17:59:37,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:59:37,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 46 minutes, 9 seconds)
2025-09-14 18:02:39,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:02:49,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 2507.21631 ± 675.021
2025-09-14 18:02:49,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1862.9106), np.float32(1689.8907), np.float32(2129.2815), np.float32(3177.7922), np.float32(2772.2598), np.float32(2152.0815), np.float32(3295.2065), np.float32(3808.1077), np.float32(2208.4668), np.float32(1976.1653)]
2025-09-14 18:02:49,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:02:49,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 42 minutes, 50 seconds)
2025-09-14 18:05:52,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:06:02,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 2016.40918 ± 640.025
2025-09-14 18:06:02,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1397.2615), np.float32(1731.6691), np.float32(1610.5502), np.float32(1485.8229), np.float32(1803.5936), np.float32(2100.2148), np.float32(3079.283), np.float32(2047.5209), np.float32(1554.2128), np.float32(3353.9626)]
2025-09-14 18:06:02,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:06:02,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 39 minutes, 34 seconds)
2025-09-14 18:09:05,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:09:15,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 2904.35352 ± 1053.815
2025-09-14 18:09:15,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1467.3217), np.float32(1360.5642), np.float32(3874.8538), np.float32(1958.0892), np.float32(1835.3127), np.float32(4068.435), np.float32(3179.4495), np.float32(3837.864), np.float32(3795.8896), np.float32(3665.7534)]
2025-09-14 18:09:15,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:09:15,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (2904.35) for latency 24
2025-09-14 18:09:15,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 36 minutes, 24 seconds)
2025-09-14 18:12:18,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:12:28,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 2283.65503 ± 683.465
2025-09-14 18:12:28,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2811.0989), np.float32(2288.7847), np.float32(1467.2933), np.float32(1648.629), np.float32(3701.3613), np.float32(1282.586), np.float32(2065.2607), np.float32(2715.3535), np.float32(2294.8667), np.float32(2561.3157)]
2025-09-14 18:12:28,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:12:28,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 33 minutes, 15 seconds)
2025-09-14 18:15:32,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:15:42,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1958.45056 ± 658.936
2025-09-14 18:15:42,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2270.6719), np.float32(1326.4622), np.float32(1465.2942), np.float32(1791.9844), np.float32(1489.6626), np.float32(2130.6562), np.float32(3417.404), np.float32(1493.6324), np.float32(1413.6477), np.float32(2785.0896)]
2025-09-14 18:15:42,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:15:42,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 30 minutes, 3 seconds)
2025-09-14 18:18:44,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:18:55,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 2730.57251 ± 1045.998
2025-09-14 18:18:55,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1500.0846), np.float32(3794.308), np.float32(3294.812), np.float32(3890.9417), np.float32(2252.784), np.float32(1294.9727), np.float32(2198.1106), np.float32(3818.008), np.float32(1435.9648), np.float32(3825.7383)]
2025-09-14 18:18:55,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:18:55,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 26 minutes, 53 seconds)
2025-09-14 18:21:57,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:22:07,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 2846.65967 ± 1001.997
2025-09-14 18:22:07,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2001.632), np.float32(2246.2717), np.float32(3944.0134), np.float32(3706.785), np.float32(4260.325), np.float32(2029.3794), np.float32(1479.1008), np.float32(4197.2847), np.float32(2481.0867), np.float32(2120.7178)]
2025-09-14 18:22:07,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:22:07,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 23 minutes, 37 seconds)
2025-09-14 18:25:09,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:25:20,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 2214.48193 ± 1055.679
2025-09-14 18:25:20,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2497.9714), np.float32(1256.2341), np.float32(3106.4185), np.float32(1755.0674), np.float32(4097.9814), np.float32(1182.9021), np.float32(1571.6954), np.float32(3878.5989), np.float32(1485.7673), np.float32(1312.1827)]
2025-09-14 18:25:20,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:25:20,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 20 minutes, 24 seconds)
2025-09-14 18:28:22,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:28:32,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 2525.78735 ± 814.674
2025-09-14 18:28:32,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2821.3943), np.float32(3645.7598), np.float32(3811.6987), np.float32(2484.526), np.float32(1348.6412), np.float32(2338.6936), np.float32(1273.8046), np.float32(2156.7468), np.float32(3169.3374), np.float32(2207.269)]
2025-09-14 18:28:32,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:28:32,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 17 minutes, 4 seconds)
2025-09-14 18:31:34,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:31:43,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3081.42310 ± 938.604
2025-09-14 18:31:43,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4012.8308), np.float32(2941.16), np.float32(3748.0789), np.float32(4012.2134), np.float32(3363.1626), np.float32(3041.9268), np.float32(1238.0082), np.float32(1423.0374), np.float32(3556.513), np.float32(3477.299)]
2025-09-14 18:31:43,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:31:43,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (3081.42) for latency 24
2025-09-14 18:31:43,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 13 minutes, 44 seconds)
2025-09-14 18:34:28,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:34:37,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 2425.41357 ± 743.476
2025-09-14 18:34:37,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1498.602), np.float32(1777.0488), np.float32(2616.303), np.float32(2698.177), np.float32(3625.2231), np.float32(2814.9006), np.float32(1527.533), np.float32(1900.1049), np.float32(2185.004), np.float32(3611.2407)]
2025-09-14 18:34:37,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:34:37,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 9 minutes, 5 seconds)
2025-09-14 18:37:16,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:37:25,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3340.01221 ± 602.773
2025-09-14 18:37:25,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3631.6426), np.float32(3634.2725), np.float32(3558.1863), np.float32(2693.9849), np.float32(3675.4966), np.float32(3650.5305), np.float32(3545.3342), np.float32(3687.8113), np.float32(1738.467), np.float32(3584.395)]
2025-09-14 18:37:25,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:37:25,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (3340.01) for latency 24
2025-09-14 18:37:25,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 4 minutes, 17 seconds)
2025-09-14 18:39:53,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:40:02,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 2198.35742 ± 555.516
2025-09-14 18:40:02,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2794.0745), np.float32(2438.4587), np.float32(1565.8171), np.float32(3191.794), np.float32(1613.3326), np.float32(2400.0093), np.float32(1916.2384), np.float32(1777.7803), np.float32(1576.9476), np.float32(2709.1228)]
2025-09-14 18:40:02,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:40:02,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 58 minutes, 48 seconds)
2025-09-14 18:42:18,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:42:27,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 2356.47095 ± 1108.740
2025-09-14 18:42:27,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1194.1403), np.float32(1470.3431), np.float32(4322.448), np.float32(1420.79), np.float32(1707.2582), np.float32(1658.0884), np.float32(1665.843), np.float32(4071.5454), np.float32(2719.6887), np.float32(3334.5618)]
2025-09-14 18:42:27,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:42:27,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 52 minutes, 53 seconds)
2025-09-14 18:44:42,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:44:51,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3004.65796 ± 1033.168
2025-09-14 18:44:51,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2405.9634), np.float32(2669.6426), np.float32(4372.542), np.float32(3429.1694), np.float32(1513.2227), np.float32(3816.8057), np.float32(4254.2783), np.float32(2445.7234), np.float32(1341.6229), np.float32(3797.6096)]
2025-09-14 18:44:51,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:44:51,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 47 minutes, 15 seconds)
2025-09-14 18:47:00,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:47:09,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 2203.37183 ± 699.754
2025-09-14 18:47:09,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3187.9436), np.float32(1408.7018), np.float32(2084.857), np.float32(3329.9622), np.float32(1778.3378), np.float32(2651.895), np.float32(1839.6096), np.float32(1335.459), np.float32(2818.7842), np.float32(1598.169)]
2025-09-14 18:47:09,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:47:09,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 42 minutes, 37 seconds)
2025-09-14 18:49:16,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:49:25,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 2992.70752 ± 1006.666
2025-09-14 18:49:25,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4099.726), np.float32(2271.185), np.float32(2478.5608), np.float32(1854.9329), np.float32(4160.052), np.float32(1903.0968), np.float32(2673.8154), np.float32(1989.0588), np.float32(4156.9297), np.float32(4339.7207)]
2025-09-14 18:49:25,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:49:25,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 38 minutes, 23 seconds)
2025-09-14 18:51:32,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:51:41,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3167.14014 ± 835.627
2025-09-14 18:51:41,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2275.0535), np.float32(3317.3494), np.float32(3477.7175), np.float32(3245.8835), np.float32(1812.8772), np.float32(3963.507), np.float32(4037.763), np.float32(3965.7373), np.float32(3764.9617), np.float32(1810.5532)]
2025-09-14 18:51:41,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:51:41,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 34 minutes, 57 seconds)
2025-09-14 18:53:48,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:53:57,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 2061.47852 ± 735.878
2025-09-14 18:53:57,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2793.3452), np.float32(1754.9812), np.float32(2103.709), np.float32(1473.0016), np.float32(1373.7548), np.float32(2041.1722), np.float32(1426.691), np.float32(1915.5406), np.float32(1806.9509), np.float32(3925.6387)]
2025-09-14 18:53:57,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:53:57,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 32 minutes, 13 seconds)
2025-09-14 18:56:05,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:56:14,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 2850.34131 ± 872.849
2025-09-14 18:56:14,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2382.2131), np.float32(2701.568), np.float32(3110.0378), np.float32(2018.3236), np.float32(4294.9487), np.float32(1537.3893), np.float32(2352.567), np.float32(4405.459), np.float32(2585.1392), np.float32(3115.7683)]
2025-09-14 18:56:14,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:56:14,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 29 minutes, 34 seconds)
2025-09-14 18:58:22,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:58:31,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3656.18237 ± 904.998
2025-09-14 18:58:31,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4415.532), np.float32(3795.11), np.float32(4277.2793), np.float32(3227.6956), np.float32(2261.0898), np.float32(3935.0806), np.float32(4512.392), np.float32(1745.9514), np.float32(4061.084), np.float32(4330.6094)]
2025-09-14 18:58:31,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:58:31,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (3656.18) for latency 24
2025-09-14 18:58:31,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 27 minutes, 15 seconds)
2025-09-14 19:00:38,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:00:47,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3769.38403 ± 621.369
2025-09-14 19:00:47,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3592.6396), np.float32(3749.1377), np.float32(4167.739), np.float32(4237.2134), np.float32(3391.341), np.float32(4008.4124), np.float32(4001.603), np.float32(2109.469), np.float32(4077.201), np.float32(4359.0874)]
2025-09-14 19:00:47,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:00:47,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (3769.38) for latency 24
2025-09-14 19:00:47,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 25 minutes)
2025-09-14 19:02:54,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:03:03,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 2513.71094 ± 717.963
2025-09-14 19:03:03,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3657.234), np.float32(2503.1748), np.float32(2825.9033), np.float32(2261.1042), np.float32(1360.4855), np.float32(3411.0356), np.float32(2025.0685), np.float32(1557.2397), np.float32(2402.0833), np.float32(3133.7783)]
2025-09-14 19:03:03,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:03:03,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 22 minutes, 44 seconds)
2025-09-14 19:05:11,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:05:20,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 2054.69434 ± 598.680
2025-09-14 19:05:20,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1973.0779), np.float32(1537.3723), np.float32(1718.8164), np.float32(1532.6643), np.float32(1588.3726), np.float32(2344.2368), np.float32(1933.2771), np.float32(3546.3127), np.float32(2604.0852), np.float32(1768.7281)]
2025-09-14 19:05:20,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:05:20,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 20 minutes, 29 seconds)
2025-09-14 19:07:28,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:07:37,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 2593.01367 ± 942.391
2025-09-14 19:07:37,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2225.275), np.float32(3918.0981), np.float32(2334.875), np.float32(1557.157), np.float32(1865.4404), np.float32(4047.3633), np.float32(1899.957), np.float32(2723.0256), np.float32(1518.9523), np.float32(3839.9924)]
2025-09-14 19:07:37,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:07:37,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 18 minutes, 12 seconds)
2025-09-14 19:09:44,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:09:53,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4158.25293 ± 382.171
2025-09-14 19:09:53,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3079.2397), np.float32(4385.054), np.float32(4328.868), np.float32(4339.7812), np.float32(4203.8564), np.float32(4375.894), np.float32(4142.6904), np.float32(4173.299), np.float32(4048.128), np.float32(4505.7188)]
2025-09-14 19:09:53,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:09:53,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (4158.25) for latency 24
2025-09-14 19:09:53,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 15 minutes, 55 seconds)
2025-09-14 19:12:01,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:12:10,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3845.70557 ± 956.219
2025-09-14 19:12:10,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4740.4404), np.float32(1750.1405), np.float32(2630.9385), np.float32(3025.2085), np.float32(4379.4014), np.float32(4458.488), np.float32(4181.465), np.float32(4459.918), np.float32(4454.397), np.float32(4376.6577)]
2025-09-14 19:12:10,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:12:10,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 13 minutes, 39 seconds)
2025-09-14 19:14:17,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:14:26,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3152.28955 ± 878.788
2025-09-14 19:14:26,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3384.8367), np.float32(2389.146), np.float32(4394.327), np.float32(2168.7097), np.float32(4359.7637), np.float32(4050.0935), np.float32(2172.8792), np.float32(2953.7942), np.float32(3565.879), np.float32(2083.4646)]
2025-09-14 19:14:26,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:14:26,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 11 minutes, 22 seconds)
2025-09-14 19:16:33,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:16:42,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4055.69531 ± 1039.523
2025-09-14 19:16:42,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4703.1777), np.float32(4730.044), np.float32(1265.1835), np.float32(4600.5723), np.float32(3962.507), np.float32(4460.929), np.float32(4538.9766), np.float32(3140.0613), np.float32(4748.5347), np.float32(4406.9644)]
2025-09-14 19:16:42,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:16:42,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 9 minutes, 5 seconds)
2025-09-14 19:18:49,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:18:58,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3555.75342 ± 1328.039
2025-09-14 19:18:58,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4413.7104), np.float32(4563.956), np.float32(2856.9768), np.float32(1354.7797), np.float32(4762.4316), np.float32(4400.976), np.float32(2735.8396), np.float32(1225.9338), np.float32(4552.729), np.float32(4690.2046)]
2025-09-14 19:18:58,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:18:58,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 48 seconds)
2025-09-14 19:21:05,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:21:14,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3552.46411 ± 1273.282
2025-09-14 19:21:14,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4430.252), np.float32(4521.69), np.float32(1652.7836), np.float32(4125.225), np.float32(1797.8206), np.float32(4323.59), np.float32(4529.351), np.float32(1409.4576), np.float32(4461.6147), np.float32(4272.859)]
2025-09-14 19:21:14,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:21:14,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 32 seconds)
2025-09-14 19:23:21,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:23:30,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3918.32275 ± 713.135
2025-09-14 19:23:30,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4235.9204), np.float32(3891.049), np.float32(4592.513), np.float32(4106.4556), np.float32(4114.487), np.float32(4284.328), np.float32(4360.664), np.float32(4486.541), np.float32(2344.3677), np.float32(2766.9023)]
2025-09-14 19:23:30,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:23:30,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 15 seconds)
2025-09-14 19:25:37,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:25:46,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3612.95972 ± 968.244
2025-09-14 19:25:46,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2008.3949), np.float32(4845.466), np.float32(3206.5107), np.float32(4398.6646), np.float32(2775.835), np.float32(2325.8762), np.float32(4340.74), np.float32(4296.603), np.float32(3296.6492), np.float32(4634.861)]
2025-09-14 19:25:46,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:25:46,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1251 [DEBUG]: Training session finished
