2025-09-14 12:29:49,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1108 [DEBUG]: logdir: _logs/noise-eval-v2/halfcheetah/bpql-noise_0.150-delay_15
2025-09-14 12:29:49,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1109 [DEBUG]: trainer_prefix: noise-eval-v2/halfcheetah/bpql-noise_0.150-delay_15
2025-09-14 12:29:49,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'15': <latency_env.delayed_mdp.ConstantDelay object at 0x7fdb1cf3a6f0>}
2025-09-14 12:29:49,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1111 [DEBUG]: using device: cpu
2025-09-14 12:29:49,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-14 12:29:49,945 baseline-bpql-noisepromille150-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=107, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-14 12:29:49,946 baseline-bpql-noisepromille150-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-14 12:29:51,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-14 12:29:51,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-14 12:35:32,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 12:35:39,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: -285.25958 ± 44.621
2025-09-14 12:35:39,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-295.8389), np.float32(-289.15884), np.float32(-281.50333), np.float32(-254.94206), np.float32(-313.0077), np.float32(-293.82175), np.float32(-203.48427), np.float32(-370.09253), np.float32(-231.05743), np.float32(-319.68912)]
2025-09-14 12:35:39,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:35:39,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (-285.26) for latency 15
2025-09-14 12:35:39,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 9 hours, 35 minutes, 5 seconds)
2025-09-14 12:41:35,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 12:41:43,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: -220.44824 ± 38.343
2025-09-14 12:41:43,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-248.28935), np.float32(-194.06396), np.float32(-155.22658), np.float32(-189.44766), np.float32(-200.45222), np.float32(-232.76875), np.float32(-205.95317), np.float32(-219.35359), np.float32(-291.447), np.float32(-267.48013)]
2025-09-14 12:41:43,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:41:43,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (-220.45) for latency 15
2025-09-14 12:41:43,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 9 hours, 41 minutes, 35 seconds)
2025-09-14 12:47:29,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 12:47:37,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: -104.54156 ± 68.532
2025-09-14 12:47:37,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-20.371273), np.float32(-0.6413129), np.float32(-135.49113), np.float32(-89.003174), np.float32(-10.785435), np.float32(-127.760544), np.float32(-197.00473), np.float32(-194.78008), np.float32(-139.04282), np.float32(-130.5352)]
2025-09-14 12:47:37,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:47:37,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (-104.54) for latency 15
2025-09-14 12:47:37,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 9 hours, 34 minutes, 25 seconds)
2025-09-14 12:53:36,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 12:53:43,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: -31.37613 ± 63.666
2025-09-14 12:53:43,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-80.749), np.float32(-0.94991934), np.float32(120.99522), np.float32(-48.80093), np.float32(3.4773784), np.float32(-99.13548), np.float32(-38.28732), np.float32(-94.717026), np.float32(-80.81435), np.float32(5.2201533)]
2025-09-14 12:53:43,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:53:43,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (-31.38) for latency 15
2025-09-14 12:53:43,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 9 hours, 33 minutes, 6 seconds)
2025-09-14 12:59:53,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:00:00,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 68.34932 ± 129.875
2025-09-14 13:00:00,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-73.5533), np.float32(-84.189705), np.float32(203.72661), np.float32(12.720286), np.float32(-55.482063), np.float32(195.6769), np.float32(15.568345), np.float32(85.64846), np.float32(52.418358), np.float32(330.9593)]
2025-09-14 13:00:00,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:00:00,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (68.35) for latency 15
2025-09-14 13:00:00,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 9 hours, 32 minutes, 58 seconds)
2025-09-14 13:05:52,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:05:59,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 235.55859 ± 149.156
2025-09-14 13:05:59,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(85.09889), np.float32(284.55438), np.float32(-16.71524), np.float32(70.89539), np.float32(140.87146), np.float32(296.25272), np.float32(326.30585), np.float32(292.72308), np.float32(456.85355), np.float32(418.74606)]
2025-09-14 13:05:59,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:05:59,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (235.56) for latency 15
2025-09-14 13:05:59,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 9 hours, 30 minutes, 18 seconds)
2025-09-14 13:12:04,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:12:12,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 324.84149 ± 80.982
2025-09-14 13:12:12,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(313.7252), np.float32(337.13074), np.float32(270.857), np.float32(222.5155), np.float32(434.65808), np.float32(456.2395), np.float32(362.78308), np.float32(329.0903), np.float32(341.05756), np.float32(180.35762)]
2025-09-14 13:12:12,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:12:12,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (324.84) for latency 15
2025-09-14 13:12:12,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 9 hours, 26 minutes, 56 seconds)
2025-09-14 13:18:54,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:19:01,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 424.80731 ± 64.000
2025-09-14 13:19:01,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(341.8712), np.float32(357.01685), np.float32(481.4065), np.float32(301.98102), np.float32(458.20493), np.float32(459.7028), np.float32(464.44437), np.float32(425.18576), np.float32(452.76102), np.float32(505.49902)]
2025-09-14 13:19:01,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:19:01,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (424.81) for latency 15
2025-09-14 13:19:01,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 9 hours, 37 minutes, 53 seconds)
2025-09-14 13:25:23,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:25:30,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 518.95526 ± 68.970
2025-09-14 13:25:30,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(554.93994), np.float32(545.9609), np.float32(564.7915), np.float32(532.2752), np.float32(612.15845), np.float32(420.413), np.float32(591.3136), np.float32(525.2569), np.float32(405.56805), np.float32(436.87546)]
2025-09-14 13:25:30,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:25:30,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (518.96) for latency 15
2025-09-14 13:25:30,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 9 hours, 38 minutes, 19 seconds)
2025-09-14 13:31:43,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:31:50,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 572.01697 ± 100.623
2025-09-14 13:31:50,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(675.79205), np.float32(471.10254), np.float32(699.12646), np.float32(456.47913), np.float32(671.3738), np.float32(667.027), np.float32(463.13898), np.float32(519.1418), np.float32(458.67422), np.float32(638.3137)]
2025-09-14 13:31:50,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:31:50,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (572.02) for latency 15
2025-09-14 13:31:50,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 9 hours, 33 minutes, 3 seconds)
2025-09-14 13:38:24,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:38:31,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 833.53076 ± 127.355
2025-09-14 13:38:31,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(896.6133), np.float32(834.17236), np.float32(882.99194), np.float32(910.83575), np.float32(1105.2693), np.float32(744.65045), np.float32(789.3061), np.float32(627.5168), np.float32(682.3265), np.float32(861.625)]
2025-09-14 13:38:31,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:38:31,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (833.53) for latency 15
2025-09-14 13:38:31,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 9 hours, 39 minutes, 3 seconds)
2025-09-14 13:44:23,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:44:31,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 793.60583 ± 96.214
2025-09-14 13:44:31,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(743.00134), np.float32(958.9219), np.float32(710.7442), np.float32(798.0655), np.float32(903.0457), np.float32(732.91174), np.float32(724.99194), np.float32(890.2705), np.float32(836.26276), np.float32(637.84283)]
2025-09-14 13:44:31,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:44:31,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 9 hours, 28 minutes, 55 seconds)
2025-09-14 13:50:26,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:50:33,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 783.13947 ± 193.329
2025-09-14 13:50:33,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(707.7982), np.float32(756.5495), np.float32(962.3409), np.float32(708.7018), np.float32(840.9627), np.float32(923.8153), np.float32(907.48895), np.float32(853.92456), np.float32(907.0289), np.float32(262.7838)]
2025-09-14 13:50:33,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:50:33,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 9 hours, 8 minutes, 47 seconds)
2025-09-14 13:55:56,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:56:03,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 930.96649 ± 137.366
2025-09-14 13:56:03,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(914.372), np.float32(1172.5059), np.float32(820.85126), np.float32(777.4226), np.float32(867.02783), np.float32(888.6876), np.float32(1110.2659), np.float32(1062.1149), np.float32(735.3939), np.float32(961.0238)]
2025-09-14 13:56:03,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:56:03,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (930.97) for latency 15
2025-09-14 13:56:03,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 8 hours, 45 minutes, 32 seconds)
2025-09-14 14:01:51,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:01:59,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 751.76404 ± 389.477
2025-09-14 14:01:59,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1115.1721), np.float32(-383.52133), np.float32(762.7328), np.float32(774.78094), np.float32(890.31946), np.float32(902.1527), np.float32(820.51), np.float32(861.90137), np.float32(896.72614), np.float32(876.8667)]
2025-09-14 14:01:59,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:01:59,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 8 hours, 32 minutes, 27 seconds)
2025-09-14 14:08:08,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:08:16,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 643.38141 ± 389.472
2025-09-14 14:08:16,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(747.28595), np.float32(803.06433), np.float32(890.7319), np.float32(664.6551), np.float32(34.69836), np.float32(909.27246), np.float32(784.4439), np.float32(953.9628), np.float32(-249.82883), np.float32(895.52795)]
2025-09-14 14:08:16,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:08:16,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 8 hours, 19 minutes, 44 seconds)
2025-09-14 14:14:15,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:14:23,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 930.78485 ± 88.858
2025-09-14 14:14:23,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1127.365), np.float32(878.4842), np.float32(808.7954), np.float32(846.99396), np.float32(863.3541), np.float32(899.30304), np.float32(1020.8551), np.float32(939.9335), np.float32(967.3039), np.float32(955.46063)]
2025-09-14 14:14:23,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:14:23,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 8 hours, 15 minutes, 42 seconds)
2025-09-14 14:20:10,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:20:17,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 875.34802 ± 205.793
2025-09-14 14:20:17,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(740.1322), np.float32(779.3646), np.float32(960.7957), np.float32(487.40875), np.float32(814.2717), np.float32(1244.9542), np.float32(1069.8706), np.float32(886.555), np.float32(1057.8662), np.float32(712.2619)]
2025-09-14 14:20:17,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:20:17,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 8 hours, 7 minutes, 37 seconds)
2025-09-14 14:26:06,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:26:13,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 970.52185 ± 113.376
2025-09-14 14:26:13,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1094.8446), np.float32(863.52014), np.float32(1011.5849), np.float32(1156.9038), np.float32(987.83813), np.float32(902.1546), np.float32(992.5805), np.float32(1066.3351), np.float32(846.5576), np.float32(782.8989)]
2025-09-14 14:26:13,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:26:13,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (970.52) for latency 15
2025-09-14 14:26:13,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 8 hours, 8 minutes, 41 seconds)
2025-09-14 14:32:18,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:32:26,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 829.87109 ± 163.347
2025-09-14 14:32:26,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(973.52905), np.float32(890.935), np.float32(643.6086), np.float32(1013.96466), np.float32(644.91266), np.float32(968.5649), np.float32(640.42664), np.float32(810.852), np.float32(652.4714), np.float32(1059.4467)]
2025-09-14 14:32:26,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:32:26,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 8 hours, 7 minutes, 10 seconds)
2025-09-14 14:38:34,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:38:42,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 995.69324 ± 103.473
2025-09-14 14:38:42,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1120.1864), np.float32(942.6317), np.float32(959.6448), np.float32(977.28235), np.float32(837.8137), np.float32(1007.7878), np.float32(993.2773), np.float32(1226.373), np.float32(990.002), np.float32(901.93365)]
2025-09-14 14:38:42,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:38:42,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (995.69) for latency 15
2025-09-14 14:38:42,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 8 hours, 51 seconds)
2025-09-14 14:45:05,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:45:12,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 994.31573 ± 104.259
2025-09-14 14:45:12,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(771.4025), np.float32(1096.9167), np.float32(965.5252), np.float32(1032.6528), np.float32(976.5179), np.float32(1008.2182), np.float32(868.91833), np.float32(1156.2622), np.float32(1051.6984), np.float32(1015.04474)]
2025-09-14 14:45:12,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:45:12,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 8 hours, 53 seconds)
2025-09-14 14:51:28,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:51:36,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 982.95447 ± 119.885
2025-09-14 14:51:36,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1045.1561), np.float32(1008.2694), np.float32(948.1534), np.float32(1125.2096), np.float32(823.4223), np.float32(766.6721), np.float32(927.1048), np.float32(992.76434), np.float32(1187.4572), np.float32(1005.33575)]
2025-09-14 14:51:36,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:51:36,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 8 hours, 2 minutes, 3 seconds)
2025-09-14 14:57:15,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:57:22,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1005.11798 ± 99.455
2025-09-14 14:57:22,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(885.3719), np.float32(1037.725), np.float32(965.2479), np.float32(867.0594), np.float32(897.0121), np.float32(1158.8262), np.float32(1053.6685), np.float32(1090.505), np.float32(1132.2682), np.float32(963.4947)]
2025-09-14 14:57:22,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:57:22,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1005.12) for latency 15
2025-09-14 14:57:22,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 7 hours, 53 minutes, 28 seconds)
2025-09-14 15:03:13,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:03:20,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1177.03931 ± 235.278
2025-09-14 15:03:20,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1049.5762), np.float32(1016.06647), np.float32(1573.9753), np.float32(1659.9661), np.float32(1084.2323), np.float32(1048.1013), np.float32(977.16876), np.float32(1055.0625), np.float32(1293.502), np.float32(1012.7434)]
2025-09-14 15:03:20,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:03:20,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1177.04) for latency 15
2025-09-14 15:03:20,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 7 hours, 43 minutes, 33 seconds)
2025-09-14 15:09:22,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:09:29,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1138.11658 ± 215.818
2025-09-14 15:09:29,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1089.3281), np.float32(1009.1433), np.float32(1216.2174), np.float32(815.1591), np.float32(1326.1438), np.float32(1135.9515), np.float32(990.0739), np.float32(1154.7382), np.float32(997.3752), np.float32(1647.0364)]
2025-09-14 15:09:29,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:09:29,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 7 hours, 35 minutes, 40 seconds)
2025-09-14 15:15:34,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:15:41,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1103.19397 ± 73.816
2025-09-14 15:15:41,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1252.4911), np.float32(1149.17), np.float32(1052.076), np.float32(1124.6478), np.float32(1069.2694), np.float32(1047.7284), np.float32(1157.6603), np.float32(963.0688), np.float32(1099.5947), np.float32(1116.2334)]
2025-09-14 15:15:41,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:15:41,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 7 hours, 24 minutes, 58 seconds)
2025-09-14 15:21:45,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:21:53,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1060.16528 ± 160.638
2025-09-14 15:21:53,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1007.2752), np.float32(846.2455), np.float32(1193.8408), np.float32(1205.1501), np.float32(1284.4795), np.float32(1308.069), np.float32(935.129), np.float32(929.3222), np.float32(950.306), np.float32(941.8345)]
2025-09-14 15:21:53,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:21:53,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 7 hours, 16 minutes, 10 seconds)
2025-09-14 15:27:49,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:27:57,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1147.88403 ± 145.525
2025-09-14 15:27:57,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1515.5969), np.float32(1176.538), np.float32(1071.0406), np.float32(933.87616), np.float32(1108.1458), np.float32(1102.0652), np.float32(1172.4487), np.float32(1243.9471), np.float32(1059.6302), np.float32(1095.5524)]
2025-09-14 15:27:57,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:27:57,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 7 hours, 14 minutes, 8 seconds)
2025-09-14 15:33:45,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:33:52,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1121.04138 ± 142.475
2025-09-14 15:33:52,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(974.05176), np.float32(1097.2087), np.float32(1169.0635), np.float32(1123.3876), np.float32(884.812), np.float32(1108.0063), np.float32(966.0168), np.float32(1361.5011), np.float32(1274.7819), np.float32(1251.584)]
2025-09-14 15:33:52,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:33:52,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 7 hours, 7 minutes, 33 seconds)
2025-09-14 15:39:24,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:39:32,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1293.36597 ± 202.274
2025-09-14 15:39:32,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1217.768), np.float32(1483.4548), np.float32(1138.7003), np.float32(1287.1174), np.float32(1177.8705), np.float32(1254.2214), np.float32(1786.4987), np.float32(1385.3339), np.float32(1062.0713), np.float32(1140.6237)]
2025-09-14 15:39:32,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:39:32,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1293.37) for latency 15
2025-09-14 15:39:32,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 6 hours, 54 minutes, 39 seconds)
2025-09-14 15:45:18,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:45:26,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1228.52026 ± 119.489
2025-09-14 15:45:26,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1336.0833), np.float32(1133.4674), np.float32(1257.054), np.float32(1213.876), np.float32(1250.6998), np.float32(1343.6006), np.float32(1038.043), np.float32(1309.2657), np.float32(1023.86206), np.float32(1379.2512)]
2025-09-14 15:45:26,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:45:26,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 6 hours, 44 minutes, 31 seconds)
2025-09-14 15:51:15,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:51:23,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1092.61450 ± 262.946
2025-09-14 15:51:23,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1015.8469), np.float32(974.83765), np.float32(1106.0437), np.float32(1196.0363), np.float32(1620.5618), np.float32(602.24884), np.float32(1080.4824), np.float32(1405.2227), np.float32(1032.4774), np.float32(892.3871)]
2025-09-14 15:51:23,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:51:23,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 6 hours, 35 minutes, 15 seconds)
2025-09-14 15:57:39,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:57:46,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1311.38513 ± 389.960
2025-09-14 15:57:46,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1445.9257), np.float32(1145.5198), np.float32(1824.0579), np.float32(1612.2745), np.float32(1357.0282), np.float32(1860.4695), np.float32(1108.9802), np.float32(1273.9598), np.float32(997.01105), np.float32(488.62555)]
2025-09-14 15:57:46,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:57:46,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1311.39) for latency 15
2025-09-14 15:57:46,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 6 hours, 33 minutes, 47 seconds)
2025-09-14 16:03:51,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:03:59,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1212.98083 ± 156.173
2025-09-14 16:03:59,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1379.7946), np.float32(1217.9894), np.float32(1164.15), np.float32(1345.1178), np.float32(1129.5652), np.float32(1246.1023), np.float32(1505.758), np.float32(1147.7527), np.float32(948.32776), np.float32(1045.2495)]
2025-09-14 16:03:59,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:03:59,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 6 hours, 31 minutes, 23 seconds)
2025-09-14 16:09:37,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:09:44,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1147.60925 ± 191.357
2025-09-14 16:09:44,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1306.6129), np.float32(881.89355), np.float32(1004.3788), np.float32(1005.62897), np.float32(1414.8793), np.float32(1171.9913), np.float32(1405.6887), np.float32(1079.058), np.float32(1305.8347), np.float32(900.1266)]
2025-09-14 16:09:44,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:09:44,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 6 hours, 26 minutes, 35 seconds)
2025-09-14 16:16:20,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:16:28,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1134.03540 ± 239.140
2025-09-14 16:16:28,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(937.0729), np.float32(1477.4255), np.float32(1087.8776), np.float32(945.97), np.float32(1174.0253), np.float32(1151.6846), np.float32(1660.3318), np.float32(989.3957), np.float32(868.637), np.float32(1047.9336)]
2025-09-14 16:16:28,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:16:28,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 6 hours, 31 minutes, 3 seconds)
2025-09-14 16:22:21,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:22:28,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1378.05579 ± 381.765
2025-09-14 16:22:28,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1798.2191), np.float32(1127.8987), np.float32(1129.4924), np.float32(1058.0248), np.float32(981.8458), np.float32(1042.585), np.float32(1963.2726), np.float32(1986.7456), np.float32(1159.6907), np.float32(1532.7831)]
2025-09-14 16:22:28,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:22:28,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1378.06) for latency 15
2025-09-14 16:22:28,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 6 hours, 25 minutes, 34 seconds)
2025-09-14 16:28:10,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:28:17,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1614.06116 ± 372.640
2025-09-14 16:28:17,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1130.839), np.float32(2265.4695), np.float32(911.43115), np.float32(1720.7368), np.float32(1880.051), np.float32(1823.5297), np.float32(1644.0444), np.float32(1366.2925), np.float32(1568.1091), np.float32(1830.1077)]
2025-09-14 16:28:17,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:28:17,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1614.06) for latency 15
2025-09-14 16:28:17,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 6 hours, 12 minutes, 17 seconds)
2025-09-14 16:33:53,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:34:01,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1309.88989 ± 440.867
2025-09-14 16:34:01,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1359.8955), np.float32(823.3838), np.float32(707.6899), np.float32(1688.2631), np.float32(1173.439), np.float32(2044.5156), np.float32(981.6951), np.float32(948.1735), np.float32(1929.8394), np.float32(1442.0034)]
2025-09-14 16:34:01,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:34:01,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 6 hours, 21 seconds)
2025-09-14 16:39:51,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:39:58,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1673.95679 ± 395.477
2025-09-14 16:39:58,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1499.0691), np.float32(1417.4711), np.float32(2192.4077), np.float32(908.6246), np.float32(2118.5168), np.float32(1595.9073), np.float32(2210.4067), np.float32(1698.539), np.float32(1761.3834), np.float32(1337.2422)]
2025-09-14 16:39:58,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:39:58,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1673.96) for latency 15
2025-09-14 16:39:58,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 5 hours, 56 minutes, 43 seconds)
2025-09-14 16:45:57,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:46:05,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1355.11377 ± 277.699
2025-09-14 16:46:05,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1294.1129), np.float32(1605.8671), np.float32(959.2604), np.float32(1511.8184), np.float32(1786.5028), np.float32(1737.9174), np.float32(1056.0266), np.float32(1064.6606), np.float32(1240.0421), np.float32(1294.9293)]
2025-09-14 16:46:05,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:46:05,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 5 hours, 43 minutes, 27 seconds)
2025-09-14 16:52:21,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:52:29,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1516.88367 ± 406.972
2025-09-14 16:52:29,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1409.8829), np.float32(2170.186), np.float32(1529.0656), np.float32(1937.8438), np.float32(1942.1508), np.float32(896.40643), np.float32(1028.3666), np.float32(1060.3538), np.float32(1683.3502), np.float32(1511.2301)]
2025-09-14 16:52:29,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:52:29,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 5 hours, 42 minutes, 6 seconds)
2025-09-14 16:58:16,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:58:24,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1556.77271 ± 513.908
2025-09-14 16:58:24,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1439.9702), np.float32(2357.137), np.float32(1434.2909), np.float32(1544.3036), np.float32(1325.8768), np.float32(1029.3069), np.float32(1291.0944), np.float32(1456.2825), np.float32(1009.9949), np.float32(2679.4705)]
2025-09-14 16:58:24,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:58:24,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 5 hours, 37 minutes, 10 seconds)
2025-09-14 17:04:10,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 17:04:17,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1420.39807 ± 224.089
2025-09-14 17:04:17,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1659.4261), np.float32(1105.9164), np.float32(1147.1741), np.float32(1676.1031), np.float32(1419.0385), np.float32(1520.4219), np.float32(1732.8092), np.float32(1262.6238), np.float32(1165.967), np.float32(1514.4999)]
2025-09-14 17:04:17,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:04:17,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 5 hours, 33 minutes, 4 seconds)
2025-09-14 17:10:10,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 17:10:17,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1487.89624 ± 317.759
2025-09-14 17:10:17,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1275.6014), np.float32(1332.3182), np.float32(1148.7549), np.float32(962.79205), np.float32(1445.5416), np.float32(1453.2898), np.float32(2071.548), np.float32(1614.441), np.float32(1714.196), np.float32(1860.4785)]
2025-09-14 17:10:17,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:10:17,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 5 hours, 27 minutes, 25 seconds)
2025-09-14 17:16:19,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 17:16:27,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1797.42310 ± 506.256
2025-09-14 17:16:27,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2682.8103), np.float32(2051.0947), np.float32(1177.1632), np.float32(2064.3362), np.float32(1225.9756), np.float32(2382.2258), np.float32(1689.6589), np.float32(1233.4528), np.float32(1378.5142), np.float32(2088.9988)]
2025-09-14 17:16:27,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:16:27,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1797.42) for latency 15
2025-09-14 17:16:27,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 5 hours, 21 minutes, 59 seconds)
2025-09-14 17:22:46,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 17:22:53,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1533.96887 ± 379.272
2025-09-14 17:22:53,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1811.6432), np.float32(1588.2798), np.float32(1369.5726), np.float32(1055.1858), np.float32(1076.4816), np.float32(1409.2817), np.float32(2406.2288), np.float32(1388.1346), np.float32(1415.8528), np.float32(1819.0284)]
2025-09-14 17:22:53,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:22:53,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 5 hours, 16 minutes, 13 seconds)
2025-09-14 17:28:57,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 17:29:05,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1685.62537 ± 716.833
2025-09-14 17:29:05,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1809.3958), np.float32(2256.7664), np.float32(1999.436), np.float32(1825.1152), np.float32(1540.0616), np.float32(1193.4675), np.float32(1718.863), np.float32(2628.9136), np.float32(-157.68361), np.float32(2041.9174)]
2025-09-14 17:29:05,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:29:05,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 5 hours, 13 minutes)
2025-09-14 17:35:01,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 17:35:09,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1502.89111 ± 251.225
2025-09-14 17:35:09,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1411.3202), np.float32(1994.9463), np.float32(1478.0674), np.float32(1330.5354), np.float32(1075.137), np.float32(1260.5027), np.float32(1528.4908), np.float32(1483.6221), np.float32(1705.5273), np.float32(1760.7627)]
2025-09-14 17:35:09,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:35:09,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 5 hours, 8 minutes, 33 seconds)
2025-09-14 17:40:54,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 17:41:01,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2062.58643 ± 561.479
2025-09-14 17:41:01,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3131.0054), np.float32(1132.475), np.float32(2105.0369), np.float32(1574.9185), np.float32(2634.242), np.float32(2227.3608), np.float32(1966.5361), np.float32(1832.519), np.float32(2503.2742), np.float32(1518.4982)]
2025-09-14 17:41:01,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:41:01,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (2062.59) for latency 15
2025-09-14 17:41:01,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 5 hours, 1 minute, 11 seconds)
2025-09-14 17:46:53,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 17:47:01,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1633.80603 ± 565.662
2025-09-14 17:47:01,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1063.481), np.float32(2371.6326), np.float32(1658.9244), np.float32(1627.1134), np.float32(2299.2725), np.float32(1109.2677), np.float32(1195.665), np.float32(1374.1191), np.float32(1018.2942), np.float32(2620.2913)]
2025-09-14 17:47:01,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:47:01,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 4 hours, 53 minutes, 24 seconds)
2025-09-14 17:52:28,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 17:52:36,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1485.85864 ± 370.977
2025-09-14 17:52:36,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1613.4277), np.float32(1970.3264), np.float32(1400.0603), np.float32(1131.9839), np.float32(2118.1663), np.float32(1555.6573), np.float32(1095.3359), np.float32(1543.0914), np.float32(1593.2646), np.float32(837.27203)]
2025-09-14 17:52:36,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:52:36,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 4 hours, 39 minutes, 17 seconds)
2025-09-14 17:58:28,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 17:58:35,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1738.24438 ± 539.214
2025-09-14 17:58:35,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1273.9742), np.float32(971.6517), np.float32(1607.9138), np.float32(2601.1943), np.float32(1861.6367), np.float32(1610.2034), np.float32(2080.4822), np.float32(2427.475), np.float32(935.8323), np.float32(2012.0792)]
2025-09-14 17:58:35,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:58:35,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 4 hours, 31 minutes, 24 seconds)
2025-09-14 18:04:52,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 18:04:59,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1885.92224 ± 585.182
2025-09-14 18:04:59,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2133.5), np.float32(1243.6108), np.float32(2407.108), np.float32(1352.3033), np.float32(1262.443), np.float32(2188.8), np.float32(1868.8787), np.float32(1930.366), np.float32(3148.8374), np.float32(1323.3755)]
2025-09-14 18:04:59,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:04:59,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 4 hours, 28 minutes, 30 seconds)
2025-09-14 18:11:27,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 18:11:35,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2095.60986 ± 647.727
2025-09-14 18:11:35,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1918.8512), np.float32(1571.3279), np.float32(1148.9736), np.float32(1642.9423), np.float32(2711.5679), np.float32(2533.295), np.float32(2600.7485), np.float32(2375.5125), np.float32(3188.117), np.float32(1264.7622)]
2025-09-14 18:11:35,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:11:35,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (2095.61) for latency 15
2025-09-14 18:11:35,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 4 hours, 28 minutes, 55 seconds)
2025-09-14 18:17:50,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 18:17:58,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1820.59180 ± 595.926
2025-09-14 18:17:58,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2523.7087), np.float32(1550.1484), np.float32(1055.8578), np.float32(1755.1614), np.float32(1067.9775), np.float32(2238.971), np.float32(1848.0718), np.float32(2680.8416), np.float32(2413.5015), np.float32(1071.6799)]
2025-09-14 18:17:58,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:17:58,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 4 hours, 26 minutes, 11 seconds)
2025-09-14 18:24:11,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 18:24:19,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2089.34082 ± 854.938
2025-09-14 18:24:19,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3099.8684), np.float32(829.60675), np.float32(1079.5623), np.float32(2412.7144), np.float32(1239.9148), np.float32(2828.526), np.float32(1742.404), np.float32(2982.0793), np.float32(1543.575), np.float32(3135.159)]
2025-09-14 18:24:19,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:24:19,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 4 hours, 26 minutes, 19 seconds)
2025-09-14 18:30:44,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 18:30:52,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1758.19788 ± 562.861
2025-09-14 18:30:52,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1101.2677), np.float32(1352.1855), np.float32(2511.9346), np.float32(2490.055), np.float32(1627.5575), np.float32(1590.5425), np.float32(2702.6453), np.float32(1714.7424), np.float32(1205.0648), np.float32(1285.9839)]
2025-09-14 18:30:52,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:30:52,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 4 hours, 24 minutes, 44 seconds)
2025-09-14 18:36:44,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 18:36:51,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2369.44214 ± 902.812
2025-09-14 18:36:51,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1214.7217), np.float32(3241.909), np.float32(1494.8065), np.float32(3281.7156), np.float32(2952.6692), np.float32(2272.5217), np.float32(3319.5085), np.float32(3257.1794), np.float32(971.582), np.float32(1687.8082)]
2025-09-14 18:36:51,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:36:51,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (2369.44) for latency 15
2025-09-14 18:36:51,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 4 hours, 15 minutes)
2025-09-14 18:42:43,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 18:42:50,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2288.65942 ± 675.961
2025-09-14 18:42:50,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3354.1753), np.float32(2856.607), np.float32(2450.0796), np.float32(1881.8811), np.float32(1796.5116), np.float32(1468.7723), np.float32(3270.8442), np.float32(2524.2515), np.float32(1913.9305), np.float32(1369.5419)]
2025-09-14 18:42:50,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:42:50,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 4 hours, 3 minutes, 47 seconds)
2025-09-14 18:49:00,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 18:49:07,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2075.24854 ± 946.190
2025-09-14 18:49:07,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3257.7458), np.float32(643.6302), np.float32(3295.7205), np.float32(2706.9902), np.float32(1344.7241), np.float32(1813.7975), np.float32(1238.4955), np.float32(1229.2585), np.float32(1895.4501), np.float32(3326.6733)]
2025-09-14 18:49:07,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:49:07,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 3 hours, 56 minutes, 48 seconds)
2025-09-14 18:55:05,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 18:55:13,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2535.97412 ± 730.661
2025-09-14 18:55:13,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2643.2134), np.float32(1666.3724), np.float32(3024.2932), np.float32(2062.6396), np.float32(3304.0068), np.float32(1863.5673), np.float32(3331.9624), np.float32(1263.7246), np.float32(3398.3284), np.float32(2801.6318)]
2025-09-14 18:55:13,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:55:13,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (2535.97) for latency 15
2025-09-14 18:55:13,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 3 hours, 48 minutes, 41 seconds)
2025-09-14 19:00:48,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 19:00:56,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2071.49463 ± 669.899
2025-09-14 19:00:56,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3162.729), np.float32(2763.674), np.float32(1484.2092), np.float32(3162.9543), np.float32(1222.1467), np.float32(1823.7377), np.float32(1627.3898), np.float32(2092.0479), np.float32(1628.7418), np.float32(1747.3141)]
2025-09-14 19:00:56,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:00:56,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 3 hours, 36 minutes, 26 seconds)
2025-09-14 19:06:35,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 19:06:42,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2148.81567 ± 782.880
2025-09-14 19:06:42,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3283.6943), np.float32(3331.2422), np.float32(1853.4624), np.float32(2089.2307), np.float32(1826.3679), np.float32(911.64594), np.float32(1692.7485), np.float32(2399.3462), np.float32(1205.4077), np.float32(2895.0098)]
2025-09-14 19:06:42,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:06:42,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 3 hours, 28 minutes, 53 seconds)
2025-09-14 19:12:24,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 19:12:31,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2802.24463 ± 903.999
2025-09-14 19:12:31,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1433.0153), np.float32(3398.3489), np.float32(1295.8136), np.float32(3218.0244), np.float32(3353.463), np.float32(1563.5331), np.float32(3304.5186), np.float32(3551.7507), np.float32(3491.8245), np.float32(3412.155)]
2025-09-14 19:12:31,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:12:31,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (2802.24) for latency 15
2025-09-14 19:12:31,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 3 hours, 21 minutes, 50 seconds)
2025-09-14 19:18:37,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 19:18:45,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2868.96436 ± 386.505
2025-09-14 19:18:45,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2152.7605), np.float32(2581.9714), np.float32(2977.8398), np.float32(2370.8154), np.float32(2783.057), np.float32(2925.9153), np.float32(2978.606), np.float32(3446.8804), np.float32(3156.4368), np.float32(3315.3586)]
2025-09-14 19:18:45,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:18:45,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (2868.96) for latency 15
2025-09-14 19:18:45,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 3 hours, 15 minutes, 30 seconds)
2025-09-14 19:24:41,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 19:24:48,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2058.25586 ± 860.487
2025-09-14 19:24:48,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1608.6112), np.float32(3449.7058), np.float32(1742.5476), np.float32(1654.5864), np.float32(2494.844), np.float32(3058.0007), np.float32(3119.9954), np.float32(1533.5293), np.float32(961.9569), np.float32(958.7816)]
2025-09-14 19:24:48,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:24:48,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 3 hours, 9 minutes, 22 seconds)
2025-09-14 19:30:35,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 19:30:42,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2832.25488 ± 746.267
2025-09-14 19:30:42,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2638.2935), np.float32(3059.886), np.float32(3061.138), np.float32(3402.3037), np.float32(1027.8259), np.float32(3385.9587), np.float32(2274.0444), np.float32(3647.4717), np.float32(2388.0974), np.float32(3437.5295)]
2025-09-14 19:30:42,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:30:42,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 3 hours, 4 minutes, 36 seconds)
2025-09-14 19:36:38,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 19:36:45,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2447.03125 ± 941.875
2025-09-14 19:36:45,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3457.6187), np.float32(1994.9202), np.float32(3547.643), np.float32(3396.536), np.float32(1226.8267), np.float32(3011.849), np.float32(1809.7441), np.float32(1730.2089), np.float32(3293.2012), np.float32(1001.763)]
2025-09-14 19:36:45,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:36:45,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 3 hours, 21 seconds)
2025-09-14 19:42:31,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 19:42:39,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2733.25171 ± 825.257
2025-09-14 19:42:39,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3726.9326), np.float32(3322.0173), np.float32(3362.7964), np.float32(1415.3086), np.float32(1883.686), np.float32(3211.0889), np.float32(2909.3062), np.float32(3128.7114), np.float32(1289.8333), np.float32(3082.8357)]
2025-09-14 19:42:39,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:42:39,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 2 hours, 54 minutes, 45 seconds)
2025-09-14 19:48:38,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 19:48:46,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3226.18506 ± 225.699
2025-09-14 19:48:46,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3385.0234), np.float32(3186.4053), np.float32(3359.0986), np.float32(2736.8625), np.float32(3460.5137), np.float32(3093.2615), np.float32(3194.462), np.float32(3587.541), np.float32(3165.1458), np.float32(3093.5354)]
2025-09-14 19:48:46,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:48:46,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (3226.19) for latency 15
2025-09-14 19:48:46,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 2 hours, 48 minutes, 5 seconds)
2025-09-14 19:54:43,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 19:54:50,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2704.12085 ± 815.028
2025-09-14 19:54:50,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3680.5227), np.float32(3230.8774), np.float32(3171.3423), np.float32(1076.0756), np.float32(2556.5356), np.float32(3274.4434), np.float32(3291.7961), np.float32(1546.2334), np.float32(3102.145), np.float32(2111.2378)]
2025-09-14 19:54:50,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:54:50,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 2 hours, 42 minutes, 11 seconds)
2025-09-14 20:00:39,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 20:00:47,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2594.17725 ± 964.976
2025-09-14 20:00:47,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2867.6504), np.float32(3191.1975), np.float32(3614.41), np.float32(1481.1686), np.float32(3358.212), np.float32(3425.9592), np.float32(1467.5969), np.float32(1157.3461), np.float32(1695.7404), np.float32(3682.4915)]
2025-09-14 20:00:47,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:00:47,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 2 hours, 36 minutes, 24 seconds)
2025-09-14 20:06:41,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 20:06:49,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2413.60913 ± 864.924
2025-09-14 20:06:49,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2099.5952), np.float32(1012.7125), np.float32(2463.5764), np.float32(1789.6698), np.float32(3377.0098), np.float32(3641.6296), np.float32(3480.846), np.float32(1915.8423), np.float32(2917.091), np.float32(1438.1166)]
2025-09-14 20:06:49,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:06:49,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 2 hours, 30 minutes, 15 seconds)
2025-09-14 20:12:43,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 20:12:50,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3090.11450 ± 692.887
2025-09-14 20:12:50,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3485.4644), np.float32(1320.7344), np.float32(3462.1196), np.float32(3253.0217), np.float32(2385.16), np.float32(3180.4526), np.float32(3587.4556), np.float32(2977.47), np.float32(3703.4084), np.float32(3545.8574)]
2025-09-14 20:12:50,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:12:50,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 2 hours, 24 minutes, 53 seconds)
2025-09-14 20:19:17,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 20:19:25,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3315.19922 ± 378.622
2025-09-14 20:19:25,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2905.2996), np.float32(3478.764), np.float32(2703.1643), np.float32(3783.5693), np.float32(3005.1414), np.float32(3696.2058), np.float32(2880.9297), np.float32(3451.0093), np.float32(3570.014), np.float32(3677.895)]
2025-09-14 20:19:25,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:19:25,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (3315.20) for latency 15
2025-09-14 20:19:25,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 2 hours, 20 minutes, 59 seconds)
2025-09-14 20:25:24,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 20:25:32,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2894.69092 ± 700.056
2025-09-14 20:25:32,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3690.2488), np.float32(1817.383), np.float32(2770.206), np.float32(3641.2673), np.float32(1576.4186), np.float32(3629.2397), np.float32(2950.2078), np.float32(3359.0896), np.float32(2759.721), np.float32(2753.128)]
2025-09-14 20:25:32,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:25:32,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 2 hours, 15 minutes, 1 second)
2025-09-14 20:31:10,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 20:31:18,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2932.87305 ± 842.585
2025-09-14 20:31:18,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3317.7458), np.float32(2293.5864), np.float32(3582.639), np.float32(3470.8835), np.float32(2845.8335), np.float32(3620.1663), np.float32(901.32275), np.float32(3420.327), np.float32(3640.3293), np.float32(2235.8975)]
2025-09-14 20:31:18,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:31:18,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 2 hours, 8 minutes, 10 seconds)
2025-09-14 20:37:04,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 20:37:12,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3115.27417 ± 558.198
2025-09-14 20:37:12,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2764.7244), np.float32(3406.2734), np.float32(3163.5645), np.float32(3111.06), np.float32(3524.1736), np.float32(1695.9995), np.float32(3479.4658), np.float32(3584.3845), np.float32(2791.3367), np.float32(3631.7588)]
2025-09-14 20:37:12,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:37:12,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 2 hours, 1 minute, 33 seconds)
2025-09-14 20:43:11,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 20:43:18,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2809.22314 ± 911.708
2025-09-14 20:43:18,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1423.8384), np.float32(3219.558), np.float32(2538.122), np.float32(3596.23), np.float32(3506.8474), np.float32(1144.6007), np.float32(3648.324), np.float32(2020.6061), np.float32(3467.945), np.float32(3526.16)]
2025-09-14 20:43:18,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:43:18,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 55 minutes, 46 seconds)
2025-09-14 20:49:08,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 20:49:16,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3111.32129 ± 756.147
2025-09-14 20:49:16,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1250.288), np.float32(3603.103), np.float32(2468.736), np.float32(2482.386), np.float32(3423.5334), np.float32(3535.3762), np.float32(3676.598), np.float32(3499.1077), np.float32(3631.6196), np.float32(3542.4653)]
2025-09-14 20:49:16,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:49:16,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 47 minutes, 27 seconds)
2025-09-14 20:55:25,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 20:55:32,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3139.07593 ± 588.981
2025-09-14 20:55:32,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2135.6013), np.float32(3684.0605), np.float32(3035.2493), np.float32(3467.245), np.float32(3611.3225), np.float32(3193.8037), np.float32(1978.5282), np.float32(3577.5305), np.float32(3661.3477), np.float32(3046.0718)]
2025-09-14 20:55:32,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:55:32,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 42 minutes, 2 seconds)
2025-09-14 21:01:22,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 21:01:30,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2947.97900 ± 753.444
2025-09-14 21:01:30,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2027.9474), np.float32(3331.791), np.float32(2108.8206), np.float32(1396.9918), np.float32(3692.7805), np.float32(3318.336), np.float32(3190.2627), np.float32(3508.8394), np.float32(3453.3516), np.float32(3450.6707)]
2025-09-14 21:01:30,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:01:30,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 36 minutes, 38 seconds)
2025-09-14 21:07:39,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 21:07:46,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2976.00000 ± 700.563
2025-09-14 21:07:46,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3700.1738), np.float32(3399.452), np.float32(1710.0779), np.float32(3595.2012), np.float32(2706.2952), np.float32(3146.2527), np.float32(3560.979), np.float32(3492.5078), np.float32(2659.3015), np.float32(1789.7568)]
2025-09-14 21:07:46,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:07:46,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 1 hour, 31 minutes, 43 seconds)
2025-09-14 21:13:42,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 21:13:49,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2912.27930 ± 611.433
2025-09-14 21:13:49,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3014.2778), np.float32(2210.8687), np.float32(3149.746), np.float32(1451.8052), np.float32(2701.6035), np.float32(3571.2341), np.float32(3008.9048), np.float32(3408.1191), np.float32(3364.1108), np.float32(3242.123)]
2025-09-14 21:13:49,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:13:49,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 1 hour, 25 minutes, 26 seconds)
2025-09-14 21:20:08,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 21:20:16,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2919.71069 ± 756.860
2025-09-14 21:20:16,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1481.1694), np.float32(3301.2495), np.float32(3566.689), np.float32(1894.4374), np.float32(3653.0295), np.float32(2116.5962), np.float32(3544.1409), np.float32(3543.291), np.float32(2891.622), np.float32(3204.8843)]
2025-09-14 21:20:16,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:20:16,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 1 hour, 20 minutes, 36 seconds)
2025-09-14 21:26:37,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 21:26:45,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3170.23877 ± 661.546
2025-09-14 21:26:45,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3548.5742), np.float32(2081.9922), np.float32(3532.007), np.float32(3395.2832), np.float32(3480.797), np.float32(3207.455), np.float32(3568.6052), np.float32(3596.6226), np.float32(3614.6782), np.float32(1676.375)]
2025-09-14 21:26:45,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:26:45,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 1 hour, 14 minutes, 53 seconds)
2025-09-14 21:32:33,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 21:32:40,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2547.80713 ± 929.441
2025-09-14 21:32:40,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2224.6343), np.float32(1142.2526), np.float32(1201.699), np.float32(3397.7417), np.float32(2166.0645), np.float32(3566.0518), np.float32(1692.1707), np.float32(3142.695), np.float32(3623.9546), np.float32(3320.808)]
2025-09-14 21:32:40,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:32:40,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 1 hour, 8 minutes, 34 seconds)
2025-09-14 21:38:45,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 21:38:53,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3015.23901 ± 845.904
2025-09-14 21:38:53,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1273.9856), np.float32(3742.8245), np.float32(3533.9834), np.float32(3480.8694), np.float32(3669.2253), np.float32(2880.041), np.float32(1605.153), np.float32(2907.201), np.float32(3271.4297), np.float32(3787.678)]
2025-09-14 21:38:53,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:38:53,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 1 hour, 2 minutes, 12 seconds)
2025-09-14 21:44:45,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 21:44:52,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2728.74585 ± 1063.973
2025-09-14 21:44:52,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1096.8671), np.float32(3646.495), np.float32(1119.988), np.float32(3648.3435), np.float32(3440.5596), np.float32(1585.1172), np.float32(3536.125), np.float32(3453.3806), np.float32(3700.013), np.float32(2060.571)]
2025-09-14 21:44:52,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:44:52,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 55 minutes, 53 seconds)
2025-09-14 21:50:54,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 21:51:02,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3209.39380 ± 609.057
2025-09-14 21:51:02,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3175.3655), np.float32(3657.0647), np.float32(3375.0671), np.float32(3670.9785), np.float32(2319.006), np.float32(3543.663), np.float32(3688.978), np.float32(1793.2622), np.float32(3322.5073), np.float32(3548.0437)]
2025-09-14 21:51:02,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:51:02,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 49 minutes, 13 seconds)
2025-09-14 21:57:14,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 21:57:22,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3005.72803 ± 864.227
2025-09-14 21:57:22,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3473.3418), np.float32(3574.4358), np.float32(3659.5903), np.float32(1569.4279), np.float32(3498.91), np.float32(3473.884), np.float32(3537.233), np.float32(2744.0308), np.float32(1140.6581), np.float32(3385.7722)]
2025-09-14 21:57:22,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:57:22,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 42 minutes, 52 seconds)
2025-09-14 22:04:57,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 22:05:04,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2784.57861 ± 968.462
2025-09-14 22:05:04,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2204.7983), np.float32(3248.183), np.float32(3462.348), np.float32(3547.2266), np.float32(3521.955), np.float32(3713.8914), np.float32(1553.2056), np.float32(3769.8271), np.float32(1662.9645), np.float32(1161.3884)]
2025-09-14 22:05:04,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 22:05:04,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 38 minutes, 52 seconds)
2025-09-14 22:12:30,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 22:12:37,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2489.59033 ± 815.328
2025-09-14 22:12:37,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3243.3103), np.float32(3069.5112), np.float32(1325.2473), np.float32(1171.6702), np.float32(1447.3691), np.float32(3297.168), np.float32(2653.8914), np.float32(2391.2578), np.float32(3085.433), np.float32(3211.0442)]
2025-09-14 22:12:37,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 22:12:37,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 33 minutes, 44 seconds)
2025-09-14 22:19:40,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 22:19:47,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3391.18506 ± 301.512
2025-09-14 22:19:47,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3608.3425), np.float32(3393.259), np.float32(3355.467), np.float32(2569.193), np.float32(3525.5957), np.float32(3696.9575), np.float32(3424.3508), np.float32(3635.761), np.float32(3419.8315), np.float32(3283.0918)]
2025-09-14 22:19:47,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 22:19:47,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (3391.19) for latency 15
2025-09-14 22:19:47,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 27 minutes, 55 seconds)
2025-09-14 22:27:00,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 22:27:07,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2935.17188 ± 885.856
2025-09-14 22:27:07,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3453.9385), np.float32(3492.7395), np.float32(3531.6533), np.float32(3203.6196), np.float32(3416.6265), np.float32(3734.971), np.float32(911.22986), np.float32(2019.5225), np.float32(3498.445), np.float32(2088.9727)]
2025-09-14 22:27:07,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 22:27:07,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 21 minutes, 39 seconds)
2025-09-14 22:34:20,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 22:34:28,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3220.64673 ± 688.471
2025-09-14 22:34:28,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3687.8298), np.float32(1314.3214), np.float32(3618.1382), np.float32(3675.4382), np.float32(3159.4128), np.float32(3349.114), np.float32(3233.4446), np.float32(3699.5293), np.float32(2865.4172), np.float32(3603.8225)]
2025-09-14 22:34:28,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 22:34:28,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 14 minutes, 50 seconds)
2025-09-14 22:42:19,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 22:42:26,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3188.78174 ± 624.280
2025-09-14 22:42:26,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3618.1333), np.float32(3324.3289), np.float32(3544.4907), np.float32(3209.7126), np.float32(3387.4246), np.float32(3292.5083), np.float32(3312.0686), np.float32(1363.071), np.float32(3596.6655), np.float32(3239.414)]
2025-09-14 22:42:26,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 22:42:26,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 7 minutes, 28 seconds)
2025-09-14 22:49:27,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 22:49:34,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3483.06885 ± 91.092
2025-09-14 22:49:34,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3392.8367), np.float32(3416.5679), np.float32(3567.77), np.float32(3427.8254), np.float32(3427.521), np.float32(3375.9082), np.float32(3451.242), np.float32(3538.5547), np.float32(3561.1729), np.float32(3671.2876)]
2025-09-14 22:49:34,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 22:49:34,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (3483.07) for latency 15
2025-09-14 22:49:34,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1251 [DEBUG]: Training session finished
