2025-09-14 08:43:01,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1108 [DEBUG]: logdir: _logs/noise-eval-v2/halfcheetah/bpql-noise_0.075-delay_6
2025-09-14 08:43:01,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1109 [DEBUG]: trainer_prefix: noise-eval-v2/halfcheetah/bpql-noise_0.075-delay_6
2025-09-14 08:43:01,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'6': <latency_env.delayed_mdp.ConstantDelay object at 0x7efa98677aa0>}
2025-09-14 08:43:01,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1111 [DEBUG]: using device: cpu
2025-09-14 08:43:01,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-14 08:43:01,764 baseline-bpql-noisepromille75-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=53, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-14 08:43:01,764 baseline-bpql-noisepromille75-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-14 08:43:03,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-14 08:43:03,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-14 08:46:39,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 08:46:46,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: -294.05414 ± 38.747
2025-09-14 08:46:46,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-363.89673), np.float32(-323.06384), np.float32(-313.7073), np.float32(-300.103), np.float32(-291.1345), np.float32(-224.90222), np.float32(-330.177), np.float32(-270.4464), np.float32(-254.2735), np.float32(-268.83694)]
2025-09-14 08:46:46,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 08:46:46,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (-294.05) for latency 6
2025-09-14 08:46:46,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 6 hours, 8 minutes, 28 seconds)
2025-09-14 08:50:20,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 08:50:27,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: -236.86858 ± 45.702
2025-09-14 08:50:27,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-244.02745), np.float32(-224.5369), np.float32(-239.0675), np.float32(-244.97525), np.float32(-231.29642), np.float32(-231.20882), np.float32(-206.08891), np.float32(-339.58298), np.float32(-262.62296), np.float32(-145.27864)]
2025-09-14 08:50:27,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 08:50:27,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (-236.87) for latency 6
2025-09-14 08:50:27,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 6 hours, 2 minutes, 51 seconds)
2025-09-14 08:53:58,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 08:54:06,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: -33.77243 ± 54.792
2025-09-14 08:54:06,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-58.237232), np.float32(-15.028818), np.float32(-126.67707), np.float32(-2.2664034), np.float32(-14.55148), np.float32(32.439487), np.float32(-16.001163), np.float32(-33.38403), np.float32(30.797478), np.float32(-134.8151)]
2025-09-14 08:54:06,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 08:54:06,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (-33.77) for latency 6
2025-09-14 08:54:06,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 5 hours, 57 minutes, 4 seconds)
2025-09-14 08:57:43,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 08:57:51,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 420.84015 ± 64.832
2025-09-14 08:57:51,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(493.4145), np.float32(451.3517), np.float32(343.41983), np.float32(332.46503), np.float32(549.2861), np.float32(410.33694), np.float32(432.7636), np.float32(364.28378), np.float32(383.72845), np.float32(447.35147)]
2025-09-14 08:57:51,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 08:57:51,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (420.84) for latency 6
2025-09-14 08:57:51,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 5 hours, 55 minutes, 17 seconds)
2025-09-14 09:01:25,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:01:32,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 854.26416 ± 245.113
2025-09-14 09:01:32,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1048.4677), np.float32(874.73865), np.float32(920.9242), np.float32(899.5536), np.float32(1150.5834), np.float32(190.41171), np.float32(976.03253), np.float32(827.0259), np.float32(900.4636), np.float32(754.4404)]
2025-09-14 09:01:32,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:01:32,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (854.26) for latency 6
2025-09-14 09:01:32,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 5 hours, 51 minutes, 17 seconds)
2025-09-14 09:04:55,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:05:03,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1689.57886 ± 332.238
2025-09-14 09:05:03,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1700.98), np.float32(2313.8901), np.float32(1980.1576), np.float32(953.1005), np.float32(1783.0548), np.float32(1524.9187), np.float32(1787.9221), np.float32(1722.9094), np.float32(1631.9872), np.float32(1496.8691)]
2025-09-14 09:05:03,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:05:03,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (1689.58) for latency 6
2025-09-14 09:05:03,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 5 hours, 43 minutes, 45 seconds)
2025-09-14 09:08:24,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:08:31,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2221.85010 ± 504.786
2025-09-14 09:08:31,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2413.35), np.float32(2823.377), np.float32(2549.2664), np.float32(2416.0757), np.float32(2209.3206), np.float32(1183.0653), np.float32(1408.823), np.float32(2546.4268), np.float32(2075.8142), np.float32(2592.9836)]
2025-09-14 09:08:31,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:08:31,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2221.85) for latency 6
2025-09-14 09:08:31,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 5 hours, 36 minutes, 2 seconds)
2025-09-14 09:11:30,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:11:37,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2086.39697 ± 874.366
2025-09-14 09:11:37,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(577.9837), np.float32(2803.1443), np.float32(2361.5833), np.float32(2676.124), np.float32(2441.5503), np.float32(2253.8394), np.float32(2543.0642), np.float32(2435.5266), np.float32(167.85031), np.float32(2603.3037)]
2025-09-14 09:11:37,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:11:37,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 5 hours, 22 minutes, 24 seconds)
2025-09-14 09:14:27,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:14:33,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2471.34595 ± 648.001
2025-09-14 09:14:33,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2928.9736), np.float32(1560.718), np.float32(3051.119), np.float32(2513.725), np.float32(2452.7996), np.float32(955.0732), np.float32(2717.7295), np.float32(2990.456), np.float32(2851.2668), np.float32(2691.5972)]
2025-09-14 09:14:33,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:14:33,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2471.35) for latency 6
2025-09-14 09:14:33,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 5 hours, 3 minutes, 59 seconds)
2025-09-14 09:17:24,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:17:30,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2562.76709 ± 768.635
2025-09-14 09:17:30,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2977.4294), np.float32(2908.1697), np.float32(3049.1416), np.float32(3035.2617), np.float32(658.0305), np.float32(2505.5933), np.float32(3092.5874), np.float32(2912.7703), np.float32(1566.2401), np.float32(2922.4482)]
2025-09-14 09:17:30,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:17:30,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2562.77) for latency 6
2025-09-14 09:17:30,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 47 minutes, 25 seconds)
2025-09-14 09:20:06,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:20:13,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2825.58398 ± 711.114
2025-09-14 09:20:13,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3467.8198), np.float32(3278.8418), np.float32(1179.7363), np.float32(3184.147), np.float32(1964.256), np.float32(3244.4739), np.float32(2323.254), np.float32(3232.2322), np.float32(3183.3997), np.float32(3197.677)]
2025-09-14 09:20:13,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:20:13,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2825.58) for latency 6
2025-09-14 09:20:13,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 29 minutes, 41 seconds)
2025-09-14 09:23:33,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:23:41,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3093.41064 ± 1026.194
2025-09-14 09:23:41,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3320.304), np.float32(3620.3767), np.float32(3797.2288), np.float32(3823.5986), np.float32(1480.0111), np.float32(3869.6882), np.float32(3080.7913), np.float32(794.9889), np.float32(3907.7544), np.float32(3239.3652)]
2025-09-14 09:23:41,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:23:41,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (3093.41) for latency 6
2025-09-14 09:23:41,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 26 minutes, 49 seconds)
2025-09-14 09:27:06,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:27:14,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3906.14844 ± 104.541
2025-09-14 09:27:14,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3937.469), np.float32(3893.4573), np.float32(3823.89), np.float32(3777.9163), np.float32(3934.961), np.float32(4045.3584), np.float32(4059.9287), np.float32(3709.382), np.float32(3931.506), np.float32(3947.6118)]
2025-09-14 09:27:14,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:27:14,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (3906.15) for latency 6
2025-09-14 09:27:14,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 31 minutes, 51 seconds)
2025-09-14 09:30:39,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:30:47,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4001.03125 ± 101.744
2025-09-14 09:30:47,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3946.7964), np.float32(3935.9255), np.float32(4089.1907), np.float32(3817.79), np.float32(3939.383), np.float32(4028.3774), np.float32(3988.684), np.float32(4108.177), np.float32(4194.101), np.float32(3961.8848)]
2025-09-14 09:30:47,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:30:47,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (4001.03) for latency 6
2025-09-14 09:30:47,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 39 minutes, 3 seconds)
2025-09-14 09:34:13,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:34:21,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3875.89209 ± 259.279
2025-09-14 09:34:21,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4005.4802), np.float32(3939.2275), np.float32(4096.199), np.float32(3805.4968), np.float32(3521.9243), np.float32(3480.2783), np.float32(3968.3203), np.float32(4109.5454), np.float32(4269.613), np.float32(3562.8357)]
2025-09-14 09:34:21,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:34:21,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 46 minutes, 13 seconds)
2025-09-14 09:37:46,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:37:54,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4092.71753 ± 242.689
2025-09-14 09:37:54,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3992.8945), np.float32(4189.227), np.float32(3925.4526), np.float32(4404.931), np.float32(4091.0134), np.float32(3481.4404), np.float32(4147.2085), np.float32(4311.91), np.float32(4195.622), np.float32(4187.475)]
2025-09-14 09:37:54,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:37:54,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (4092.72) for latency 6
2025-09-14 09:37:54,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 57 minutes, 12 seconds)
2025-09-14 09:41:19,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:41:27,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3922.71021 ± 370.522
2025-09-14 09:41:27,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4016.1255), np.float32(3155.2993), np.float32(4337.426), np.float32(3537.4204), np.float32(3738.1233), np.float32(3771.1155), np.float32(4172.1475), np.float32(4273.5054), np.float32(4377.598), np.float32(3848.3413)]
2025-09-14 09:41:27,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:41:27,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 54 minutes, 55 seconds)
2025-09-14 09:44:53,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:45:01,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4213.87891 ± 159.591
2025-09-14 09:45:01,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4401.2485), np.float32(4072.4282), np.float32(4304.749), np.float32(3904.5898), np.float32(4137.087), np.float32(4442.468), np.float32(4378.732), np.float32(4124.561), np.float32(4165.814), np.float32(4207.1143)]
2025-09-14 09:45:01,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:45:01,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (4213.88) for latency 6
2025-09-14 09:45:01,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 51 minutes, 38 seconds)
2025-09-14 09:48:27,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:48:35,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4098.69580 ± 241.552
2025-09-14 09:48:35,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3915.5344), np.float32(4298.0337), np.float32(4139.4043), np.float32(4245.672), np.float32(4046.5667), np.float32(4427.219), np.float32(3511.0466), np.float32(4264.3613), np.float32(4039.4568), np.float32(4099.66)]
2025-09-14 09:48:35,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:48:35,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 48 minutes, 29 seconds)
2025-09-14 09:52:01,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:52:09,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4123.57812 ± 371.115
2025-09-14 09:52:09,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4229.95), np.float32(3920.9111), np.float32(3194.2024), np.float32(4220.495), np.float32(4306.3203), np.float32(4052.7183), np.float32(3945.2007), np.float32(4630.3315), np.float32(4419.753), np.float32(4315.9)]
2025-09-14 09:52:09,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:52:09,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 4 hours, 44 minutes, 59 seconds)
2025-09-14 09:55:23,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:55:31,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4265.07031 ± 183.795
2025-09-14 09:55:31,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4184.3564), np.float32(3831.909), np.float32(4469.422), np.float32(4133.536), np.float32(4508.2573), np.float32(4278.311), np.float32(4411.8843), np.float32(4324.2466), np.float32(4253.0923), np.float32(4255.6885)]
2025-09-14 09:55:31,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:55:31,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (4265.07) for latency 6
2025-09-14 09:55:31,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 4 hours, 38 minutes, 14 seconds)
2025-09-14 09:58:20,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:58:27,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4298.31396 ± 316.405
2025-09-14 09:58:27,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4404.6665), np.float32(3445.6711), np.float32(4410.4414), np.float32(4450.9087), np.float32(4164.544), np.float32(4102.9653), np.float32(4525.9214), np.float32(4545.16), np.float32(4437.8535), np.float32(4495.007)]
2025-09-14 09:58:27,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:58:27,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (4298.31) for latency 6
2025-09-14 09:58:27,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 4 hours, 25 minutes, 10 seconds)
2025-09-14 10:00:55,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:01:01,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4343.63135 ± 170.373
2025-09-14 10:01:01,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4560.97), np.float32(4221.792), np.float32(4214.4844), np.float32(3995.161), np.float32(4419.3174), np.float32(4476.5454), np.float32(4483.2183), np.float32(4221.376), np.float32(4521.862), np.float32(4321.5864)]
2025-09-14 10:01:01,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:01:01,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (4343.63) for latency 6
2025-09-14 10:01:01,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 4 hours, 6 minutes, 15 seconds)
2025-09-14 10:03:20,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:03:25,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4449.12793 ± 188.901
2025-09-14 10:03:25,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4692.6055), np.float32(4222.437), np.float32(4531.7563), np.float32(4590.697), np.float32(4255.8477), np.float32(4530.0854), np.float32(4723.8223), np.float32(4166.605), np.float32(4308.684), np.float32(4468.734)]
2025-09-14 10:03:25,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:03:25,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (4449.13) for latency 6
2025-09-14 10:03:25,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 45 minutes, 22 seconds)
2025-09-14 10:05:34,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:05:39,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4265.28125 ± 124.073
2025-09-14 10:05:39,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4363.415), np.float32(4384.977), np.float32(4283.3354), np.float32(4368.6226), np.float32(4045.0203), np.float32(4222.0566), np.float32(4231.0767), np.float32(4403.8535), np.float32(4048.7283), np.float32(4301.728)]
2025-09-14 10:05:39,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:05:39,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 22 minutes, 25 seconds)
2025-09-14 10:07:47,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:07:53,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4264.65039 ± 221.254
2025-09-14 10:07:53,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4516.7246), np.float32(4476.5938), np.float32(3712.7935), np.float32(4369.077), np.float32(4229.8306), np.float32(4198.4565), np.float32(4461.651), np.float32(4136.2705), np.float32(4224.531), np.float32(4320.5728)]
2025-09-14 10:07:53,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:07:53,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 3 minutes, 4 seconds)
2025-09-14 10:10:01,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:10:07,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4372.47363 ± 211.586
2025-09-14 10:10:07,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4317.5703), np.float32(4495.9277), np.float32(4299.1675), np.float32(4538.9966), np.float32(4519.563), np.float32(4537.132), np.float32(4288.577), np.float32(3883.8564), np.float32(4642.082), np.float32(4201.8667)]
2025-09-14 10:10:07,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:10:07,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 50 minutes, 16 seconds)
2025-09-14 10:12:15,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:12:20,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4145.26855 ± 211.251
2025-09-14 10:12:20,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4223.496), np.float32(4316.416), np.float32(4289.428), np.float32(4509.6733), np.float32(3891.3276), np.float32(4254.467), np.float32(3866.13), np.float32(4051.1746), np.float32(3844.578), np.float32(4205.991)]
2025-09-14 10:12:20,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:12:20,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 43 minutes, 5 seconds)
2025-09-14 10:14:29,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:14:34,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4466.83057 ± 161.793
2025-09-14 10:14:34,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4555.575), np.float32(4496.5474), np.float32(4642.8145), np.float32(4213.232), np.float32(4359.923), np.float32(4628.7817), np.float32(4621.9077), np.float32(4595.4243), np.float32(4216.0073), np.float32(4338.091)]
2025-09-14 10:14:34,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:14:34,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (4466.83) for latency 6
2025-09-14 10:14:34,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 38 minutes, 25 seconds)
2025-09-14 10:16:43,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:16:48,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4478.98096 ± 226.519
2025-09-14 10:16:48,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4787.244), np.float32(4374.808), np.float32(4165.8843), np.float32(4350.9697), np.float32(4120.794), np.float32(4396.945), np.float32(4495.6753), np.float32(4629.526), np.float32(4782.927), np.float32(4685.0405)]
2025-09-14 10:16:48,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:16:48,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (4478.98) for latency 6
2025-09-14 10:16:48,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 36 minutes, 8 seconds)
2025-09-14 10:18:57,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:19:02,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4448.26074 ± 227.996
2025-09-14 10:19:02,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4581.9897), np.float32(4676.2017), np.float32(4812.841), np.float32(4517.0273), np.float32(4347.153), np.float32(4117.5063), np.float32(4194.478), np.float32(4566.6753), np.float32(4544.5356), np.float32(4124.201)]
2025-09-14 10:19:02,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:19:02,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 33 minutes, 57 seconds)
2025-09-14 10:21:11,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:21:16,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4412.29150 ± 244.704
2025-09-14 10:21:16,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4662.3696), np.float32(4444.448), np.float32(4392.2124), np.float32(4500.001), np.float32(3748.4258), np.float32(4414.3506), np.float32(4701.561), np.float32(4384.612), np.float32(4456.15), np.float32(4418.785)]
2025-09-14 10:21:16,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:21:16,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 31 minutes, 44 seconds)
2025-09-14 10:23:24,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:23:30,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4320.90918 ± 246.540
2025-09-14 10:23:30,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4302.0293), np.float32(4434.998), np.float32(4220.201), np.float32(3904.992), np.float32(3871.1172), np.float32(4369.5283), np.float32(4363.5103), np.float32(4547.3193), np.float32(4584.735), np.float32(4610.665)]
2025-09-14 10:23:30,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:23:30,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 29 minutes, 30 seconds)
2025-09-14 10:25:38,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:25:44,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4360.50098 ± 184.112
2025-09-14 10:25:44,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4438.6685), np.float32(4187.1606), np.float32(4171.8022), np.float32(4458.2524), np.float32(4393.7324), np.float32(4701.1987), np.float32(4150.073), np.float32(4245.501), np.float32(4237.6084), np.float32(4621.0103)]
2025-09-14 10:25:44,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:25:44,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 27 minutes, 13 seconds)
2025-09-14 10:27:52,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:27:57,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4407.54004 ± 181.281
2025-09-14 10:27:57,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4585.321), np.float32(4423.595), np.float32(4472.1724), np.float32(4244.616), np.float32(4551.2227), np.float32(4048.3262), np.float32(4710.552), np.float32(4284.464), np.float32(4315.917), np.float32(4439.2114)]
2025-09-14 10:27:57,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:27:57,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 24 minutes, 59 seconds)
2025-09-14 10:30:06,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:30:11,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4514.76074 ± 147.470
2025-09-14 10:30:11,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4431.4067), np.float32(4437.7144), np.float32(4631.3857), np.float32(4617.994), np.float32(4764.669), np.float32(4679.183), np.float32(4237.195), np.float32(4458.0337), np.float32(4465.657), np.float32(4424.375)]
2025-09-14 10:30:11,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:30:11,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (4514.76) for latency 6
2025-09-14 10:30:11,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 22 minutes, 44 seconds)
2025-09-14 10:32:20,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:32:25,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4508.40918 ± 206.756
2025-09-14 10:32:25,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4546.0596), np.float32(4646.2686), np.float32(4113.231), np.float32(4738.7207), np.float32(4384.161), np.float32(4342.1255), np.float32(4280.534), np.float32(4715.3276), np.float32(4744.75), np.float32(4572.9155)]
2025-09-14 10:32:25,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:32:25,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 20 minutes, 32 seconds)
2025-09-14 10:34:34,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:34:39,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4464.26562 ± 186.602
2025-09-14 10:34:39,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4231.684), np.float32(4375.5054), np.float32(4257.6074), np.float32(4174.6787), np.float32(4559.926), np.float32(4474.501), np.float32(4682.603), np.float32(4733.0005), np.float32(4628.0137), np.float32(4525.131)]
2025-09-14 10:34:39,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:34:39,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 18 minutes, 20 seconds)
2025-09-14 10:36:48,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:36:53,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4492.05371 ± 116.205
2025-09-14 10:36:53,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4390.3623), np.float32(4367.137), np.float32(4700.9985), np.float32(4578.251), np.float32(4478.6694), np.float32(4595.842), np.float32(4439.918), np.float32(4330.7505), np.float32(4427.5723), np.float32(4611.037)]
2025-09-14 10:36:53,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:36:53,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 16 minutes, 9 seconds)
2025-09-14 10:39:02,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:39:07,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4470.78613 ± 159.944
2025-09-14 10:39:07,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4726.0176), np.float32(4529.144), np.float32(4555.2393), np.float32(4529.869), np.float32(4166.136), np.float32(4680.021), np.float32(4440.3203), np.float32(4327.5513), np.float32(4398.468), np.float32(4355.096)]
2025-09-14 10:39:07,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:39:07,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 13 minutes, 57 seconds)
2025-09-14 10:41:16,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:41:21,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4465.90771 ± 331.601
2025-09-14 10:41:21,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4402.78), np.float32(4570.9175), np.float32(4479.119), np.float32(4695.513), np.float32(4707.613), np.float32(4673.453), np.float32(4582.377), np.float32(3518.1514), np.float32(4597.186), np.float32(4431.967)]
2025-09-14 10:41:21,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:41:21,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 11 minutes, 45 seconds)
2025-09-14 10:43:30,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:43:35,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4521.94873 ± 142.370
2025-09-14 10:43:35,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4552.875), np.float32(4336.97), np.float32(4662.59), np.float32(4408.1763), np.float32(4591.6885), np.float32(4654.922), np.float32(4450.8813), np.float32(4267.045), np.float32(4573.476), np.float32(4720.862)]
2025-09-14 10:43:35,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:43:35,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (4521.95) for latency 6
2025-09-14 10:43:35,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 9 minutes, 32 seconds)
2025-09-14 10:45:44,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:45:49,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4640.59521 ± 103.368
2025-09-14 10:45:49,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4519.7617), np.float32(4554.8765), np.float32(4491.963), np.float32(4786.915), np.float32(4618.411), np.float32(4734.923), np.float32(4737.782), np.float32(4734.3823), np.float32(4535.0586), np.float32(4691.881)]
2025-09-14 10:45:49,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:45:49,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (4640.60) for latency 6
2025-09-14 10:45:49,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 7 minutes, 18 seconds)
2025-09-14 10:47:58,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:48:04,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4232.31152 ± 633.962
2025-09-14 10:48:04,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4569.3936), np.float32(4346.081), np.float32(2569.5632), np.float32(4758.496), np.float32(3582.576), np.float32(4481.4775), np.float32(4469.8027), np.float32(4426.452), np.float32(4730.357), np.float32(4388.913)]
2025-09-14 10:48:04,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:48:04,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 5 minutes, 8 seconds)
2025-09-14 10:50:13,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:50:18,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4218.38086 ± 922.158
2025-09-14 10:50:18,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4712.358), np.float32(4502.829), np.float32(4711.5874), np.float32(4581.112), np.float32(4304.456), np.float32(4567.2515), np.float32(4219.557), np.float32(4642.044), np.float32(4452.014), np.float32(1490.598)]
2025-09-14 10:50:18,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:50:18,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 2 minutes, 58 seconds)
2025-09-14 10:52:27,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:52:32,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4571.19824 ± 137.893
2025-09-14 10:52:32,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4597.2534), np.float32(4739.5537), np.float32(4619.6733), np.float32(4744.6123), np.float32(4551.5337), np.float32(4322.2114), np.float32(4671.332), np.float32(4353.664), np.float32(4621.317), np.float32(4490.832)]
2025-09-14 10:52:32,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:52:32,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 47 seconds)
2025-09-14 10:54:41,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:54:46,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4459.78857 ± 103.722
2025-09-14 10:54:46,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4530.001), np.float32(4353.4014), np.float32(4533.7993), np.float32(4554.8223), np.float32(4418.1475), np.float32(4347.006), np.float32(4363.66), np.float32(4436.9062), np.float32(4676.236), np.float32(4383.9116)]
2025-09-14 10:54:46,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:54:46,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 58 minutes, 31 seconds)
2025-09-14 10:56:55,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:57:00,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4427.67334 ± 136.173
2025-09-14 10:57:00,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4389.9893), np.float32(4316.9497), np.float32(4585.16), np.float32(4538.9673), np.float32(4319.2993), np.float32(4230.6743), np.float32(4302.0947), np.float32(4672.7515), np.float32(4405.508), np.float32(4515.34)]
2025-09-14 10:57:00,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:57:00,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 56 minutes, 16 seconds)
2025-09-14 10:59:09,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:59:14,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4451.28809 ± 167.416
2025-09-14 10:59:14,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4530.094), np.float32(4756.3647), np.float32(4374.0366), np.float32(4292.6904), np.float32(4492.6187), np.float32(4281.08), np.float32(4648.586), np.float32(4373.671), np.float32(4202.131), np.float32(4561.6074)]
2025-09-14 10:59:14,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:59:14,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 54 minutes)
2025-09-14 11:01:23,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:01:28,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4134.30664 ± 182.779
2025-09-14 11:01:28,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4175.321), np.float32(4229.18), np.float32(4272.05), np.float32(4306.1606), np.float32(4277.4893), np.float32(3645.535), np.float32(4088.3904), np.float32(4040.6394), np.float32(4110.4214), np.float32(4197.88)]
2025-09-14 11:01:28,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:01:28,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 51 minutes, 43 seconds)
2025-09-14 11:03:37,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:03:42,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4525.09424 ± 185.052
2025-09-14 11:03:42,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4173.934), np.float32(4236.0107), np.float32(4758.7446), np.float32(4454.423), np.float32(4672.789), np.float32(4608.5195), np.float32(4639.185), np.float32(4590.9497), np.float32(4440.6064), np.float32(4675.78)]
2025-09-14 11:03:42,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:03:42,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 49 minutes, 25 seconds)
2025-09-14 11:05:51,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:05:56,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4501.53760 ± 221.171
2025-09-14 11:05:56,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4441.196), np.float32(4643.277), np.float32(4679.516), np.float32(4459.0117), np.float32(4729.367), np.float32(4240.8135), np.float32(4545.2144), np.float32(4064.7651), np.float32(4826.9233), np.float32(4385.293)]
2025-09-14 11:05:56,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:05:56,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 47 minutes, 11 seconds)
2025-09-14 11:08:05,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:08:10,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4670.60059 ± 140.835
2025-09-14 11:08:10,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4872.371), np.float32(4656.784), np.float32(4781.416), np.float32(4605.614), np.float32(4695.7915), np.float32(4625.458), np.float32(4866.932), np.float32(4395.3604), np.float32(4516.9404), np.float32(4689.342)]
2025-09-14 11:08:10,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:08:10,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (4670.60) for latency 6
2025-09-14 11:08:10,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 44 minutes, 56 seconds)
2025-09-14 11:10:18,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:10:24,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4470.44434 ± 226.121
2025-09-14 11:10:24,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4321.1553), np.float32(4164.4497), np.float32(4646.17), np.float32(4698.9214), np.float32(4875.1123), np.float32(4320.848), np.float32(4145.696), np.float32(4544.3794), np.float32(4409.7456), np.float32(4577.963)]
2025-09-14 11:10:24,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:10:24,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 42 minutes, 40 seconds)
2025-09-14 11:12:32,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:12:38,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4403.86035 ± 175.031
2025-09-14 11:12:38,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4346.5566), np.float32(4582.732), np.float32(4237.705), np.float32(4632.355), np.float32(4504.017), np.float32(4330.898), np.float32(4408.565), np.float32(4219.8145), np.float32(4121.9873), np.float32(4653.978)]
2025-09-14 11:12:38,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:12:38,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 40 minutes, 24 seconds)
2025-09-14 11:14:46,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:14:52,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4203.91699 ± 762.541
2025-09-14 11:14:52,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4842.6587), np.float32(4382.508), np.float32(4467.849), np.float32(4440.089), np.float32(3984.7273), np.float32(4605.164), np.float32(1997.9288), np.float32(4506.3433), np.float32(4385.4746), np.float32(4426.43)]
2025-09-14 11:14:52,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:14:52,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 38 minutes, 12 seconds)
2025-09-14 11:17:00,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:17:06,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4385.64160 ± 155.672
2025-09-14 11:17:06,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4278.84), np.float32(4487.8833), np.float32(4295.5024), np.float32(4331.031), np.float32(4645.2744), np.float32(4410.401), np.float32(4232.6387), np.float32(4259.8223), np.float32(4243.294), np.float32(4671.7285)]
2025-09-14 11:17:06,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:17:06,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 35 minutes, 57 seconds)
2025-09-14 11:19:14,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:19:19,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4536.31787 ± 228.563
2025-09-14 11:19:19,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4695.0195), np.float32(4243.7495), np.float32(4509.847), np.float32(4998.6714), np.float32(4629.8164), np.float32(4190.7944), np.float32(4396.3975), np.float32(4448.472), np.float32(4739.0684), np.float32(4511.345)]
2025-09-14 11:19:19,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:19:19,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 33 minutes, 43 seconds)
2025-09-14 11:21:28,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:21:33,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4375.38135 ± 157.018
2025-09-14 11:21:33,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4295.9346), np.float32(4233.546), np.float32(4451.3394), np.float32(4239.915), np.float32(4471.765), np.float32(4246.7617), np.float32(4571.0625), np.float32(4306.278), np.float32(4702.8223), np.float32(4234.3896)]
2025-09-14 11:21:33,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:21:33,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 31 minutes, 29 seconds)
2025-09-14 11:23:42,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:23:47,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4562.42432 ± 145.747
2025-09-14 11:23:47,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4656.1807), np.float32(4160.6406), np.float32(4588.76), np.float32(4739.772), np.float32(4546.351), np.float32(4572.4717), np.float32(4538.214), np.float32(4595.6206), np.float32(4650.9937), np.float32(4575.24)]
2025-09-14 11:23:47,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:23:47,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 29 minutes, 15 seconds)
2025-09-14 11:25:56,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:26:01,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4658.83887 ± 155.817
2025-09-14 11:26:01,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4775.3784), np.float32(4500.852), np.float32(4595.84), np.float32(4633.004), np.float32(4480.9116), np.float32(4507.8154), np.float32(4751.384), np.float32(4618.9253), np.float32(4700.388), np.float32(5023.889)]
2025-09-14 11:26:01,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:26:01,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 27 minutes, 1 second)
2025-09-14 11:28:10,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:28:15,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4431.87793 ± 278.322
2025-09-14 11:28:15,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4122.459), np.float32(4246.182), np.float32(4670.2446), np.float32(5017.2705), np.float32(4366.6694), np.float32(4103.9785), np.float32(4160.749), np.float32(4639.656), np.float32(4431.43), np.float32(4560.1416)]
2025-09-14 11:28:15,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:28:15,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 24 minutes, 48 seconds)
2025-09-14 11:30:24,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:30:29,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4511.83936 ± 197.713
2025-09-14 11:30:29,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4182.1562), np.float32(4306.666), np.float32(4716.8667), np.float32(4524.9487), np.float32(4385.3203), np.float32(4454.7104), np.float32(4803.1514), np.float32(4527.2866), np.float32(4802.0273), np.float32(4415.261)]
2025-09-14 11:30:29,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:30:29,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 22 minutes, 36 seconds)
2025-09-14 11:32:38,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:32:43,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4439.01172 ± 412.415
2025-09-14 11:32:43,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4698.4883), np.float32(4512.05), np.float32(4734.728), np.float32(4573.424), np.float32(4628.5312), np.float32(4345.6245), np.float32(4632.162), np.float32(3243.2424), np.float32(4494.624), np.float32(4527.242)]
2025-09-14 11:32:43,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:32:43,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 20 minutes, 23 seconds)
2025-09-14 11:34:52,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:34:57,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4627.01709 ± 221.699
2025-09-14 11:34:57,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4716.1816), np.float32(4862.312), np.float32(4648.8955), np.float32(4263.683), np.float32(4633.089), np.float32(4288.344), np.float32(4848.1953), np.float32(4420.002), np.float32(4923.054), np.float32(4666.414)]
2025-09-14 11:34:57,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:34:57,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 18 minutes, 9 seconds)
2025-09-14 11:37:06,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:37:11,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4733.04639 ± 213.959
2025-09-14 11:37:11,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4701.626), np.float32(4817.3154), np.float32(4928.193), np.float32(4500.1094), np.float32(4710.9346), np.float32(4782.8677), np.float32(4739.0894), np.float32(4903.508), np.float32(5011.517), np.float32(4235.306)]
2025-09-14 11:37:11,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:37:11,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (4733.05) for latency 6
2025-09-14 11:37:11,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 15 minutes, 55 seconds)
2025-09-14 11:39:19,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:39:25,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4620.45996 ± 173.951
2025-09-14 11:39:25,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4441.6807), np.float32(4674.101), np.float32(4540.9463), np.float32(4479.8433), np.float32(4811.384), np.float32(4824.0566), np.float32(4303.7983), np.float32(4779.759), np.float32(4547.5215), np.float32(4801.5103)]
2025-09-14 11:39:25,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:39:25,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 13 minutes, 39 seconds)
2025-09-14 11:41:33,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:41:39,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4567.01709 ± 227.371
2025-09-14 11:41:39,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4689.278), np.float32(4733.3096), np.float32(4839.3765), np.float32(4542.5913), np.float32(4288.19), np.float32(4726.6777), np.float32(4224.9883), np.float32(4376.987), np.float32(4885.8086), np.float32(4362.965)]
2025-09-14 11:41:39,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:41:39,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 11 minutes, 23 seconds)
2025-09-14 11:43:47,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:43:52,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4581.96631 ± 198.770
2025-09-14 11:43:52,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4711.015), np.float32(4363.211), np.float32(4629.3516), np.float32(4487.392), np.float32(4497.7544), np.float32(4794.5845), np.float32(4686.8687), np.float32(4912.501), np.float32(4201.608), np.float32(4535.377)]
2025-09-14 11:43:52,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:43:52,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 9 minutes, 7 seconds)
2025-09-14 11:46:01,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:46:06,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4572.79980 ± 144.423
2025-09-14 11:46:06,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4408.8735), np.float32(4763.788), np.float32(4761.831), np.float32(4560.08), np.float32(4694.5747), np.float32(4509.119), np.float32(4468.844), np.float32(4717.1787), np.float32(4340.517), np.float32(4503.1963)]
2025-09-14 11:46:06,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:46:06,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 6 minutes, 53 seconds)
2025-09-14 11:48:15,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:48:20,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4514.08057 ± 151.201
2025-09-14 11:48:20,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4702.1475), np.float32(4599.1177), np.float32(4429.1587), np.float32(4278.3745), np.float32(4234.739), np.float32(4567.9873), np.float32(4684.1694), np.float32(4555.052), np.float32(4614.9785), np.float32(4475.079)]
2025-09-14 11:48:20,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:48:20,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 4 minutes, 41 seconds)
2025-09-14 11:50:29,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:50:34,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4688.22412 ± 253.694
2025-09-14 11:50:34,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4923.326), np.float32(4645.4023), np.float32(4608.759), np.float32(3994.3987), np.float32(4799.2856), np.float32(4774.7427), np.float32(4940.452), np.float32(4668.519), np.float32(4805.9717), np.float32(4721.3813)]
2025-09-14 11:50:34,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:50:34,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 2 minutes, 28 seconds)
2025-09-14 11:52:43,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:52:48,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4548.20459 ± 186.872
2025-09-14 11:52:48,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4495.0044), np.float32(4512.927), np.float32(4445.034), np.float32(4273.024), np.float32(4596.6055), np.float32(4827.778), np.float32(4872.0513), np.float32(4675.892), np.float32(4455.8955), np.float32(4327.8354)]
2025-09-14 11:52:48,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:52:48,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 17 seconds)
2025-09-14 11:54:57,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:55:02,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4619.48438 ± 155.937
2025-09-14 11:55:02,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4616.176), np.float32(4505.9233), np.float32(4898.0967), np.float32(4614.2217), np.float32(4472.325), np.float32(4874.226), np.float32(4414.975), np.float32(4719.8467), np.float32(4565.709), np.float32(4513.342)]
2025-09-14 11:55:02,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:55:02,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 58 minutes, 3 seconds)
2025-09-14 11:57:11,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:57:16,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4464.75488 ± 126.545
2025-09-14 11:57:16,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4643.8276), np.float32(4572.058), np.float32(4612.4854), np.float32(4255.363), np.float32(4505.242), np.float32(4444.0957), np.float32(4500.3896), np.float32(4283.6953), np.float32(4345.0), np.float32(4485.3955)]
2025-09-14 11:57:16,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:57:16,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 55 minutes, 49 seconds)
2025-09-14 11:59:25,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:59:30,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4776.96777 ± 240.377
2025-09-14 11:59:30,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4878.5044), np.float32(4687.1885), np.float32(5188.954), np.float32(4294.5728), np.float32(4900.8433), np.float32(4720.416), np.float32(4834.0835), np.float32(5046.2495), np.float32(4666.647), np.float32(4552.2144)]
2025-09-14 11:59:30,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:59:30,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (4776.97) for latency 6
2025-09-14 11:59:30,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 53 minutes, 34 seconds)
2025-09-14 12:01:38,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:01:44,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4543.33691 ± 158.951
2025-09-14 12:01:44,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4646.821), np.float32(4571.505), np.float32(4789.2285), np.float32(4537.846), np.float32(4444.0195), np.float32(4509.0815), np.float32(4774.6777), np.float32(4555.0444), np.float32(4298.768), np.float32(4306.3755)]
2025-09-14 12:01:44,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:01:44,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 51 minutes, 20 seconds)
2025-09-14 12:03:53,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:03:58,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4727.68066 ± 180.788
2025-09-14 12:03:58,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4466.42), np.float32(4617.859), np.float32(4923.116), np.float32(4951.7056), np.float32(4441.18), np.float32(4879.343), np.float32(4939.922), np.float32(4727.187), np.float32(4656.03), np.float32(4674.0415)]
2025-09-14 12:03:58,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:03:58,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 49 minutes, 6 seconds)
2025-09-14 12:06:07,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:06:12,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4691.74902 ± 209.098
2025-09-14 12:06:12,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4629.393), np.float32(4526.847), np.float32(4736.9624), np.float32(4805.819), np.float32(5030.8516), np.float32(5092.9023), np.float32(4503.352), np.float32(4550.6406), np.float32(4497.823), np.float32(4542.902)]
2025-09-14 12:06:12,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:06:12,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 46 minutes, 53 seconds)
2025-09-14 12:08:21,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:08:26,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4662.56543 ± 177.282
2025-09-14 12:08:26,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4871.738), np.float32(4417.519), np.float32(4386.5547), np.float32(4702.168), np.float32(4739.6), np.float32(4807.0767), np.float32(4863.1533), np.float32(4424.1704), np.float32(4657.176), np.float32(4756.497)]
2025-09-14 12:08:26,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:08:26,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 44 minutes, 41 seconds)
2025-09-14 12:10:35,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:10:40,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4485.83887 ± 487.630
2025-09-14 12:10:40,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4669.428), np.float32(4313.131), np.float32(4684.4966), np.float32(4715.6885), np.float32(4686.731), np.float32(4747.146), np.float32(4633.3867), np.float32(4952.7847), np.float32(3126.0105), np.float32(4329.582)]
2025-09-14 12:10:40,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:10:40,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 42 minutes, 27 seconds)
2025-09-14 12:12:49,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:12:54,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4842.65527 ± 189.130
2025-09-14 12:12:54,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4946.5063), np.float32(4484.8477), np.float32(4990.0454), np.float32(4950.5366), np.float32(5099.9033), np.float32(4860.001), np.float32(5031.5415), np.float32(4700.6147), np.float32(4755.8076), np.float32(4606.747)]
2025-09-14 12:12:54,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:12:54,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (4842.66) for latency 6
2025-09-14 12:12:54,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 40 minutes, 13 seconds)
2025-09-14 12:15:03,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:15:08,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4451.86426 ± 111.813
2025-09-14 12:15:08,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4463.1655), np.float32(4239.0005), np.float32(4587.0625), np.float32(4537.9917), np.float32(4567.9316), np.float32(4306.614), np.float32(4518.363), np.float32(4362.7583), np.float32(4524.6343), np.float32(4411.1245)]
2025-09-14 12:15:08,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:15:08,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 37 minutes, 58 seconds)
2025-09-14 12:17:17,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:17:22,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4679.36377 ± 233.886
2025-09-14 12:17:22,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4842.255), np.float32(4511.838), np.float32(4916.9243), np.float32(4148.122), np.float32(4771.129), np.float32(4688.4487), np.float32(4515.2944), np.float32(4918.9536), np.float32(4576.7563), np.float32(4903.9087)]
2025-09-14 12:17:22,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:17:22,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 35 minutes, 44 seconds)
2025-09-14 12:19:31,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:19:36,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4832.15869 ± 140.623
2025-09-14 12:19:36,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4983.912), np.float32(5097.3896), np.float32(4690.9697), np.float32(4744.166), np.float32(4581.9546), np.float32(4759.328), np.float32(4864.528), np.float32(4910.539), np.float32(4818.9175), np.float32(4869.878)]
2025-09-14 12:19:36,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:19:36,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 33 minutes, 28 seconds)
2025-09-14 12:21:44,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:21:50,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4430.01465 ± 205.561
2025-09-14 12:21:50,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4692.814), np.float32(4551.536), np.float32(4247.1113), np.float32(4136.4736), np.float32(4532.235), np.float32(4359.2505), np.float32(4626.679), np.float32(4454.21), np.float32(4074.646), np.float32(4625.196)]
2025-09-14 12:21:50,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:21:50,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 31 minutes, 14 seconds)
2025-09-14 12:23:58,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:24:04,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4691.42773 ± 122.267
2025-09-14 12:24:04,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4640.9385), np.float32(4479.63), np.float32(4669.284), np.float32(4864.265), np.float32(4668.7207), np.float32(4544.9556), np.float32(4659.9946), np.float32(4897.4663), np.float32(4762.891), np.float32(4726.1333)]
2025-09-14 12:24:04,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:24:04,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 29 minutes)
2025-09-14 12:26:12,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:26:18,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4675.81689 ± 200.811
2025-09-14 12:26:18,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4846.5947), np.float32(4570.7173), np.float32(4233.9307), np.float32(4925.724), np.float32(4744.599), np.float32(4767.1846), np.float32(4425.8296), np.float32(4844.6074), np.float32(4691.1455), np.float32(4707.8345)]
2025-09-14 12:26:18,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:26:18,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 26 minutes, 47 seconds)
2025-09-14 12:28:27,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:28:32,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4619.34277 ± 214.319
2025-09-14 12:28:32,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4506.5264), np.float32(4766.045), np.float32(4716.976), np.float32(4851.7617), np.float32(4994.5117), np.float32(4736.1465), np.float32(4423.891), np.float32(4518.2563), np.float32(4373.274), np.float32(4306.038)]
2025-09-14 12:28:32,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:28:32,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 24 minutes, 33 seconds)
2025-09-14 12:30:41,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:30:46,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4396.59619 ± 775.321
2025-09-14 12:30:46,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4662.8364), np.float32(4648.0786), np.float32(4795.7227), np.float32(4326.706), np.float32(2117.0066), np.float32(4487.6396), np.float32(4724.223), np.float32(4657.402), np.float32(4616.4634), np.float32(4929.877)]
2025-09-14 12:30:46,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:30:46,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 22 minutes, 20 seconds)
2025-09-14 12:32:55,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:33:00,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4733.66309 ± 143.581
2025-09-14 12:33:00,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4799.6475), np.float32(4764.055), np.float32(4500.555), np.float32(4764.264), np.float32(4902.6436), np.float32(4881.327), np.float32(4596.508), np.float32(4644.958), np.float32(4556.3403), np.float32(4926.3276)]
2025-09-14 12:33:00,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:33:00,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 20 minutes, 6 seconds)
2025-09-14 12:35:10,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:35:15,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4666.29834 ± 219.455
2025-09-14 12:35:15,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4069.8215), np.float32(4868.024), np.float32(4783.7646), np.float32(4651.0796), np.float32(4660.1416), np.float32(4672.857), np.float32(4749.412), np.float32(4571.387), np.float32(4885.0454), np.float32(4751.455)]
2025-09-14 12:35:15,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:35:15,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 17 minutes, 53 seconds)
2025-09-14 12:37:24,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:37:29,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4675.67676 ± 171.629
2025-09-14 12:37:29,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4537.299), np.float32(4681.0405), np.float32(4952.132), np.float32(4556.7544), np.float32(4555.8003), np.float32(4821.6577), np.float32(4982.1094), np.float32(4568.9355), np.float32(4471.416), np.float32(4629.6196)]
2025-09-14 12:37:29,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:37:29,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 15 minutes, 40 seconds)
2025-09-14 12:39:38,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:39:43,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4797.53809 ± 176.742
2025-09-14 12:39:43,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4524.2666), np.float32(4622.618), np.float32(5021.878), np.float32(4733.203), np.float32(4838.5264), np.float32(4760.39), np.float32(4983.6294), np.float32(4988.8247), np.float32(4943.415), np.float32(4558.633)]
2025-09-14 12:39:43,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:39:43,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 13 minutes, 25 seconds)
2025-09-14 12:41:52,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:41:58,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4791.69043 ± 230.219
2025-09-14 12:41:58,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5011.989), np.float32(4990.4033), np.float32(4782.951), np.float32(4615.188), np.float32(5016.835), np.float32(4846.659), np.float32(4996.6636), np.float32(4545.7363), np.float32(4288.103), np.float32(4822.38)]
2025-09-14 12:41:58,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:41:58,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 11 minutes, 11 seconds)
2025-09-14 12:44:07,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:44:12,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4747.94189 ± 284.590
2025-09-14 12:44:12,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4198.188), np.float32(4983.1987), np.float32(5019.1943), np.float32(4312.686), np.float32(5142.4546), np.float32(4884.979), np.float32(4635.401), np.float32(4782.199), np.float32(4779.7686), np.float32(4741.346)]
2025-09-14 12:44:12,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:44:12,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 57 seconds)
2025-09-14 12:46:21,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:46:26,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4350.72949 ± 213.842
2025-09-14 12:46:26,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4051.3396), np.float32(4499.6523), np.float32(4699.78), np.float32(4181.4404), np.float32(4546.851), np.float32(4388.509), np.float32(4418.503), np.float32(4213.042), np.float32(4016.5432), np.float32(4491.632)]
2025-09-14 12:46:26,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:46:26,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 42 seconds)
2025-09-14 12:48:35,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:48:40,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4767.49707 ± 219.187
2025-09-14 12:48:40,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5024.2603), np.float32(4721.195), np.float32(4494.733), np.float32(4848.803), np.float32(4747.797), np.float32(4347.001), np.float32(5054.9434), np.float32(4844.557), np.float32(4974.6895), np.float32(4616.993)]
2025-09-14 12:48:40,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:48:40,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 28 seconds)
2025-09-14 12:50:49,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:50:54,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4883.80664 ± 138.268
2025-09-14 12:50:54,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4614.086), np.float32(4804.2573), np.float32(4813.9043), np.float32(4793.494), np.float32(5081.072), np.float32(4903.074), np.float32(4958.617), np.float32(4827.6636), np.float32(5102.8394), np.float32(4939.054)]
2025-09-14 12:50:54,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:50:54,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (4883.81) for latency 6
2025-09-14 12:50:54,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 14 seconds)
2025-09-14 12:53:00,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:53:05,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4869.99170 ± 165.941
2025-09-14 12:53:05,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5035.951), np.float32(4803.8477), np.float32(4930.9277), np.float32(4432.4062), np.float32(4811.5767), np.float32(5015.1455), np.float32(4870.843), np.float32(4922.69), np.float32(4861.871), np.float32(5014.6553)]
2025-09-14 12:53:05,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:53:05,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1251 [DEBUG]: Training session finished
