2025-09-10 06:30:36,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1108 [DEBUG]: logdir: _logs/noise-eval/halfcheetah/bpql-noise_0.075-delay_3
2025-09-10 06:30:36,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1109 [DEBUG]: trainer_prefix: noise-eval/halfcheetah/bpql-noise_0.075-delay_3
2025-09-10 06:30:36,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'3': <latency_env.delayed_mdp.ConstantDelay object at 0x7bff54167ec0>}
2025-09-10 06:30:36,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1111 [DEBUG]: using device: cpu
2025-09-10 06:30:36,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-10 06:30:36,678 baseline-bpql-noisepromille75-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=35, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-10 06:30:36,679 baseline-bpql-noisepromille75-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-10 06:30:37,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-10 06:30:37,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-10 06:33:14,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 06:33:30,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: -402.63959 ± 89.781
2025-09-10 06:33:30,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-233.63914), np.float32(-512.15063), np.float32(-393.08295), np.float32(-525.40283), np.float32(-456.57043), np.float32(-434.85306), np.float32(-391.27197), np.float32(-261.15405), np.float32(-428.0184), np.float32(-390.2525)]
2025-09-10 06:33:30,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 06:33:30,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (-402.64) for latency 3
2025-09-10 06:33:30,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 45 minutes, 45 seconds)
2025-09-10 06:36:18,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 06:36:34,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 102.73269 ± 139.082
2025-09-10 06:36:34,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(253.79465), np.float32(-45.040325), np.float32(243.37279), np.float32(-59.21676), np.float32(209.3732), np.float32(134.36833), np.float32(-143.23514), np.float32(18.81864), np.float32(186.48102), np.float32(228.6105)]
2025-09-10 06:36:34,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 06:36:34,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (102.73) for latency 3
2025-09-10 06:36:34,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 51 minutes, 31 seconds)
2025-09-10 06:39:21,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 06:39:37,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 847.42126 ± 473.970
2025-09-10 06:39:37,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1485.4182), np.float32(706.0042), np.float32(781.83307), np.float32(169.40004), np.float32(814.9952), np.float32(1569.2211), np.float32(298.99963), np.float32(1433.9886), np.float32(798.3756), np.float32(415.97662)]
2025-09-10 06:39:37,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 06:39:37,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (847.42) for latency 3
2025-09-10 06:39:37,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 51 minutes, 11 seconds)
2025-09-10 06:42:25,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 06:42:41,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1744.91577 ± 241.174
2025-09-10 06:42:41,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1815.8755), np.float32(1594.2478), np.float32(1943.5232), np.float32(1953.7252), np.float32(1885.2334), np.float32(1861.9597), np.float32(1774.6409), np.float32(1725.5768), np.float32(1807.0034), np.float32(1087.3708)]
2025-09-10 06:42:41,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 06:42:41,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (1744.92) for latency 3
2025-09-10 06:42:41,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 49 minutes, 37 seconds)
2025-09-10 06:45:28,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 06:45:45,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1318.12683 ± 629.242
2025-09-10 06:45:45,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1923.1985), np.float32(792.0435), np.float32(2125.383), np.float32(1907.1213), np.float32(1652.0044), np.float32(662.78046), np.float32(520.2944), np.float32(526.44507), np.float32(1078.3873), np.float32(1993.6104)]
2025-09-10 06:45:45,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 06:45:45,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 47 minutes, 28 seconds)
2025-09-10 06:48:33,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 06:48:49,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2026.06409 ± 943.413
2025-09-10 06:48:49,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(43.915085), np.float32(2534.7615), np.float32(2650.906), np.float32(2518.2214), np.float32(2611.7263), np.float32(2661.0151), np.float32(2211.628), np.float32(2287.1565), np.float32(2459.7065), np.float32(281.60507)]
2025-09-10 06:48:49,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 06:48:49,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2026.06) for latency 3
2025-09-10 06:48:49,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 47 minutes, 52 seconds)
2025-09-10 06:51:37,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 06:51:53,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2704.35107 ± 889.936
2025-09-10 06:51:53,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3259.3132), np.float32(2984.7314), np.float32(77.123116), np.float32(3114.622), np.float32(3115.9414), np.float32(2878.5022), np.float32(3062.4727), np.float32(2943.901), np.float32(2973.165), np.float32(2633.7395)]
2025-09-10 06:51:53,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 06:51:53,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2704.35) for latency 3
2025-09-10 06:51:53,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 44 minutes, 53 seconds)
2025-09-10 06:54:41,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 06:54:57,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3349.67773 ± 202.973
2025-09-10 06:54:57,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3080.5842), np.float32(3184.8254), np.float32(3726.4792), np.float32(3441.0308), np.float32(3570.798), np.float32(3364.0205), np.float32(3532.7603), np.float32(3112.7473), np.float32(3238.5576), np.float32(3244.9724)]
2025-09-10 06:54:57,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 06:54:57,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (3349.68) for latency 3
2025-09-10 06:54:57,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 42 minutes, 5 seconds)
2025-09-10 06:57:45,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 06:58:01,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3856.62256 ± 266.691
2025-09-10 06:58:01,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4273.8345), np.float32(3684.6687), np.float32(3831.9954), np.float32(4271.126), np.float32(3864.3582), np.float32(3683.387), np.float32(3536.916), np.float32(4063.5786), np.float32(3897.7188), np.float32(3458.643)]
2025-09-10 06:58:01,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 06:58:01,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (3856.62) for latency 3
2025-09-10 06:58:01,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 39 minutes, 5 seconds)
2025-09-10 07:00:49,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 07:01:05,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4023.06128 ± 191.992
2025-09-10 07:01:05,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3984.9421), np.float32(3885.54), np.float32(3976.656), np.float32(3652.6428), np.float32(4207.672), np.float32(3830.832), np.float32(4336.47), np.float32(4202.0317), np.float32(4106.862), np.float32(4046.966)]
2025-09-10 07:01:05,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 07:01:05,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (4023.06) for latency 3
2025-09-10 07:01:05,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 36 minutes, 8 seconds)
2025-09-10 07:03:53,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 07:04:09,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4324.98633 ± 232.666
2025-09-10 07:04:09,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4212.427), np.float32(4242.859), np.float32(4260.4355), np.float32(4634.7407), np.float32(4365.7515), np.float32(4589.9346), np.float32(4153.0586), np.float32(4047.3928), np.float32(4025.9622), np.float32(4717.3057)]
2025-09-10 07:04:09,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 07:04:09,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (4324.99) for latency 3
2025-09-10 07:04:09,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 33 minutes, 1 second)
2025-09-10 07:06:57,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 07:07:13,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4320.34473 ± 666.239
2025-09-10 07:07:13,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2359.5017), np.float32(4309.424), np.float32(4353.99), np.float32(4670.801), np.float32(4717.1025), np.float32(4496.272), np.float32(4603.3076), np.float32(4474.839), np.float32(4682.249), np.float32(4535.962)]
2025-09-10 07:07:13,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 07:07:13,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 29 minutes, 58 seconds)
2025-09-10 07:10:01,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 07:10:18,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4533.35547 ± 270.099
2025-09-10 07:10:18,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4394.574), np.float32(4835.2563), np.float32(4571.671), np.float32(4789.8965), np.float32(3849.3071), np.float32(4516.5283), np.float32(4534.556), np.float32(4817.347), np.float32(4479.835), np.float32(4544.5825)]
2025-09-10 07:10:18,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 07:10:18,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (4533.36) for latency 3
2025-09-10 07:10:18,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 26 minutes, 54 seconds)
2025-09-10 07:13:06,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 07:13:22,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4723.05566 ± 114.594
2025-09-10 07:13:22,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4670.4595), np.float32(4794.931), np.float32(4750.384), np.float32(4708.8135), np.float32(4801.4033), np.float32(4817.4434), np.float32(4594.83), np.float32(4458.012), np.float32(4778.4497), np.float32(4855.831)]
2025-09-10 07:13:22,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 07:13:22,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (4723.06) for latency 3
2025-09-10 07:13:22,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 24 minutes, 1 second)
2025-09-10 07:16:10,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 07:16:27,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4586.66016 ± 1192.819
2025-09-10 07:16:27,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1015.0904), np.float32(5116.0137), np.float32(5022.253), np.float32(5011.067), np.float32(5012.9697), np.float32(4962.399), np.float32(4951.617), np.float32(4988.8213), np.float32(4983.2754), np.float32(4803.0933)]
2025-09-10 07:16:27,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 07:16:27,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 21 minutes, 2 seconds)
2025-09-10 07:19:15,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 07:19:31,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4504.42627 ± 1473.868
2025-09-10 07:19:31,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5438.5234), np.float32(5056.478), np.float32(5066.203), np.float32(2327.0667), np.float32(5344.1025), np.float32(5295.0327), np.float32(5159.845), np.float32(5244.811), np.float32(5175.7607), np.float32(936.4375)]
2025-09-10 07:19:31,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 07:19:31,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 18 minutes, 2 seconds)
2025-09-10 07:22:19,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 07:22:35,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 5176.85986 ± 150.482
2025-09-10 07:22:35,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4892.6816), np.float32(5218.583), np.float32(5274.104), np.float32(5256.712), np.float32(5496.095), np.float32(5145.9624), np.float32(5181.3535), np.float32(5061.565), np.float32(5067.6367), np.float32(5173.9014)]
2025-09-10 07:22:35,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 07:22:35,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (5176.86) for latency 3
2025-09-10 07:22:35,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 14 minutes, 59 seconds)
2025-09-10 07:25:23,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 07:25:39,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 5162.34863 ± 1071.521
2025-09-10 07:25:39,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5344.1973), np.float32(5492.3843), np.float32(5621.4517), np.float32(5528.1997), np.float32(5630.696), np.float32(5470.8975), np.float32(1959.2517), np.float32(5398.1885), np.float32(5607.9727), np.float32(5570.2466)]
2025-09-10 07:25:39,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 07:25:39,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 11 minutes, 58 seconds)
2025-09-10 07:28:28,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 07:28:44,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4666.86914 ± 1082.685
2025-09-10 07:28:44,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3136.9377), np.float32(5211.8887), np.float32(5371.6743), np.float32(5234.115), np.float32(5448.646), np.float32(5474.9395), np.float32(3853.738), np.float32(2318.449), np.float32(5296.0273), np.float32(5322.275)]
2025-09-10 07:28:44,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 07:28:44,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 8 minutes, 53 seconds)
2025-09-10 07:31:32,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 07:31:48,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 5709.38916 ± 133.033
2025-09-10 07:31:48,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5877.1807), np.float32(5630.0024), np.float32(5676.2856), np.float32(5806.964), np.float32(5809.962), np.float32(5497.6274), np.float32(5593.635), np.float32(5920.4067), np.float32(5571.485), np.float32(5710.3457)]
2025-09-10 07:31:48,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 07:31:48,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (5709.39) for latency 3
2025-09-10 07:31:48,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 4 hours, 5 minutes, 43 seconds)
2025-09-10 07:34:36,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 07:34:52,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 5599.41553 ± 458.093
2025-09-10 07:34:52,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5558.919), np.float32(6056.3066), np.float32(5455.91), np.float32(5700.265), np.float32(5838.0923), np.float32(5933.3477), np.float32(4357.1235), np.float32(5423.234), np.float32(5796.223), np.float32(5874.736)]
2025-09-10 07:34:52,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 07:34:52,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 4 hours, 2 minutes, 33 seconds)
2025-09-10 07:37:41,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 07:37:57,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 5985.70361 ± 241.752
2025-09-10 07:37:57,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5595.447), np.float32(6251.3037), np.float32(5811.394), np.float32(5946.8467), np.float32(6111.2944), np.float32(6230.71), np.float32(5585.763), np.float32(6288.8315), np.float32(6015.225), np.float32(6020.221)]
2025-09-10 07:37:57,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 07:37:57,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (5985.70) for latency 3
2025-09-10 07:37:57,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 59 minutes, 43 seconds)
2025-09-10 07:40:45,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 07:41:01,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 6080.45850 ± 94.922
2025-09-10 07:41:01,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6005.5703), np.float32(6170.902), np.float32(5976.7935), np.float32(6234.4727), np.float32(6104.155), np.float32(6081.1465), np.float32(5959.209), np.float32(6213.1084), np.float32(6073.3022), np.float32(5985.9287)]
2025-09-10 07:41:01,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 07:41:01,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (6080.46) for latency 3
2025-09-10 07:41:01,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 3 hours, 56 minutes, 39 seconds)
2025-09-10 07:43:49,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 07:44:06,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 5493.87988 ± 97.525
2025-09-10 07:44:06,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5489.455), np.float32(5375.9395), np.float32(5661.1157), np.float32(5527.525), np.float32(5480.531), np.float32(5537.683), np.float32(5580.6104), np.float32(5349.8584), np.float32(5569.001), np.float32(5367.082)]
2025-09-10 07:44:06,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 07:44:06,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 53 minutes, 29 seconds)
2025-09-10 07:46:54,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 07:47:10,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 6133.07764 ± 193.332
2025-09-10 07:47:10,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6302.6606), np.float32(6337.7646), np.float32(6304.1104), np.float32(6019.921), np.float32(6332.2446), np.float32(5919.88), np.float32(6065.0376), np.float32(6298.8125), np.float32(5948.7124), np.float32(5801.6357)]
2025-09-10 07:47:10,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 07:47:10,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (6133.08) for latency 3
2025-09-10 07:47:10,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 50 minutes, 28 seconds)
2025-09-10 07:49:58,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 07:50:14,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 6118.34131 ± 70.680
2025-09-10 07:50:14,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6118.2935), np.float32(6133.2446), np.float32(6058.655), np.float32(6148.78), np.float32(6133.7236), np.float32(6189.0596), np.float32(6023.657), np.float32(6025.0522), np.float32(6088.21), np.float32(6264.7373)]
2025-09-10 07:50:14,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 07:50:14,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 47 minutes, 26 seconds)
2025-09-10 07:53:02,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 07:53:19,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 6252.29834 ± 133.510
2025-09-10 07:53:19,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6356.241), np.float32(6108.725), np.float32(6052.295), np.float32(6452.11), np.float32(6147.792), np.float32(6245.661), np.float32(6204.9634), np.float32(6481.3447), np.float32(6256.487), np.float32(6217.3633)]
2025-09-10 07:53:19,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 07:53:19,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (6252.30) for latency 3
2025-09-10 07:53:19,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 44 minutes, 17 seconds)
2025-09-10 07:56:07,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 07:56:23,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 6371.03809 ± 106.732
2025-09-10 07:56:23,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6479.023), np.float32(6249.34), np.float32(6265.561), np.float32(6274.828), np.float32(6424.9473), np.float32(6357.569), np.float32(6275.304), np.float32(6603.108), np.float32(6380.1416), np.float32(6400.5596)]
2025-09-10 07:56:23,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 07:56:23,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (6371.04) for latency 3
2025-09-10 07:56:23,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 41 minutes, 14 seconds)
2025-09-10 07:59:11,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 07:59:27,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 5975.12500 ± 86.240
2025-09-10 07:59:27,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6129.191), np.float32(6025.401), np.float32(5804.941), np.float32(6018.6816), np.float32(5909.374), np.float32(6005.656), np.float32(5924.8506), np.float32(6002.531), np.float32(6030.59), np.float32(5900.0337)]
2025-09-10 07:59:27,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 07:59:27,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 38 minutes, 1 second)
2025-09-10 08:02:14,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 08:02:30,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 6420.48242 ± 105.975
2025-09-10 08:02:30,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6414.4175), np.float32(6332.9653), np.float32(6504.478), np.float32(6303.3706), np.float32(6349.403), np.float32(6419.696), np.float32(6431.399), np.float32(6284.6885), np.float32(6645.1353), np.float32(6519.268)]
2025-09-10 08:02:30,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 08:02:30,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (6420.48) for latency 3
2025-09-10 08:02:30,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 34 minutes, 45 seconds)
2025-09-10 08:05:18,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 08:05:34,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 6111.06885 ± 168.449
2025-09-10 08:05:34,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5966.765), np.float32(6033.277), np.float32(6047.7847), np.float32(5892.6724), np.float32(6118.525), np.float32(6128.242), np.float32(6339.9907), np.float32(5898.733), np.float32(6329.0684), np.float32(6355.6235)]
2025-09-10 08:05:34,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 08:05:34,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 31 minutes, 33 seconds)
2025-09-10 08:08:21,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 08:08:37,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 6384.58984 ± 161.166
2025-09-10 08:08:37,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6560.6133), np.float32(6548.7383), np.float32(6254.0083), np.float32(6336.3115), np.float32(6452.3687), np.float32(6478.4697), np.float32(6031.731), np.float32(6223.1177), np.float32(6488.4424), np.float32(6472.0996)]
2025-09-10 08:08:37,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 08:08:37,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 28 minutes, 12 seconds)
2025-09-10 08:11:25,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 08:11:41,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 6547.95654 ± 128.095
2025-09-10 08:11:41,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6683.148), np.float32(6539.881), np.float32(6563.4185), np.float32(6657.582), np.float32(6699.108), np.float32(6401.786), np.float32(6617.5537), np.float32(6618.438), np.float32(6332.688), np.float32(6365.9644)]
2025-09-10 08:11:41,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 08:11:41,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (6547.96) for latency 3
2025-09-10 08:11:41,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 24 minutes, 55 seconds)
2025-09-10 08:14:28,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 08:14:44,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 6597.97803 ± 73.026
2025-09-10 08:14:44,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6555.1), np.float32(6631.2627), np.float32(6635.819), np.float32(6601.8193), np.float32(6536.38), np.float32(6492.1826), np.float32(6632.9414), np.float32(6540.0767), np.float32(6585.3027), np.float32(6768.8955)]
2025-09-10 08:14:44,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 08:14:44,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (6597.98) for latency 3
2025-09-10 08:14:44,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 21 minutes, 52 seconds)
2025-09-10 08:17:31,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 08:17:48,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 6659.48438 ± 95.484
2025-09-10 08:17:48,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6586.65), np.float32(6499.942), np.float32(6646.916), np.float32(6701.0117), np.float32(6771.2583), np.float32(6629.3364), np.float32(6617.5005), np.float32(6765.729), np.float32(6564.917), np.float32(6811.586)]
2025-09-10 08:17:48,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 08:17:48,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (6659.48) for latency 3
2025-09-10 08:17:48,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 18 minutes, 47 seconds)
2025-09-10 08:20:35,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 08:20:51,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 6141.60938 ± 228.214
2025-09-10 08:20:51,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6233.5957), np.float32(6304.103), np.float32(6136.3003), np.float32(6382.5493), np.float32(5889.502), np.float32(6230.684), np.float32(5626.5654), np.float32(5974.478), np.float32(6311.125), np.float32(6327.19)]
2025-09-10 08:20:51,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 08:20:51,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 15 minutes, 42 seconds)
2025-09-10 08:23:39,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 08:23:56,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 6515.48535 ± 111.174
2025-09-10 08:23:56,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6442.355), np.float32(6491.7495), np.float32(6395.7695), np.float32(6619.475), np.float32(6599.7524), np.float32(6664.3506), np.float32(6671.345), np.float32(6518.265), np.float32(6374.104), np.float32(6377.688)]
2025-09-10 08:23:56,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 08:23:56,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 12 minutes, 50 seconds)
2025-09-10 08:26:43,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 08:26:59,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 6632.10693 ± 196.295
2025-09-10 08:26:59,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6343.723), np.float32(6235.1206), np.float32(6801.3403), np.float32(6559.9927), np.float32(6790.4907), np.float32(6795.4033), np.float32(6735.029), np.float32(6803.553), np.float32(6725.539), np.float32(6530.88)]
2025-09-10 08:26:59,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 08:26:59,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 9 minutes, 47 seconds)
2025-09-10 08:29:47,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 08:30:03,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 6645.77881 ± 106.234
2025-09-10 08:30:03,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6618.5947), np.float32(6877.6445), np.float32(6576.436), np.float32(6505.0547), np.float32(6618.3022), np.float32(6603.7993), np.float32(6619.2964), np.float32(6752.836), np.float32(6740.5024), np.float32(6545.317)]
2025-09-10 08:30:03,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 08:30:03,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 6 minutes, 44 seconds)
2025-09-10 08:32:50,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 08:33:06,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 6546.55371 ± 102.970
2025-09-10 08:33:06,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6551.947), np.float32(6531.2153), np.float32(6415.6484), np.float32(6611.258), np.float32(6424.0366), np.float32(6389.2417), np.float32(6718.905), np.float32(6597.9644), np.float32(6570.257), np.float32(6655.0635)]
2025-09-10 08:33:06,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 08:33:06,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 3 minutes, 45 seconds)
2025-09-10 08:35:54,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 08:36:10,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 6196.30176 ± 1958.944
2025-09-10 08:36:10,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6952.2056), np.float32(6934.5874), np.float32(6291.7007), np.float32(6701.5615), np.float32(6988.1025), np.float32(6880.723), np.float32(7231.196), np.float32(360.7537), np.float32(6905.5474), np.float32(6716.6396)]
2025-09-10 08:36:10,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 08:36:10,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 44 seconds)
2025-09-10 08:38:58,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 08:39:14,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 6833.35791 ± 148.202
2025-09-10 08:39:14,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6636.994), np.float32(6915.8745), np.float32(6997.288), np.float32(6993.3057), np.float32(6850.311), np.float32(6642.397), np.float32(6967.2812), np.float32(6780.0444), np.float32(6605.5073), np.float32(6944.5713)]
2025-09-10 08:39:14,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 08:39:14,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (6833.36) for latency 3
2025-09-10 08:39:14,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 57 minutes, 38 seconds)
2025-09-10 08:42:02,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 08:42:18,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 6748.07812 ± 129.010
2025-09-10 08:42:18,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6753.55), np.float32(6857.0796), np.float32(6442.911), np.float32(6932.1064), np.float32(6672.25), np.float32(6712.724), np.float32(6761.1987), np.float32(6726.898), np.float32(6733.6235), np.float32(6888.438)]
2025-09-10 08:42:18,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 08:42:18,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 54 minutes, 35 seconds)
2025-09-10 08:45:05,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 08:45:22,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 6707.20410 ± 124.012
2025-09-10 08:45:22,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6956.0996), np.float32(6680.8203), np.float32(6613.136), np.float32(6588.429), np.float32(6583.353), np.float32(6645.547), np.float32(6857.754), np.float32(6585.836), np.float32(6801.7407), np.float32(6759.322)]
2025-09-10 08:45:22,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 08:45:22,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 51 minutes, 32 seconds)
2025-09-10 08:48:10,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 08:48:26,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 6902.13428 ± 374.593
2025-09-10 08:48:26,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7034.5796), np.float32(5874.239), np.float32(7084.4214), np.float32(6995.408), np.float32(6770.755), np.float32(7030.701), np.float32(6872.5254), np.float32(6873.653), np.float32(7342.377), np.float32(7142.6855)]
2025-09-10 08:48:26,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 08:48:26,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (6902.13) for latency 3
2025-09-10 08:48:26,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 48 minutes, 32 seconds)
2025-09-10 08:51:14,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 08:51:30,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 6763.62354 ± 383.880
2025-09-10 08:51:30,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7184.931), np.float32(6899.9478), np.float32(6996.6343), np.float32(5730.1816), np.float32(6649.7793), np.float32(6941.7583), np.float32(6940.9365), np.float32(6539.03), np.float32(6916.468), np.float32(6836.57)]
2025-09-10 08:51:30,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 08:51:30,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 45 minutes, 36 seconds)
2025-09-10 08:54:18,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 08:54:34,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 6881.50488 ± 167.841
2025-09-10 08:54:34,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6580.722), np.float32(6766.4805), np.float32(6842.4204), np.float32(6786.0864), np.float32(6887.659), np.float32(7054.716), np.float32(6848.2666), np.float32(6900.4907), np.float32(7249.7144), np.float32(6898.4854)]
2025-09-10 08:54:34,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 08:54:34,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 42 minutes, 27 seconds)
2025-09-10 08:57:21,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 08:57:38,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 5448.66895 ± 2121.859
2025-09-10 08:57:38,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7118.601), np.float32(6400.959), np.float32(1777.2823), np.float32(6933.1943), np.float32(1424.9595), np.float32(6843.448), np.float32(3871.4268), np.float32(6756.543), np.float32(6307.632), np.float32(7052.644)]
2025-09-10 08:57:38,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 08:57:38,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 39 minutes, 23 seconds)
2025-09-10 09:00:25,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 09:00:41,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7082.08057 ± 152.047
2025-09-10 09:00:41,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7016.6704), np.float32(7071.001), np.float32(7272.502), np.float32(7016.4067), np.float32(7243.0156), np.float32(7099.4297), np.float32(6831.7007), np.float32(7324.948), np.float32(6887.871), np.float32(7057.2617)]
2025-09-10 09:00:41,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 09:00:41,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (7082.08) for latency 3
2025-09-10 09:00:41,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 36 minutes, 17 seconds)
2025-09-10 09:03:28,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 09:03:45,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7050.89990 ± 164.669
2025-09-10 09:03:45,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7068.0386), np.float32(7006.9136), np.float32(7000.2515), np.float32(7175.7065), np.float32(7255.887), np.float32(7172.2847), np.float32(7059.68), np.float32(7035.627), np.float32(7118.609), np.float32(6615.9966)]
2025-09-10 09:03:45,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 09:03:45,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 33 minutes, 8 seconds)
2025-09-10 09:06:32,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 09:06:48,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 6509.70703 ± 1448.426
2025-09-10 09:06:48,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2182.006), np.float32(7079.994), np.float32(7004.5806), np.float32(7139.785), np.float32(7114.3555), np.float32(6901.845), np.float32(7167.749), np.float32(6771.6606), np.float32(6922.952), np.float32(6812.139)]
2025-09-10 09:06:48,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 09:06:48,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 29 minutes, 56 seconds)
2025-09-10 09:09:35,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 09:09:52,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 6947.76953 ± 162.550
2025-09-10 09:09:52,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6942.1963), np.float32(7051.7754), np.float32(7092.221), np.float32(6912.649), np.float32(7016.0405), np.float32(6930.136), np.float32(6881.7505), np.float32(6509.8286), np.float32(7069.1562), np.float32(7071.941)]
2025-09-10 09:09:52,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 09:09:52,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 26 minutes, 50 seconds)
2025-09-10 09:12:39,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 09:12:56,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7135.06250 ± 100.540
2025-09-10 09:12:56,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7179.5435), np.float32(7150.373), np.float32(7334.8994), np.float32(7283.7764), np.float32(7067.814), np.float32(7019.526), np.float32(7066.04), np.float32(7030.663), np.float32(7077.9556), np.float32(7140.0312)]
2025-09-10 09:12:56,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 09:12:56,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (7135.06) for latency 3
2025-09-10 09:12:56,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 23 minutes, 49 seconds)
2025-09-10 09:15:43,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 09:16:00,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 5492.42285 ± 2241.091
2025-09-10 09:16:00,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6699.7046), np.float32(7046.459), np.float32(2114.761), np.float32(5270.807), np.float32(453.84097), np.float32(5242.7935), np.float32(6921.9316), np.float32(7008.927), np.float32(6932.482), np.float32(7232.5205)]
2025-09-10 09:16:00,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 09:16:00,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 20 minutes, 50 seconds)
2025-09-10 09:18:48,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 09:19:05,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 6758.47266 ± 772.332
2025-09-10 09:19:05,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6989.7954), np.float32(7144.242), np.float32(6999.9653), np.float32(6875.871), np.float32(6971.1533), np.float32(7178.959), np.float32(7143.351), np.float32(6814.1865), np.float32(7001.777), np.float32(4465.4272)]
2025-09-10 09:19:05,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 09:19:05,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 18 minutes)
2025-09-10 09:21:52,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 09:22:09,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 6933.56152 ± 793.240
2025-09-10 09:22:09,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7534.293), np.float32(7100.9863), np.float32(7169.351), np.float32(4595.3), np.float32(7071.9775), np.float32(7210.82), np.float32(7248.8774), np.float32(6936.431), np.float32(7185.8857), np.float32(7281.6963)]
2025-09-10 09:22:09,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 09:22:09,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 14 minutes, 59 seconds)
2025-09-10 09:24:56,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 09:25:12,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7009.01025 ± 104.571
2025-09-10 09:25:12,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7018.6245), np.float32(7097.342), np.float32(7216.244), np.float32(7067.143), np.float32(7089.1978), np.float32(6861.734), np.float32(6973.0938), np.float32(6918.737), np.float32(6963.457), np.float32(6884.53)]
2025-09-10 09:25:12,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 09:25:12,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 11 minutes, 57 seconds)
2025-09-10 09:27:59,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 09:28:16,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7145.27344 ± 109.983
2025-09-10 09:28:16,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7118.7573), np.float32(7272.6772), np.float32(7257.541), np.float32(7053.9), np.float32(7297.1074), np.float32(7149.287), np.float32(7204.6084), np.float32(7013.64), np.float32(6947.538), np.float32(7137.6826)]
2025-09-10 09:28:16,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 09:28:16,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (7145.27) for latency 3
2025-09-10 09:28:16,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 8 minutes, 48 seconds)
2025-09-10 09:31:03,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 09:31:19,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7200.66016 ± 140.172
2025-09-10 09:31:19,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7257.098), np.float32(7116.388), np.float32(6931.5), np.float32(7187.966), np.float32(7100.4805), np.float32(7184.231), np.float32(7403.335), np.float32(7166.8384), np.float32(7447.535), np.float32(7211.229)]
2025-09-10 09:31:19,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 09:31:19,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (7200.66) for latency 3
2025-09-10 09:31:19,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 5 minutes, 42 seconds)
2025-09-10 09:34:07,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 09:34:23,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7108.38818 ± 493.354
2025-09-10 09:34:23,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7235.6777), np.float32(7264.5254), np.float32(7241.037), np.float32(7256.1997), np.float32(7437.0527), np.float32(5648.515), np.float32(7374.2397), np.float32(7167.227), np.float32(7304.302), np.float32(7155.104)]
2025-09-10 09:34:23,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 09:34:23,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 2 minutes, 27 seconds)
2025-09-10 09:37:11,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 09:37:27,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 6627.69434 ± 1556.018
2025-09-10 09:37:27,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7229.8564), np.float32(7250.0303), np.float32(7154.7383), np.float32(7017.416), np.float32(7200.1865), np.float32(7151.1763), np.float32(7044.378), np.float32(6983.2188), np.float32(1968.4733), np.float32(7277.4697)]
2025-09-10 09:37:27,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 09:37:27,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 59 minutes, 26 seconds)
2025-09-10 09:40:15,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 09:40:31,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7304.08740 ± 96.997
2025-09-10 09:40:31,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7403.9053), np.float32(7280.033), np.float32(7090.407), np.float32(7489.375), np.float32(7316.7075), np.float32(7302.5337), np.float32(7315.4863), np.float32(7266.1973), np.float32(7262.48), np.float32(7313.7524)]
2025-09-10 09:40:31,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 09:40:31,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (7304.09) for latency 3
2025-09-10 09:40:31,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 56 minutes, 22 seconds)
2025-09-10 09:43:19,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 09:43:35,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7296.41016 ± 137.110
2025-09-10 09:43:35,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7242.324), np.float32(7082.2993), np.float32(7341.1353), np.float32(7315.742), np.float32(7374.9883), np.float32(7127.36), np.float32(7512.629), np.float32(7483.043), np.float32(7327.8965), np.float32(7156.69)]
2025-09-10 09:43:35,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 09:43:35,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 53 minutes, 23 seconds)
2025-09-10 09:46:23,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 09:46:39,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7266.90527 ± 166.635
2025-09-10 09:46:39,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7234.999), np.float32(7147.2173), np.float32(7279.9), np.float32(7165.7324), np.float32(6980.6157), np.float32(7261.1704), np.float32(7509.8765), np.float32(7596.2036), np.float32(7275.302), np.float32(7218.038)]
2025-09-10 09:46:39,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 09:46:39,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 50 minutes, 21 seconds)
2025-09-10 09:49:27,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 09:49:43,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7326.58057 ± 167.341
2025-09-10 09:49:43,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7068.08), np.float32(7414.491), np.float32(7481.944), np.float32(7568.1436), np.float32(7029.364), np.float32(7214.324), np.float32(7335.079), np.float32(7466.3135), np.float32(7374.378), np.float32(7313.687)]
2025-09-10 09:49:43,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 09:49:43,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (7326.58) for latency 3
2025-09-10 09:49:43,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 47 minutes, 21 seconds)
2025-09-10 09:52:31,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 09:52:47,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7257.14160 ± 79.065
2025-09-10 09:52:47,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7175.0874), np.float32(7137.9375), np.float32(7385.358), np.float32(7241.316), np.float32(7286.134), np.float32(7181.1187), np.float32(7370.565), np.float32(7286.432), np.float32(7301.245), np.float32(7206.2207)]
2025-09-10 09:52:47,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 09:52:47,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 44 minutes, 13 seconds)
2025-09-10 09:55:34,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 09:55:51,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 6744.66699 ± 1532.801
2025-09-10 09:55:51,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7244.867), np.float32(7221.6836), np.float32(6988.747), np.float32(2156.3677), np.float32(7303.844), np.float32(7379.8174), np.float32(7237.4155), np.float32(7363.6553), np.float32(7266.559), np.float32(7283.7188)]
2025-09-10 09:55:51,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 09:55:51,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 41 minutes, 10 seconds)
2025-09-10 09:58:38,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 09:58:54,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7366.76318 ± 157.128
2025-09-10 09:58:54,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7689.0537), np.float32(7160.79), np.float32(7248.774), np.float32(7421.8745), np.float32(7236.501), np.float32(7269.066), np.float32(7292.9746), np.float32(7341.113), np.float32(7421.605), np.float32(7585.881)]
2025-09-10 09:58:54,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 09:58:54,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (7366.76) for latency 3
2025-09-10 09:58:54,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 38 minutes, 3 seconds)
2025-09-10 10:01:42,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 10:01:58,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7281.50879 ± 188.386
2025-09-10 10:01:58,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7061.0024), np.float32(7353.8374), np.float32(6829.186), np.float32(7410.851), np.float32(7466.402), np.float32(7254.8403), np.float32(7408.9917), np.float32(7443.974), np.float32(7323.8774), np.float32(7262.1216)]
2025-09-10 10:01:58,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 10:01:58,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 34 minutes, 57 seconds)
2025-09-10 10:04:46,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 10:05:02,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7205.37500 ± 161.289
2025-09-10 10:05:02,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7211.48), np.float32(7318.3584), np.float32(7035.652), np.float32(7098.9507), np.float32(7368.8696), np.float32(7402.2), np.float32(7117.5923), np.float32(6958.378), np.float32(7450.5415), np.float32(7091.7246)]
2025-09-10 10:05:02,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 10:05:02,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 31 minutes, 53 seconds)
2025-09-10 10:07:49,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 10:08:06,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7341.92188 ± 160.296
2025-09-10 10:08:06,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7235.0806), np.float32(7528.5903), np.float32(7249.8247), np.float32(7322.938), np.float32(7602.655), np.float32(7015.437), np.float32(7295.4), np.float32(7493.549), np.float32(7370.145), np.float32(7305.593)]
2025-09-10 10:08:06,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 10:08:06,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 28 minutes, 48 seconds)
2025-09-10 10:10:53,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 10:11:10,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7324.92969 ± 214.861
2025-09-10 10:11:10,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7481.4175), np.float32(7569.6685), np.float32(7203.662), np.float32(7283.7075), np.float32(7544.534), np.float32(7200.562), np.float32(7492.8203), np.float32(6806.4316), np.float32(7331.9634), np.float32(7334.5273)]
2025-09-10 10:11:10,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 10:11:10,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 25 minutes, 45 seconds)
2025-09-10 10:13:57,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 10:14:13,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7362.07031 ± 129.694
2025-09-10 10:14:13,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7326.7485), np.float32(7468.601), np.float32(7188.5957), np.float32(7364.795), np.float32(7368.917), np.float32(7291.3794), np.float32(7631.142), np.float32(7297.9336), np.float32(7488.075), np.float32(7194.516)]
2025-09-10 10:14:13,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 10:14:13,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 22 minutes, 43 seconds)
2025-09-10 10:17:01,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 10:17:17,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7238.78516 ± 126.894
2025-09-10 10:17:17,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7278.252), np.float32(7263.1304), np.float32(7270.6943), np.float32(7152.6484), np.float32(7078.981), np.float32(7427.095), np.float32(7110.945), np.float32(7360.191), np.float32(7393.0327), np.float32(7052.882)]
2025-09-10 10:17:17,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 10:17:17,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 19 minutes, 39 seconds)
2025-09-10 10:20:05,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 10:20:21,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7306.31934 ± 125.085
2025-09-10 10:20:21,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7287.991), np.float32(7162.439), np.float32(7480.935), np.float32(7459.269), np.float32(7167.974), np.float32(7454.4263), np.float32(7328.184), np.float32(7175.136), np.float32(7374.9453), np.float32(7171.9014)]
2025-09-10 10:20:21,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 10:20:21,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 16 minutes, 33 seconds)
2025-09-10 10:23:09,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 10:23:25,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7467.12988 ± 132.254
2025-09-10 10:23:25,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7241.5347), np.float32(7636.715), np.float32(7578.3247), np.float32(7434.436), np.float32(7562.046), np.float32(7270.9023), np.float32(7608.354), np.float32(7361.014), np.float32(7456.178), np.float32(7521.79)]
2025-09-10 10:23:25,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 10:23:25,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (7467.13) for latency 3
2025-09-10 10:23:25,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 13 minutes, 31 seconds)
2025-09-10 10:26:12,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 10:26:28,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7375.09131 ± 133.427
2025-09-10 10:26:28,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7384.5283), np.float32(7337.8135), np.float32(7470.0166), np.float32(7306.138), np.float32(7321.0215), np.float32(7708.9355), np.float32(7425.391), np.float32(7193.9546), np.float32(7335.1274), np.float32(7267.992)]
2025-09-10 10:26:28,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 10:26:28,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 10 minutes, 25 seconds)
2025-09-10 10:29:15,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 10:29:31,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7472.86865 ± 75.621
2025-09-10 10:29:31,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7398.0176), np.float32(7418.621), np.float32(7573.3726), np.float32(7408.7114), np.float32(7578.7407), np.float32(7550.825), np.float32(7363.968), np.float32(7533.6694), np.float32(7433.2666), np.float32(7469.491)]
2025-09-10 10:29:31,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 10:29:31,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (7472.87) for latency 3
2025-09-10 10:29:31,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 7 minutes, 19 seconds)
2025-09-10 10:32:19,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 10:32:35,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7574.54199 ± 86.156
2025-09-10 10:32:35,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7635.8765), np.float32(7532.39), np.float32(7512.1772), np.float32(7406.958), np.float32(7699.186), np.float32(7484.728), np.float32(7617.424), np.float32(7583.746), np.float32(7678.4204), np.float32(7594.5146)]
2025-09-10 10:32:35,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 10:32:35,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (7574.54) for latency 3
2025-09-10 10:32:35,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 4 minutes, 15 seconds)
2025-09-10 10:35:22,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 10:35:39,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7517.32715 ± 132.803
2025-09-10 10:35:39,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7390.132), np.float32(7555.189), np.float32(7578.273), np.float32(7846.075), np.float32(7501.6436), np.float32(7536.114), np.float32(7415.16), np.float32(7569.175), np.float32(7366.292), np.float32(7415.2163)]
2025-09-10 10:35:39,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 10:35:39,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 1 minute, 11 seconds)
2025-09-10 10:38:26,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 10:38:42,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7060.04297 ± 1443.866
2025-09-10 10:38:42,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7721.747), np.float32(7465.782), np.float32(7548.3525), np.float32(2754.0515), np.float32(7694.751), np.float32(7649.6196), np.float32(7709.379), np.float32(7410.234), np.float32(7197.331), np.float32(7449.1797)]
2025-09-10 10:38:42,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 10:38:42,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 58 minutes, 6 seconds)
2025-09-10 10:41:29,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 10:41:46,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7586.18359 ± 110.443
2025-09-10 10:41:46,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7545.0073), np.float32(7457.944), np.float32(7470.958), np.float32(7798.698), np.float32(7704.607), np.float32(7612.861), np.float32(7579.4814), np.float32(7473.7524), np.float32(7517.3135), np.float32(7701.208)]
2025-09-10 10:41:46,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 10:41:46,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (7586.18) for latency 3
2025-09-10 10:41:46,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 55 minutes, 3 seconds)
2025-09-10 10:44:33,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 10:44:50,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7402.83594 ± 110.977
2025-09-10 10:44:50,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7267.945), np.float32(7542.067), np.float32(7371.354), np.float32(7411.3154), np.float32(7325.53), np.float32(7570.6953), np.float32(7213.7153), np.float32(7369.4507), np.float32(7510.83), np.float32(7445.465)]
2025-09-10 10:44:50,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 10:44:50,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 52 minutes, 2 seconds)
2025-09-10 10:47:37,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 10:47:53,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7604.82178 ± 189.033
2025-09-10 10:47:53,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7400.6), np.float32(7298.4316), np.float32(7810.222), np.float32(7640.983), np.float32(7613.349), np.float32(7766.2773), np.float32(7355.239), np.float32(7801.3315), np.float32(7540.1177), np.float32(7821.6636)]
2025-09-10 10:47:53,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 10:47:53,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (7604.82) for latency 3
2025-09-10 10:47:53,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 48 minutes, 58 seconds)
2025-09-10 10:50:41,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 10:50:57,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7312.51709 ± 1181.058
2025-09-10 10:50:57,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7693.685), np.float32(7891.8), np.float32(7535.589), np.float32(7775.5854), np.float32(3810.1367), np.float32(7817.5625), np.float32(7274.108), np.float32(7625.744), np.float32(7894.036), np.float32(7806.923)]
2025-09-10 10:50:57,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 10:50:57,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 45 minutes, 54 seconds)
2025-09-10 10:53:44,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 10:54:01,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7579.76025 ± 113.966
2025-09-10 10:54:01,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7472.279), np.float32(7482.811), np.float32(7567.1245), np.float32(7604.0977), np.float32(7381.1562), np.float32(7734.806), np.float32(7744.548), np.float32(7590.6665), np.float32(7521.7715), np.float32(7698.333)]
2025-09-10 10:54:01,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 10:54:01,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 42 minutes, 51 seconds)
2025-09-10 10:56:48,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 10:57:04,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7525.82959 ± 84.804
2025-09-10 10:57:04,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7687.618), np.float32(7487.601), np.float32(7495.112), np.float32(7480.3843), np.float32(7567.2607), np.float32(7565.9736), np.float32(7579.801), np.float32(7573.7065), np.float32(7465.642), np.float32(7355.198)]
2025-09-10 10:57:04,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 10:57:04,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 39 minutes, 48 seconds)
2025-09-10 10:59:52,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 11:00:08,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7474.67285 ± 125.960
2025-09-10 11:00:08,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7368.8384), np.float32(7417.68), np.float32(7617.1187), np.float32(7539.6753), np.float32(7696.6855), np.float32(7358.0303), np.float32(7377.1323), np.float32(7287.644), np.float32(7501.6455), np.float32(7582.2705)]
2025-09-10 11:00:08,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 11:00:08,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 36 minutes, 44 seconds)
2025-09-10 11:02:56,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 11:03:12,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7718.16504 ± 186.314
2025-09-10 11:03:12,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7523.5327), np.float32(8161.6313), np.float32(7844.8853), np.float32(7748.7095), np.float32(7745.729), np.float32(7795.89), np.float32(7631.009), np.float32(7690.0483), np.float32(7488.8633), np.float32(7551.348)]
2025-09-10 11:03:12,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 11:03:12,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (7718.17) for latency 3
2025-09-10 11:03:12,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 33 minutes, 40 seconds)
2025-09-10 11:05:59,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 11:06:16,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7747.22363 ± 195.424
2025-09-10 11:06:16,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7949.678), np.float32(7780.9688), np.float32(7740.188), np.float32(7554.21), np.float32(7549.338), np.float32(7474.5825), np.float32(7656.3984), np.float32(8153.505), np.float32(7741.115), np.float32(7872.2476)]
2025-09-10 11:06:16,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 11:06:16,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (7747.22) for latency 3
2025-09-10 11:06:16,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 30 minutes, 37 seconds)
2025-09-10 11:09:03,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 11:09:20,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7586.17480 ± 105.342
2025-09-10 11:09:20,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7744.3604), np.float32(7670.27), np.float32(7527.371), np.float32(7470.38), np.float32(7588.292), np.float32(7385.324), np.float32(7537.224), np.float32(7632.973), np.float32(7588.725), np.float32(7716.827)]
2025-09-10 11:09:20,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 11:09:20,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 27 minutes, 34 seconds)
2025-09-10 11:12:07,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 11:12:23,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7733.86865 ± 87.717
2025-09-10 11:12:23,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7692.205), np.float32(7775.0693), np.float32(7709.3174), np.float32(7815.988), np.float32(7764.0054), np.float32(7782.172), np.float32(7683.703), np.float32(7555.216), np.float32(7670.6465), np.float32(7890.3657)]
2025-09-10 11:12:23,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 11:12:23,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 24 minutes, 30 seconds)
2025-09-10 11:15:11,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 11:15:27,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7469.41895 ± 128.876
2025-09-10 11:15:27,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7499.0703), np.float32(7401.4014), np.float32(7372.57), np.float32(7213.573), np.float32(7535.1963), np.float32(7603.0684), np.float32(7372.7256), np.float32(7472.684), np.float32(7694.319), np.float32(7529.5747)]
2025-09-10 11:15:27,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 11:15:27,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 21 minutes, 27 seconds)
2025-09-10 11:18:15,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 11:18:31,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7655.63428 ± 210.555
2025-09-10 11:18:31,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7565.907), np.float32(7516.661), np.float32(7594.846), np.float32(7863.076), np.float32(7841.2363), np.float32(7354.315), np.float32(7958.2666), np.float32(7779.038), np.float32(7777.224), np.float32(7305.772)]
2025-09-10 11:18:31,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 11:18:31,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 18 minutes, 23 seconds)
2025-09-10 11:21:19,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 11:21:35,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7093.11475 ± 1943.361
2025-09-10 11:21:35,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7969.4814), np.float32(7454.3604), np.float32(1288.805), np.float32(8036.4004), np.float32(7600.9404), np.float32(7698.19), np.float32(7764.3965), np.float32(7931.1807), np.float32(7664.614), np.float32(7522.7803)]
2025-09-10 11:21:35,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 11:21:36,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 15 minutes, 19 seconds)
2025-09-10 11:24:24,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 11:24:40,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7839.50684 ± 169.199
2025-09-10 11:24:40,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(8028.345), np.float32(7668.0566), np.float32(8184.8276), np.float32(8003.57), np.float32(7850.388), np.float32(7660.0747), np.float32(7661.4253), np.float32(7772.0786), np.float32(7774.824), np.float32(7791.473)]
2025-09-10 11:24:40,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 11:24:40,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (7839.51) for latency 3
2025-09-10 11:24:40,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 12 minutes, 16 seconds)
2025-09-10 11:27:27,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 11:27:44,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7661.92822 ± 147.902
2025-09-10 11:27:44,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7460.172), np.float32(7793.4697), np.float32(7638.701), np.float32(7823.0957), np.float32(7704.17), np.float32(7403.338), np.float32(7820.921), np.float32(7805.5703), np.float32(7647.3823), np.float32(7522.461)]
2025-09-10 11:27:44,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 11:27:44,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 9 minutes, 12 seconds)
2025-09-10 11:30:32,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 11:30:48,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7151.19141 ± 1920.467
2025-09-10 11:30:48,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7772.0425), np.float32(7842.9287), np.float32(7879.9834), np.float32(7673.0723), np.float32(1415.3503), np.float32(7917.57), np.float32(7575.6616), np.float32(7892.3315), np.float32(8109.3335), np.float32(7433.639)]
2025-09-10 11:30:48,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 11:30:48,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 6 minutes, 8 seconds)
2025-09-10 11:33:36,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 11:33:52,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7819.45605 ± 100.372
2025-09-10 11:33:52,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7694.4707), np.float32(7782.264), np.float32(7701.129), np.float32(7869.335), np.float32(7751.139), np.float32(7969.393), np.float32(7856.256), np.float32(7739.878), np.float32(7831.598), np.float32(7999.094)]
2025-09-10 11:33:52,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 11:33:52,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 4 seconds)
2025-09-10 11:36:40,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-10 11:36:56,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 7382.12646 ± 1233.436
2025-09-10 11:36:56,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3720.3547), np.float32(7669.556), np.float32(7565.572), np.float32(7631.6074), np.float32(7750.472), np.float32(7817.8306), np.float32(8087.8584), np.float32(7576.731), np.float32(7999.368), np.float32(8001.9165)]
2025-09-10 11:36:56,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-10 11:36:56,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1251 [DEBUG]: Training session finished
