2025-09-14 16:33:58,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1108 [DEBUG]: logdir: _logs/noise-eval-v2/halfcheetah/bpql-noise_0.200-delay_24
2025-09-14 16:33:58,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1109 [DEBUG]: trainer_prefix: noise-eval-v2/halfcheetah/bpql-noise_0.200-delay_24
2025-09-14 16:33:58,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'24': <latency_env.delayed_mdp.ConstantDelay object at 0x7fb76698fc20>}
2025-09-14 16:33:58,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1111 [DEBUG]: using device: cpu
2025-09-14 16:33:58,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-14 16:33:58,753 baseline-bpql-noisepromille200-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=161, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-14 16:33:58,754 baseline-bpql-noisepromille200-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-14 16:34:00,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-14 16:34:00,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-14 16:36:11,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:36:20,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: -375.81396 ± 63.775
2025-09-14 16:36:20,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-344.8716), np.float32(-326.85806), np.float32(-323.0097), np.float32(-359.64032), np.float32(-472.43295), np.float32(-399.55582), np.float32(-328.86923), np.float32(-330.74826), np.float32(-355.85483), np.float32(-516.29913)]
2025-09-14 16:36:20,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:36:20,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (-375.81) for latency 24
2025-09-14 16:36:20,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 51 minutes, 18 seconds)
2025-09-14 16:38:37,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:38:46,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: -265.11588 ± 53.636
2025-09-14 16:38:46,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-235.95743), np.float32(-330.35757), np.float32(-210.79437), np.float32(-305.0679), np.float32(-207.01189), np.float32(-329.27536), np.float32(-234.41734), np.float32(-268.25485), np.float32(-191.97066), np.float32(-338.05115)]
2025-09-14 16:38:46,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:38:46,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (-265.12) for latency 24
2025-09-14 16:38:46,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 53 minutes, 20 seconds)
2025-09-14 16:41:02,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:41:11,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: -241.75342 ± 47.031
2025-09-14 16:41:11,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-259.51047), np.float32(-215.24829), np.float32(-279.95605), np.float32(-263.79648), np.float32(-208.59747), np.float32(-338.54), np.float32(-210.8599), np.float32(-261.6805), np.float32(-220.5982), np.float32(-158.74695)]
2025-09-14 16:41:11,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:41:11,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (-241.75) for latency 24
2025-09-14 16:41:11,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 52 minutes, 18 seconds)
2025-09-14 16:43:27,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:43:36,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: -127.23690 ± 163.900
2025-09-14 16:43:36,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-309.68066), np.float32(-13.167957), np.float32(57.021713), np.float32(-384.2362), np.float32(-315.11783), np.float32(-259.89343), np.float32(39.59919), np.float32(57.274235), np.float32(-44.926765), np.float32(-99.241295)]
2025-09-14 16:43:36,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:43:36,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (-127.24) for latency 24
2025-09-14 16:43:36,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 50 minutes, 29 seconds)
2025-09-14 16:45:53,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:46:02,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: -172.00443 ± 90.555
2025-09-14 16:46:02,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-313.63586), np.float32(-82.69347), np.float32(-324.5475), np.float32(-231.90091), np.float32(-137.56622), np.float32(-99.17936), np.float32(-189.77945), np.float32(-195.43945), np.float32(-75.10431), np.float32(-70.1979)]
2025-09-14 16:46:02,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:46:02,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 48 minutes, 34 seconds)
2025-09-14 16:48:18,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:48:27,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: -112.02821 ± 79.298
2025-09-14 16:48:27,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-156.14612), np.float32(-163.15474), np.float32(-179.39249), np.float32(-145.25517), np.float32(-26.32731), np.float32(65.68821), np.float32(-32.996685), np.float32(-146.79016), np.float32(-172.55965), np.float32(-163.34807)]
2025-09-14 16:48:27,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:48:27,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (-112.03) for latency 24
2025-09-14 16:48:27,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 47 minutes, 46 seconds)
2025-09-14 16:50:44,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:50:53,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: -61.86047 ± 76.878
2025-09-14 16:50:53,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-70.45622), np.float32(-0.27750072), np.float32(-18.079708), np.float32(-48.07842), np.float32(-83.93436), np.float32(-136.29587), np.float32(-76.1679), np.float32(-156.76541), np.float32(-143.42831), np.float32(114.879074)]
2025-09-14 16:50:53,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:50:53,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (-61.86) for latency 24
2025-09-14 16:50:53,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 45 minutes, 27 seconds)
2025-09-14 16:53:10,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:53:18,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 97.47080 ± 57.288
2025-09-14 16:53:18,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(124.892944), np.float32(102.83466), np.float32(106.1218), np.float32(64.049515), np.float32(170.957), np.float32(69.025475), np.float32(-3.0684018), np.float32(19.040127), np.float32(145.41011), np.float32(175.4448)]
2025-09-14 16:53:18,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:53:18,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (97.47) for latency 24
2025-09-14 16:53:18,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 43 minutes)
2025-09-14 16:55:31,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:55:39,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 344.40594 ± 132.707
2025-09-14 16:55:39,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(350.805), np.float32(261.9387), np.float32(470.44128), np.float32(383.4908), np.float32(223.86333), np.float32(424.2529), np.float32(496.67157), np.float32(394.91956), np.float32(410.7751), np.float32(26.901192)]
2025-09-14 16:55:39,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:55:39,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (344.41) for latency 24
2025-09-14 16:55:39,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 39 minutes, 20 seconds)
2025-09-14 16:57:45,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:57:54,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 475.84180 ± 246.931
2025-09-14 16:57:54,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(682.96), np.float32(522.7869), np.float32(623.47656), np.float32(487.71713), np.float32(488.51035), np.float32(-101.911125), np.float32(591.0685), np.float32(681.1874), np.float32(654.95557), np.float32(127.66639)]
2025-09-14 16:57:54,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:57:54,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (475.84) for latency 24
2025-09-14 16:57:54,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 3 hours, 33 minutes, 29 seconds)
2025-09-14 16:59:59,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:00:08,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 470.06232 ± 174.215
2025-09-14 17:00:08,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(476.37262), np.float32(489.68616), np.float32(646.24927), np.float32(108.98884), np.float32(270.41953), np.float32(497.0503), np.float32(675.9492), np.float32(550.25037), np.float32(652.86755), np.float32(332.7892)]
2025-09-14 17:00:08,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:00:08,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 3 hours, 27 minutes, 55 seconds)
2025-09-14 17:02:14,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:02:23,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 425.15845 ± 160.466
2025-09-14 17:02:23,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(155.63597), np.float32(724.95074), np.float32(582.2569), np.float32(297.94604), np.float32(383.44333), np.float32(370.014), np.float32(308.4591), np.float32(547.2464), np.float32(541.6018), np.float32(340.03015)]
2025-09-14 17:02:23,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:02:23,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 3 hours, 22 minutes, 23 seconds)
2025-09-14 17:04:29,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:04:38,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 440.65039 ± 227.650
2025-09-14 17:04:38,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(736.6565), np.float32(663.3002), np.float32(325.9382), np.float32(339.91995), np.float32(478.14346), np.float32(172.08395), np.float32(597.0976), np.float32(292.95267), np.float32(742.7346), np.float32(57.67685)]
2025-09-14 17:04:38,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:04:38,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 3 hours, 17 minutes, 4 seconds)
2025-09-14 17:06:44,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:06:53,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 495.89487 ± 230.279
2025-09-14 17:06:53,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(705.1041), np.float32(80.408905), np.float32(337.2386), np.float32(653.64557), np.float32(640.371), np.float32(712.4832), np.float32(101.50463), np.float32(571.1545), np.float32(681.44366), np.float32(475.59445)]
2025-09-14 17:06:53,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:06:53,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (495.89) for latency 24
2025-09-14 17:06:53,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 3 hours, 13 minutes, 1 second)
2025-09-14 17:08:58,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:09:07,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 580.40155 ± 183.043
2025-09-14 17:09:07,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(723.44965), np.float32(705.83575), np.float32(750.75977), np.float32(618.8954), np.float32(290.79276), np.float32(574.64624), np.float32(438.76938), np.float32(808.0674), np.float32(642.15234), np.float32(250.6472)]
2025-09-14 17:09:07,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:09:07,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (580.40) for latency 24
2025-09-14 17:09:07,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 3 hours, 10 minutes, 52 seconds)
2025-09-14 17:11:14,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:11:23,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 563.83551 ± 151.249
2025-09-14 17:11:23,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(338.8323), np.float32(663.29224), np.float32(463.91028), np.float32(679.1937), np.float32(493.20776), np.float32(586.3948), np.float32(292.7233), np.float32(673.5116), np.float32(761.37616), np.float32(685.9129)]
2025-09-14 17:11:23,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:11:23,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 3 hours, 8 minutes, 55 seconds)
2025-09-14 17:13:41,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:13:49,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 546.80750 ± 305.195
2025-09-14 17:13:49,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(662.6396), np.float32(737.88293), np.float32(376.4889), np.float32(610.2726), np.float32(755.1177), np.float32(574.6771), np.float32(-285.50452), np.float32(752.6093), np.float32(482.0903), np.float32(801.80145)]
2025-09-14 17:13:49,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:13:49,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 3 hours, 9 minutes, 52 seconds)
2025-09-14 17:16:08,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:16:16,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 469.00293 ± 261.310
2025-09-14 17:16:16,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(533.7225), np.float32(606.0748), np.float32(267.7425), np.float32(710.6049), np.float32(646.2132), np.float32(-101.72008), np.float32(688.94403), np.float32(199.24094), np.float32(392.8802), np.float32(746.3265)]
2025-09-14 17:16:16,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:16:16,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 3 hours, 10 minutes, 54 seconds)
2025-09-14 17:18:34,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:18:43,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 448.97046 ± 195.344
2025-09-14 17:18:43,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(402.91245), np.float32(415.7419), np.float32(571.8257), np.float32(10.218121), np.float32(406.7237), np.float32(588.85834), np.float32(813.0659), np.float32(501.54877), np.float32(330.2756), np.float32(448.53378)]
2025-09-14 17:18:43,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:18:43,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 3 hours, 11 minutes, 47 seconds)
2025-09-14 17:21:01,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:21:10,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 423.94733 ± 242.366
2025-09-14 17:21:10,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(347.2451), np.float32(627.7864), np.float32(503.51425), np.float32(-100.53955), np.float32(583.80035), np.float32(654.9472), np.float32(687.25604), np.float32(529.2979), np.float32(174.30922), np.float32(231.85632)]
2025-09-14 17:21:10,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:21:10,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 3 hours, 12 minutes, 47 seconds)
2025-09-14 17:23:29,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:23:38,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 477.79639 ± 251.460
2025-09-14 17:23:38,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(269.76706), np.float32(3.859261), np.float32(249.27565), np.float32(724.4324), np.float32(718.1067), np.float32(508.89084), np.float32(669.7328), np.float32(299.00574), np.float32(828.5674), np.float32(506.32623)]
2025-09-14 17:23:38,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:23:38,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 3 hours, 13 minutes, 34 seconds)
2025-09-14 17:25:56,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:26:05,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 484.50482 ± 216.413
2025-09-14 17:26:05,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(255.47379), np.float32(352.72162), np.float32(760.2889), np.float32(203.26137), np.float32(801.20966), np.float32(600.24115), np.float32(726.5931), np.float32(271.42188), np.float32(539.71484), np.float32(334.12186)]
2025-09-14 17:26:05,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:26:05,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 11 minutes, 14 seconds)
2025-09-14 17:28:22,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:28:31,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 497.87738 ± 172.793
2025-09-14 17:28:31,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(123.59688), np.float32(757.7123), np.float32(475.43823), np.float32(487.64133), np.float32(537.6248), np.float32(562.186), np.float32(335.62057), np.float32(721.40796), np.float32(424.77762), np.float32(552.7679)]
2025-09-14 17:28:31,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:28:31,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 3 hours, 8 minutes, 33 seconds)
2025-09-14 17:30:48,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:30:57,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 386.38980 ± 208.052
2025-09-14 17:30:57,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(262.11508), np.float32(581.058), np.float32(238.64185), np.float32(649.8697), np.float32(510.74234), np.float32(151.03093), np.float32(136.44658), np.float32(197.64084), np.float32(731.2964), np.float32(405.05637)]
2025-09-14 17:30:57,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:30:57,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 5 minutes, 57 seconds)
2025-09-14 17:33:15,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:33:24,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 438.14902 ± 265.923
2025-09-14 17:33:24,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-4.799554), np.float32(823.545), np.float32(21.681873), np.float32(293.22006), np.float32(396.54614), np.float32(436.50208), np.float32(670.6171), np.float32(754.8214), np.float32(547.214), np.float32(442.14197)]
2025-09-14 17:33:24,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:33:24,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 3 minutes, 20 seconds)
2025-09-14 17:35:40,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:35:49,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 458.58942 ± 181.083
2025-09-14 17:35:49,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(58.376926), np.float32(315.3255), np.float32(442.1436), np.float32(511.01923), np.float32(468.4429), np.float32(524.0096), np.float32(426.0888), np.float32(653.05695), np.float32(773.19714), np.float32(414.23346)]
2025-09-14 17:35:49,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:35:49,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 24 seconds)
2025-09-14 17:38:07,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:38:16,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 504.04547 ± 188.839
2025-09-14 17:38:16,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(224.83289), np.float32(718.1218), np.float32(413.5896), np.float32(795.1245), np.float32(355.51215), np.float32(685.6101), np.float32(267.11603), np.float32(522.3275), np.float32(412.59467), np.float32(645.6249)]
2025-09-14 17:38:16,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:38:16,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 57 minutes, 47 seconds)
2025-09-14 17:40:30,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:40:39,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 521.18201 ± 264.871
2025-09-14 17:40:39,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(308.1336), np.float32(853.57184), np.float32(530.2689), np.float32(602.36707), np.float32(821.78046), np.float32(821.0307), np.float32(303.42325), np.float32(21.040075), np.float32(306.36444), np.float32(643.8398)]
2025-09-14 17:40:39,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:40:39,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 54 minutes, 43 seconds)
2025-09-14 17:42:53,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:43:02,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 680.17633 ± 100.359
2025-09-14 17:43:02,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(661.98285), np.float32(820.88794), np.float32(693.7834), np.float32(567.88904), np.float32(709.98), np.float32(753.0765), np.float32(742.5096), np.float32(542.55035), np.float32(517.7078), np.float32(791.396)]
2025-09-14 17:43:02,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:43:02,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (680.18) for latency 24
2025-09-14 17:43:02,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 51 minutes, 27 seconds)
2025-09-14 17:45:14,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:45:23,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 600.69177 ± 144.428
2025-09-14 17:45:23,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(686.7409), np.float32(626.1811), np.float32(704.4685), np.float32(592.6426), np.float32(292.59402), np.float32(641.43933), np.float32(365.2164), np.float32(679.02466), np.float32(772.3222), np.float32(646.288)]
2025-09-14 17:45:23,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:45:23,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 47 minutes, 50 seconds)
2025-09-14 17:47:36,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:47:45,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 606.75024 ± 201.063
2025-09-14 17:47:45,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(697.04254), np.float32(432.05005), np.float32(685.10077), np.float32(767.6388), np.float32(733.8534), np.float32(758.74695), np.float32(716.5341), np.float32(159.61795), np.float32(361.3978), np.float32(755.51984)]
2025-09-14 17:47:45,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:47:45,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 44 minutes, 40 seconds)
2025-09-14 17:49:54,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:50:03,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 537.93475 ± 243.987
2025-09-14 17:50:03,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(839.7181), np.float32(333.13464), np.float32(755.67755), np.float32(309.1686), np.float32(234.19948), np.float32(677.47546), np.float32(888.8617), np.float32(244.50679), np.float32(404.52936), np.float32(692.076)]
2025-09-14 17:50:03,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:50:03,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 40 minutes, 21 seconds)
2025-09-14 17:52:11,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:52:20,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 504.69809 ± 246.497
2025-09-14 17:52:20,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(34.16197), np.float32(841.9743), np.float32(464.78586), np.float32(417.76428), np.float32(449.47305), np.float32(726.2731), np.float32(831.2966), np.float32(295.66385), np.float32(318.8603), np.float32(666.72736)]
2025-09-14 17:52:20,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:52:20,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 36 minutes, 34 seconds)
2025-09-14 17:54:30,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:54:39,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 722.53790 ± 98.396
2025-09-14 17:54:39,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(816.112), np.float32(772.3603), np.float32(583.75214), np.float32(669.0427), np.float32(635.81903), np.float32(680.04486), np.float32(593.83514), np.float32(851.3922), np.float32(862.1586), np.float32(760.86176)]
2025-09-14 17:54:39,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:54:39,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (722.54) for latency 24
2025-09-14 17:54:39,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 33 minutes, 25 seconds)
2025-09-14 17:56:53,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:57:02,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 543.26086 ± 187.999
2025-09-14 17:57:02,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(432.5925), np.float32(725.8855), np.float32(714.2013), np.float32(793.4839), np.float32(310.7813), np.float32(307.34814), np.float32(730.4015), np.float32(666.2993), np.float32(381.40683), np.float32(370.2086)]
2025-09-14 17:57:02,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:57:02,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 31 minutes, 22 seconds)
2025-09-14 17:59:15,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:59:24,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 590.48785 ± 332.367
2025-09-14 17:59:24,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(453.20993), np.float32(727.9491), np.float32(415.51688), np.float32(808.9262), np.float32(787.36145), np.float32(-313.77267), np.float32(634.13007), np.float32(760.44147), np.float32(821.00476), np.float32(810.1114)]
2025-09-14 17:59:24,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:59:24,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 29 minutes, 5 seconds)
2025-09-14 18:01:38,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:01:47,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 510.56876 ± 243.285
2025-09-14 18:01:47,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(465.3506), np.float32(431.5759), np.float32(579.07904), np.float32(-39.802578), np.float32(698.945), np.float32(618.78656), np.float32(218.65872), np.float32(624.88257), np.float32(676.99457), np.float32(831.2169)]
2025-09-14 18:01:47,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:01:47,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 27 minutes, 45 seconds)
2025-09-14 18:04:00,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:04:09,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 678.91949 ± 140.471
2025-09-14 18:04:09,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(675.2095), np.float32(793.7362), np.float32(800.0036), np.float32(390.997), np.float32(591.6267), np.float32(740.4704), np.float32(805.52124), np.float32(599.9851), np.float32(857.56915), np.float32(534.07556)]
2025-09-14 18:04:09,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:04:09,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 26 minutes, 32 seconds)
2025-09-14 18:06:23,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:06:32,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 662.35565 ± 179.443
2025-09-14 18:06:32,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(215.35089), np.float32(719.73676), np.float32(632.6635), np.float32(769.9115), np.float32(465.84723), np.float32(840.14966), np.float32(772.18945), np.float32(758.05725), np.float32(664.87646), np.float32(784.774)]
2025-09-14 18:06:32,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:06:32,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 24 minutes, 55 seconds)
2025-09-14 18:08:45,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:08:54,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 557.22803 ± 210.705
2025-09-14 18:08:54,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(326.6145), np.float32(237.70494), np.float32(477.08246), np.float32(260.78915), np.float32(542.6959), np.float32(651.5179), np.float32(726.8396), np.float32(781.46356), np.float32(741.4642), np.float32(826.10815)]
2025-09-14 18:08:54,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:08:54,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 22 minutes, 30 seconds)
2025-09-14 18:11:07,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:11:16,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 536.62811 ± 112.961
2025-09-14 18:11:16,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(500.23132), np.float32(545.8702), np.float32(658.12085), np.float32(559.7355), np.float32(520.432), np.float32(620.62695), np.float32(466.75247), np.float32(439.4147), np.float32(738.59485), np.float32(316.50226)]
2025-09-14 18:11:16,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:11:16,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 20 minutes, 2 seconds)
2025-09-14 18:13:29,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:13:37,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 695.63214 ± 148.161
2025-09-14 18:13:37,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(315.12476), np.float32(701.4662), np.float32(808.09406), np.float32(697.64166), np.float32(745.594), np.float32(609.1863), np.float32(655.42474), np.float32(718.75073), np.float32(833.8562), np.float32(871.1828)]
2025-09-14 18:13:37,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:13:37,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 17 minutes, 25 seconds)
2025-09-14 18:15:46,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:15:55,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 624.08630 ± 201.478
2025-09-14 18:15:55,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(847.0586), np.float32(759.7636), np.float32(424.54355), np.float32(404.0428), np.float32(470.16898), np.float32(481.60638), np.float32(738.79395), np.float32(931.6066), np.float32(372.83572), np.float32(810.44226)]
2025-09-14 18:15:55,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:15:55,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 14 minutes, 3 seconds)
2025-09-14 18:18:04,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:18:13,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 607.72125 ± 183.755
2025-09-14 18:18:13,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(850.9344), np.float32(770.64166), np.float32(646.242), np.float32(317.0167), np.float32(498.66248), np.float32(546.45294), np.float32(341.74332), np.float32(746.0568), np.float32(521.5036), np.float32(837.95917)]
2025-09-14 18:18:13,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:18:13,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 10 minutes, 52 seconds)
2025-09-14 18:20:22,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:20:31,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 446.99115 ± 127.632
2025-09-14 18:20:31,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(578.1568), np.float32(203.49959), np.float32(617.886), np.float32(342.22205), np.float32(435.64343), np.float32(533.04645), np.float32(528.2919), np.float32(297.34006), np.float32(400.23465), np.float32(533.59094)]
2025-09-14 18:20:31,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:20:31,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 7 minutes, 42 seconds)
2025-09-14 18:22:40,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:22:48,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 655.05634 ± 144.501
2025-09-14 18:22:48,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(500.177), np.float32(704.3949), np.float32(712.3695), np.float32(665.978), np.float32(369.35254), np.float32(693.91614), np.float32(753.0207), np.float32(515.9308), np.float32(901.30664), np.float32(734.11725)]
2025-09-14 18:22:48,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:22:48,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 4 minutes, 35 seconds)
2025-09-14 18:24:58,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:25:06,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 694.69135 ± 193.824
2025-09-14 18:25:06,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(878.16504), np.float32(515.84546), np.float32(786.8847), np.float32(631.52716), np.float32(886.86774), np.float32(827.834), np.float32(805.7687), np.float32(577.3573), np.float32(240.32852), np.float32(796.3348)]
2025-09-14 18:25:06,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:25:06,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 1 minute, 43 seconds)
2025-09-14 18:27:15,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:27:24,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 579.00806 ± 157.797
2025-09-14 18:27:24,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(811.111), np.float32(725.7875), np.float32(481.21774), np.float32(747.94244), np.float32(348.88376), np.float32(651.2446), np.float32(559.5842), np.float32(347.0214), np.float32(453.0977), np.float32(664.19025)]
2025-09-14 18:27:24,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:27:24,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 59 minutes, 28 seconds)
2025-09-14 18:29:33,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:29:42,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 824.26367 ± 131.293
2025-09-14 18:29:42,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1040.7107), np.float32(800.95215), np.float32(624.3495), np.float32(937.9309), np.float32(784.67883), np.float32(898.1443), np.float32(784.7077), np.float32(973.2834), np.float32(769.26605), np.float32(628.61273)]
2025-09-14 18:29:42,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:29:42,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (824.26) for latency 24
2025-09-14 18:29:42,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 57 minutes, 7 seconds)
2025-09-14 18:31:51,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:32:00,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 690.70618 ± 121.676
2025-09-14 18:32:00,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(404.89148), np.float32(590.1632), np.float32(763.3951), np.float32(746.81805), np.float32(735.34296), np.float32(839.9487), np.float32(581.1226), np.float32(778.7871), np.float32(724.38086), np.float32(742.2113)]
2025-09-14 18:32:00,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:32:00,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 54 minutes, 52 seconds)
2025-09-14 18:34:09,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:34:18,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 648.86090 ± 197.971
2025-09-14 18:34:18,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(923.64777), np.float32(228.47386), np.float32(924.05554), np.float32(626.8539), np.float32(695.3417), np.float32(725.2202), np.float32(480.66733), np.float32(542.5475), np.float32(757.2754), np.float32(584.5259)]
2025-09-14 18:34:18,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:34:18,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 52 minutes, 39 seconds)
2025-09-14 18:36:28,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:36:37,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 766.67938 ± 99.036
2025-09-14 18:36:37,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(522.26025), np.float32(795.2024), np.float32(758.6615), np.float32(770.00006), np.float32(771.6158), np.float32(704.0473), np.float32(913.85657), np.float32(867.9529), np.float32(757.7061), np.float32(805.4913)]
2025-09-14 18:36:37,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:36:37,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 50 minutes, 28 seconds)
2025-09-14 18:38:50,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:38:59,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 726.98651 ± 130.996
2025-09-14 18:38:59,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(806.61316), np.float32(951.99976), np.float32(597.38324), np.float32(612.72614), np.float32(871.66205), np.float32(748.4219), np.float32(672.87646), np.float32(676.9419), np.float32(506.84073), np.float32(824.3997)]
2025-09-14 18:38:59,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:38:59,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 48 minutes, 48 seconds)
2025-09-14 18:41:12,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:41:20,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 731.42328 ± 116.705
2025-09-14 18:41:20,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(734.79675), np.float32(636.6636), np.float32(637.9918), np.float32(838.4004), np.float32(943.89197), np.float32(649.4368), np.float32(701.453), np.float32(662.26013), np.float32(910.24744), np.float32(599.09064)]
2025-09-14 18:41:20,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:41:20,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 47 minutes, 6 seconds)
2025-09-14 18:43:33,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:43:42,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 740.30341 ± 75.527
2025-09-14 18:43:42,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(695.2895), np.float32(831.69507), np.float32(773.114), np.float32(742.4699), np.float32(689.0249), np.float32(876.3584), np.float32(599.78674), np.float32(691.5045), np.float32(783.8928), np.float32(719.8983)]
2025-09-14 18:43:42,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:43:42,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 45 minutes, 21 seconds)
2025-09-14 18:45:57,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:46:06,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 810.63556 ± 145.996
2025-09-14 18:46:06,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(632.13965), np.float32(799.29626), np.float32(998.7498), np.float32(610.3743), np.float32(1059.1023), np.float32(778.51294), np.float32(651.2174), np.float32(937.3851), np.float32(844.8114), np.float32(794.76624)]
2025-09-14 18:46:06,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:46:06,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 43 minutes, 45 seconds)
2025-09-14 18:48:20,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:48:29,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 753.23132 ± 90.444
2025-09-14 18:48:29,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(678.36115), np.float32(578.88116), np.float32(821.8104), np.float32(688.0186), np.float32(711.5863), np.float32(778.61676), np.float32(741.2422), np.float32(859.4241), np.float32(770.307), np.float32(904.06573)]
2025-09-14 18:48:29,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:48:29,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 42 minutes)
2025-09-14 18:50:41,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:50:50,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 786.66681 ± 71.113
2025-09-14 18:50:50,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(870.55096), np.float32(805.2742), np.float32(814.2184), np.float32(744.14246), np.float32(875.665), np.float32(698.09937), np.float32(733.3819), np.float32(712.98505), np.float32(717.21954), np.float32(895.13086)]
2025-09-14 18:50:50,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:50:50,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 39 minutes, 32 seconds)
2025-09-14 18:53:02,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:53:11,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 717.04285 ± 63.518
2025-09-14 18:53:11,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(727.1225), np.float32(685.0218), np.float32(711.1233), np.float32(627.7536), np.float32(667.85156), np.float32(671.02734), np.float32(850.9852), np.float32(751.81915), np.float32(796.18555), np.float32(681.5381)]
2025-09-14 18:53:11,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:53:11,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 37 minutes, 6 seconds)
2025-09-14 18:55:24,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:55:33,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 778.50702 ± 200.371
2025-09-14 18:55:33,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1009.8641), np.float32(499.91266), np.float32(773.8266), np.float32(756.41754), np.float32(980.2232), np.float32(734.67487), np.float32(1047.4441), np.float32(752.39325), np.float32(839.7082), np.float32(390.6057)]
2025-09-14 18:55:33,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:55:33,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 34 minutes, 43 seconds)
2025-09-14 18:57:46,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:57:55,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 822.40173 ± 195.893
2025-09-14 18:57:55,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(851.81696), np.float32(518.44324), np.float32(660.0287), np.float32(667.5964), np.float32(1250.8806), np.float32(974.6242), np.float32(889.2307), np.float32(706.56665), np.float32(769.64374), np.float32(935.18634)]
2025-09-14 18:57:55,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:57:55,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 32 minutes, 12 seconds)
2025-09-14 19:00:08,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:00:17,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 911.38538 ± 141.341
2025-09-14 19:00:17,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1032.289), np.float32(750.5384), np.float32(1047.8099), np.float32(1147.1124), np.float32(864.7897), np.float32(846.9104), np.float32(803.54474), np.float32(975.2965), np.float32(976.66583), np.float32(668.8968)]
2025-09-14 19:00:17,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:00:17,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (911.39) for latency 24
2025-09-14 19:00:17,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 29 minutes, 44 seconds)
2025-09-14 19:02:30,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:02:39,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 836.54572 ± 131.708
2025-09-14 19:02:39,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(733.05396), np.float32(1024.9017), np.float32(888.8055), np.float32(863.40704), np.float32(976.4971), np.float32(803.8111), np.float32(543.4987), np.float32(753.9795), np.float32(838.6148), np.float32(938.8878)]
2025-09-14 19:02:39,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:02:39,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 27 minutes, 31 seconds)
2025-09-14 19:04:49,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:04:58,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 866.33185 ± 142.552
2025-09-14 19:04:58,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(798.6443), np.float32(597.6227), np.float32(1141.5673), np.float32(976.9809), np.float32(728.621), np.float32(837.55817), np.float32(876.2867), np.float32(865.8537), np.float32(999.9526), np.float32(840.23114)]
2025-09-14 19:04:58,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:04:58,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 24 minutes, 48 seconds)
2025-09-14 19:07:07,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:07:16,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 934.42090 ± 171.527
2025-09-14 19:07:16,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1005.71545), np.float32(863.84906), np.float32(1036.9069), np.float32(1007.2034), np.float32(1200.9464), np.float32(800.1966), np.float32(973.02515), np.float32(949.8693), np.float32(989.81134), np.float32(516.68585)]
2025-09-14 19:07:16,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:07:16,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (934.42) for latency 24
2025-09-14 19:07:16,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 22 minutes, 3 seconds)
2025-09-14 19:09:26,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:09:34,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 985.98712 ± 134.138
2025-09-14 19:09:34,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(831.78455), np.float32(844.3438), np.float32(1179.1539), np.float32(1109.3362), np.float32(1219.8197), np.float32(922.53516), np.float32(914.0679), np.float32(1020.845), np.float32(966.487), np.float32(851.49677)]
2025-09-14 19:09:34,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:09:34,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (985.99) for latency 24
2025-09-14 19:09:34,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 19 minutes, 16 seconds)
2025-09-14 19:11:44,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:11:53,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 855.58801 ± 216.310
2025-09-14 19:11:53,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1057.4152), np.float32(771.88434), np.float32(833.6435), np.float32(774.9126), np.float32(761.876), np.float32(1124.9774), np.float32(493.242), np.float32(1243.7437), np.float32(632.38715), np.float32(861.7986)]
2025-09-14 19:11:53,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:11:53,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 16 minutes, 33 seconds)
2025-09-14 19:14:03,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:14:11,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 911.39777 ± 108.122
2025-09-14 19:14:11,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1030.5485), np.float32(862.02856), np.float32(815.55566), np.float32(1002.56885), np.float32(847.86096), np.float32(710.2847), np.float32(1076.0842), np.float32(1007.3518), np.float32(887.66376), np.float32(874.03)]
2025-09-14 19:14:11,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:14:11,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 13 minutes, 50 seconds)
2025-09-14 19:16:20,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:16:29,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 998.56427 ± 127.510
2025-09-14 19:16:29,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1045.9813), np.float32(816.14575), np.float32(1190.4976), np.float32(972.13043), np.float32(953.3093), np.float32(1004.47504), np.float32(821.7681), np.float32(888.72546), np.float32(1161.604), np.float32(1131.0056)]
2025-09-14 19:16:29,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:16:29,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (998.56) for latency 24
2025-09-14 19:16:29,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 11 minutes, 27 seconds)
2025-09-14 19:18:39,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:18:48,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 786.26190 ± 231.817
2025-09-14 19:18:48,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(797.7956), np.float32(679.03046), np.float32(928.89105), np.float32(972.15546), np.float32(1005.9872), np.float32(164.16689), np.float32(708.2948), np.float32(939.0547), np.float32(862.5135), np.float32(804.7293)]
2025-09-14 19:18:48,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:18:48,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 9 minutes, 9 seconds)
2025-09-14 19:20:57,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:21:06,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1016.98889 ± 104.500
2025-09-14 19:21:06,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(924.3245), np.float32(1048.9896), np.float32(1003.05164), np.float32(1236.9989), np.float32(990.1686), np.float32(950.00745), np.float32(893.4307), np.float32(1163.9122), np.float32(1034.8369), np.float32(924.168)]
2025-09-14 19:21:06,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:21:06,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1016.99) for latency 24
2025-09-14 19:21:06,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 6 minutes, 53 seconds)
2025-09-14 19:23:16,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:23:25,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 899.29181 ± 160.804
2025-09-14 19:23:25,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(876.43036), np.float32(789.54517), np.float32(820.7161), np.float32(899.7361), np.float32(1092.0287), np.float32(548.7705), np.float32(963.21533), np.float32(1164.5602), np.float32(966.89966), np.float32(871.01514)]
2025-09-14 19:23:25,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:23:25,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 4 minutes, 33 seconds)
2025-09-14 19:25:34,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:25:43,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 962.04602 ± 159.904
2025-09-14 19:25:43,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(768.4334), np.float32(920.28534), np.float32(1006.84375), np.float32(1042.2231), np.float32(1339.5925), np.float32(790.47125), np.float32(946.5479), np.float32(787.0695), np.float32(1028.0405), np.float32(990.95197)]
2025-09-14 19:25:43,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:25:43,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 2 minutes, 12 seconds)
2025-09-14 19:27:52,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:28:01,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 956.70441 ± 111.522
2025-09-14 19:28:01,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1202.2288), np.float32(778.6718), np.float32(1003.72723), np.float32(1016.1768), np.float32(863.3239), np.float32(896.1856), np.float32(921.91284), np.float32(1004.1749), np.float32(1011.9555), np.float32(868.68616)]
2025-09-14 19:28:01,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:28:01,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 59 minutes, 58 seconds)
2025-09-14 19:30:11,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:30:20,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 869.85498 ± 131.188
2025-09-14 19:30:20,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(847.2308), np.float32(917.79645), np.float32(886.35406), np.float32(704.7664), np.float32(1070.452), np.float32(715.9499), np.float32(1053.2058), np.float32(907.3237), np.float32(670.4252), np.float32(925.04504)]
2025-09-14 19:30:20,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:30:20,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 57 minutes, 40 seconds)
2025-09-14 19:32:30,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:32:39,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1037.70264 ± 144.171
2025-09-14 19:32:39,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1145.2587), np.float32(839.5441), np.float32(1034.1993), np.float32(1209.0911), np.float32(1196.4807), np.float32(871.3017), np.float32(933.7198), np.float32(1254.7783), np.float32(953.3499), np.float32(939.30396)]
2025-09-14 19:32:39,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:32:39,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1037.70) for latency 24
2025-09-14 19:32:39,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 55 minutes, 22 seconds)
2025-09-14 19:34:48,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:34:57,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 931.35303 ± 190.358
2025-09-14 19:34:57,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(922.2174), np.float32(931.04425), np.float32(759.65424), np.float32(809.087), np.float32(845.8498), np.float32(989.26074), np.float32(960.4639), np.float32(994.9629), np.float32(680.4273), np.float32(1420.5632)]
2025-09-14 19:34:57,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:34:57,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 53 minutes, 2 seconds)
2025-09-14 19:37:06,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:37:15,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1143.30298 ± 239.883
2025-09-14 19:37:15,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(856.96985), np.float32(1518.0475), np.float32(1406.5558), np.float32(869.64264), np.float32(1044.861), np.float32(951.3248), np.float32(979.37634), np.float32(1218.424), np.float32(1499.3389), np.float32(1088.4895)]
2025-09-14 19:37:15,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:37:15,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1143.30) for latency 24
2025-09-14 19:37:15,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 50 minutes, 46 seconds)
2025-09-14 19:39:24,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:39:33,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1055.68286 ± 171.252
2025-09-14 19:39:33,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1046.9714), np.float32(1237.5049), np.float32(1258.9375), np.float32(1304.2158), np.float32(786.2754), np.float32(996.8622), np.float32(1189.7793), np.float32(910.42004), np.float32(888.6849), np.float32(937.1763)]
2025-09-14 19:39:33,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:39:33,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 48 minutes, 26 seconds)
2025-09-14 19:41:43,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:41:52,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1300.96460 ± 298.354
2025-09-14 19:41:52,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(991.55536), np.float32(1145.311), np.float32(1356.9417), np.float32(1688.0146), np.float32(1082.1715), np.float32(1614.0396), np.float32(1572.7737), np.float32(1083.7146), np.float32(823.8942), np.float32(1651.2294)]
2025-09-14 19:41:52,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:41:52,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1300.96) for latency 24
2025-09-14 19:41:52,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 46 minutes, 7 seconds)
2025-09-14 19:44:01,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:44:10,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 969.12463 ± 222.259
2025-09-14 19:44:10,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1129.2535), np.float32(802.6011), np.float32(1050.2562), np.float32(1437.3839), np.float32(772.30536), np.float32(945.75287), np.float32(557.80084), np.float32(964.92554), np.float32(983.9927), np.float32(1046.9744)]
2025-09-14 19:44:10,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:44:10,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 43 minutes, 47 seconds)
2025-09-14 19:46:19,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:46:28,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1044.43896 ± 215.335
2025-09-14 19:46:28,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1566.6061), np.float32(939.5467), np.float32(876.5864), np.float32(1280.1345), np.float32(1101.6703), np.float32(906.1745), np.float32(1014.8333), np.float32(992.6305), np.float32(787.4732), np.float32(978.73334)]
2025-09-14 19:46:28,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:46:28,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 41 minutes, 29 seconds)
2025-09-14 19:48:38,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:48:47,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1095.36194 ± 342.657
2025-09-14 19:48:47,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1258.3333), np.float32(987.9582), np.float32(1041.9683), np.float32(699.53076), np.float32(1722.0026), np.float32(802.83704), np.float32(919.9624), np.float32(1016.4758), np.float32(798.3866), np.float32(1706.1643)]
2025-09-14 19:48:47,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:48:47,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 39 minutes, 11 seconds)
2025-09-14 19:50:56,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:51:05,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1125.32507 ± 236.327
2025-09-14 19:51:05,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1509.7205), np.float32(1024.5787), np.float32(857.7661), np.float32(1377.1305), np.float32(997.1118), np.float32(1496.9976), np.float32(854.7504), np.float32(1113.2631), np.float32(935.11145), np.float32(1086.8209)]
2025-09-14 19:51:05,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:51:05,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 36 minutes, 53 seconds)
2025-09-14 19:53:14,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:53:23,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 898.56805 ± 177.594
2025-09-14 19:53:23,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1085.0201), np.float32(898.4336), np.float32(774.55475), np.float32(1052.9054), np.float32(723.8483), np.float32(589.37537), np.float32(795.04694), np.float32(1182.9968), np.float32(841.8502), np.float32(1041.649)]
2025-09-14 19:53:23,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:53:23,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 34 minutes, 33 seconds)
2025-09-14 19:55:32,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:55:41,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 993.33411 ± 182.379
2025-09-14 19:55:41,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(923.2363), np.float32(811.9711), np.float32(731.05524), np.float32(1244.1128), np.float32(1065.5513), np.float32(815.0432), np.float32(1305.2921), np.float32(1138.6326), np.float32(905.5699), np.float32(992.876)]
2025-09-14 19:55:41,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:55:41,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 32 minutes, 14 seconds)
2025-09-14 19:57:50,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:57:59,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1008.22644 ± 295.984
2025-09-14 19:57:59,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(964.6772), np.float32(824.682), np.float32(788.48694), np.float32(1680.0085), np.float32(884.6351), np.float32(1206.1624), np.float32(791.99634), np.float32(1378.3612), np.float32(782.0163), np.float32(781.23816)]
2025-09-14 19:57:59,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:57:59,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 29 minutes, 56 seconds)
2025-09-14 20:00:08,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:00:17,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1183.83203 ± 229.667
2025-09-14 20:00:17,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1075.7505), np.float32(1307.4854), np.float32(1445.9393), np.float32(1029.885), np.float32(1346.5255), np.float32(975.103), np.float32(1450.3751), np.float32(1457.2831), np.float32(902.10754), np.float32(847.86597)]
2025-09-14 20:00:17,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:00:17,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 27 minutes, 36 seconds)
2025-09-14 20:02:26,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:02:35,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1143.27014 ± 323.650
2025-09-14 20:02:35,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(957.3724), np.float32(868.54047), np.float32(1788.8824), np.float32(1229.3955), np.float32(1058.5948), np.float32(998.35077), np.float32(791.3325), np.float32(851.496), np.float32(1662.0187), np.float32(1226.7175)]
2025-09-14 20:02:35,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:02:35,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 25 minutes, 18 seconds)
2025-09-14 20:04:45,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:04:53,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 943.08643 ± 116.161
2025-09-14 20:04:53,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(754.42505), np.float32(848.8295), np.float32(1162.5548), np.float32(988.8794), np.float32(916.60944), np.float32(1047.4348), np.float32(912.2429), np.float32(807.1375), np.float32(954.59357), np.float32(1038.1576)]
2025-09-14 20:04:53,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:04:53,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 23 minutes, 1 second)
2025-09-14 20:07:03,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:07:12,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1001.55750 ± 202.322
2025-09-14 20:07:12,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(917.1431), np.float32(1092.7339), np.float32(741.7146), np.float32(1137.2667), np.float32(838.3489), np.float32(1455.8181), np.float32(1040.6733), np.float32(833.60266), np.float32(1127.8525), np.float32(830.4205)]
2025-09-14 20:07:12,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:07:12,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 20 minutes, 43 seconds)
2025-09-14 20:09:21,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:09:30,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1084.54126 ± 240.511
2025-09-14 20:09:30,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1065.9502), np.float32(927.4026), np.float32(833.81635), np.float32(1064.2784), np.float32(964.89264), np.float32(1422.0084), np.float32(972.698), np.float32(1105.5525), np.float32(858.48126), np.float32(1630.3328)]
2025-09-14 20:09:30,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:09:30,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 18 minutes, 25 seconds)
2025-09-14 20:11:39,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:11:48,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1020.35168 ± 228.596
2025-09-14 20:11:48,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1092.6517), np.float32(761.9687), np.float32(989.7925), np.float32(946.9442), np.float32(1038.5558), np.float32(983.56396), np.float32(1611.5294), np.float32(1004.2218), np.float32(717.8463), np.float32(1056.441)]
2025-09-14 20:11:48,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:11:48,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 16 minutes, 7 seconds)
2025-09-14 20:13:57,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:14:06,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1102.53589 ± 250.447
2025-09-14 20:14:06,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1088.3159), np.float32(1014.336), np.float32(912.13806), np.float32(1701.9794), np.float32(940.4835), np.float32(824.07806), np.float32(1191.4874), np.float32(1057.0109), np.float32(917.77155), np.float32(1377.7583)]
2025-09-14 20:14:06,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:14:06,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 13 minutes, 49 seconds)
2025-09-14 20:16:15,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:16:24,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1109.11292 ± 166.789
2025-09-14 20:16:24,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1347.0717), np.float32(1024.0609), np.float32(1203.2526), np.float32(892.9961), np.float32(899.9844), np.float32(1339.3477), np.float32(893.6422), np.float32(1103.0498), np.float32(1169.4918), np.float32(1218.232)]
2025-09-14 20:16:24,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:16:24,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 11 minutes, 30 seconds)
2025-09-14 20:18:33,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:18:42,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1162.05249 ± 312.362
2025-09-14 20:18:42,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1546.7269), np.float32(949.12), np.float32(1621.3694), np.float32(959.3232), np.float32(823.39014), np.float32(1244.6525), np.float32(861.27734), np.float32(1520.8334), np.float32(1332.4845), np.float32(761.3481)]
2025-09-14 20:18:42,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:18:42,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 9 minutes, 12 seconds)
2025-09-14 20:20:51,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:21:00,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1222.51782 ± 269.795
2025-09-14 20:21:00,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1539.6666), np.float32(1075.1575), np.float32(1207.2133), np.float32(1343.3446), np.float32(1322.2152), np.float32(880.31256), np.float32(1793.5083), np.float32(1049.464), np.float32(920.6271), np.float32(1093.67)]
2025-09-14 20:21:00,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:21:00,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 54 seconds)
2025-09-14 20:23:09,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:23:18,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1036.18994 ± 317.856
2025-09-14 20:23:18,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(879.40106), np.float32(895.32886), np.float32(869.09515), np.float32(907.4132), np.float32(919.76575), np.float32(1954.9767), np.float32(845.64636), np.float32(1012.882), np.float32(922.9206), np.float32(1154.4701)]
2025-09-14 20:23:18,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:23:18,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 35 seconds)
2025-09-14 20:25:27,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:25:35,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1017.78992 ± 207.516
2025-09-14 20:25:35,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(791.10596), np.float32(825.61774), np.float32(818.57117), np.float32(736.9691), np.float32(1201.0116), np.float32(1367.2136), np.float32(1128.785), np.float32(1221.7083), np.float32(963.2161), np.float32(1123.7012)]
2025-09-14 20:25:35,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:25:35,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 17 seconds)
2025-09-14 20:27:44,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:27:53,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1072.40979 ± 239.712
2025-09-14 20:27:53,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(728.3598), np.float32(1287.4205), np.float32(1307.911), np.float32(1367.5637), np.float32(1335.4973), np.float32(938.885), np.float32(1122.2416), np.float32(1007.3915), np.float32(956.82684), np.float32(671.99994)]
2025-09-14 20:27:53,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:27:53,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1251 [DEBUG]: Training session finished
