2025-09-14 14:11:18,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1108 [DEBUG]: logdir: _logs/noise-eval-v2/halfcheetah/bpql-noise_0.100-delay_21
2025-09-14 14:11:18,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1109 [DEBUG]: trainer_prefix: noise-eval-v2/halfcheetah/bpql-noise_0.100-delay_21
2025-09-14 14:11:18,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'21': <latency_env.delayed_mdp.ConstantDelay object at 0x7ff7ffa43d70>}
2025-09-14 14:11:18,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1111 [DEBUG]: using device: cpu
2025-09-14 14:11:18,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-14 14:11:18,220 baseline-bpql-noisepromille100-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=143, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-14 14:11:18,220 baseline-bpql-noisepromille100-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-14 14:11:19,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-14 14:11:19,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-14 14:14:16,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:14:26,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: -403.82074 ± 58.839
2025-09-14 14:14:26,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [-372.9957, -433.77826, -285.29562, -385.07013, -376.60223, -409.05075, -418.0248, -406.30148, -415.9281, -535.16034]
2025-09-14 14:14:26,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 14:14:26,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (-403.82) for latency 21
2025-09-14 14:14:26,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 5 hours, 8 minutes, 13 seconds)
2025-09-14 14:17:28,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:17:38,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: -235.44247 ± 42.875
2025-09-14 14:17:38,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [-201.8469, -348.74225, -227.01462, -236.64862, -203.03923, -218.37454, -263.49023, -188.6056, -236.01611, -230.64677]
2025-09-14 14:17:38,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 14:17:38,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (-235.44) for latency 21
2025-09-14 14:17:38,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 5 hours, 9 minutes, 16 seconds)
2025-09-14 14:20:44,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:20:56,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: -196.84935 ± 76.259
2025-09-14 14:20:56,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [-347.97278, -233.99817, -188.17018, -205.76279, -210.32753, -68.471405, -239.26491, -204.57628, -196.63284, -73.31664]
2025-09-14 14:20:56,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 14:20:56,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (-196.85) for latency 21
2025-09-14 14:20:56,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 5 hours, 10 minutes, 53 seconds)
2025-09-14 14:24:07,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:24:18,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: -107.77393 ± 88.610
2025-09-14 14:24:18,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [-165.24496, -82.27577, 3.0204477, -43.06635, 64.34419, -221.82477, -154.59535, -118.15659, -142.92035, -217.01971]
2025-09-14 14:24:18,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 14:24:18,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (-107.77) for latency 21
2025-09-14 14:24:18,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 5 hours, 11 minutes, 47 seconds)
2025-09-14 14:27:32,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:27:43,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: -9.67617 ± 125.641
2025-09-14 14:27:43,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [-109.00814, 305.91217, 36.09878, -98.717415, -37.713314, 46.072994, -36.544415, -101.49629, -155.47029, 54.10426]
2025-09-14 14:27:43,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 14:27:43,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (-9.68) for latency 21
2025-09-14 14:27:43,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 5 hours, 11 minutes, 33 seconds)
2025-09-14 14:30:56,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:31:08,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 93.60093 ± 118.196
2025-09-14 14:31:08,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [-62.156353, 202.02513, 42.529705, 211.91176, 136.50345, 134.64905, 311.98224, -31.779774, 14.773401, -24.429327]
2025-09-14 14:31:08,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 14:31:08,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (93.60) for latency 21
2025-09-14 14:31:08,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 5 hours, 13 minutes, 57 seconds)
2025-09-14 14:34:21,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:34:32,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 212.44092 ± 40.010
2025-09-14 14:34:32,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [265.42896, 219.09155, 170.53667, 135.2316, 208.10742, 228.94669, 231.14864, 219.09718, 175.90784, 270.9126]
2025-09-14 14:34:32,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 14:34:32,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (212.44) for latency 21
2025-09-14 14:34:32,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 5 hours, 14 minutes, 31 seconds)
2025-09-14 14:37:46,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:37:57,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 379.75607 ± 127.974
2025-09-14 14:37:57,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [373.31476, 368.26965, 602.0189, 410.95758, 253.44061, 361.653, 241.44778, 519.31165, 169.14589, 498.00064]
2025-09-14 14:37:57,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 14:37:57,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (379.76) for latency 21
2025-09-14 14:37:57,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 5 hours, 13 minutes, 17 seconds)
2025-09-14 14:41:12,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:41:23,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 481.18146 ± 113.334
2025-09-14 14:41:23,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [493.35547, 472.36765, 605.6402, 475.57117, 448.20947, 328.08902, 555.0092, 243.06552, 595.6108, 594.89606]
2025-09-14 14:41:23,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 14:41:23,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (481.18) for latency 21
2025-09-14 14:41:23,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 5 hours, 10 minutes, 43 seconds)
2025-09-14 14:44:36,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:44:47,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 532.60559 ± 313.530
2025-09-14 14:44:47,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [825.53577, -303.9731, 487.0561, 804.2912, 558.1534, 654.11145, 521.891, 345.61948, 676.78076, 756.5898]
2025-09-14 14:44:47,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 14:44:47,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (532.61) for latency 21
2025-09-14 14:44:47,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 5 hours, 7 minutes, 21 seconds)
2025-09-14 14:47:59,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:48:10,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 761.92267 ± 77.942
2025-09-14 14:48:10,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [829.38837, 733.4285, 842.0924, 603.69104, 736.354, 785.53064, 701.2591, 847.15967, 693.9079, 846.4157]
2025-09-14 14:48:10,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 14:48:10,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (761.92) for latency 21
2025-09-14 14:48:10,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 5 hours, 3 minutes, 17 seconds)
2025-09-14 14:51:25,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:51:36,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 959.81824 ± 100.576
2025-09-14 14:51:36,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [892.8075, 1039.2264, 690.32556, 1040.0438, 968.3741, 972.2779, 988.265, 994.3552, 957.4997, 1055.0072]
2025-09-14 14:51:36,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 14:51:36,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (959.82) for latency 21
2025-09-14 14:51:36,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 5 hours, 23 seconds)
2025-09-14 14:54:50,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:55:02,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 942.37451 ± 125.089
2025-09-14 14:55:02,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [1054.5286, 687.9322, 814.33295, 1079.9829, 1036.6953, 947.3621, 796.59875, 1032.5071, 958.9552, 1014.85065]
2025-09-14 14:55:02,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 14:55:02,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 57 minutes, 2 seconds)
2025-09-14 14:58:14,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:58:25,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1049.10254 ± 159.319
2025-09-14 14:58:25,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [971.8195, 1212.8241, 1082.3706, 994.9987, 709.7118, 1279.8313, 894.1407, 1144.7357, 1028.9115, 1171.6829]
2025-09-14 14:58:25,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 14:58:25,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (1049.10) for latency 21
2025-09-14 14:58:25,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 53 minutes, 6 seconds)
2025-09-14 15:01:36,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:01:48,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 987.85498 ± 287.199
2025-09-14 15:01:48,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [1242.0624, 983.823, 1147.701, 1021.69366, 968.57654, 170.5338, 1037.2162, 1212.4858, 996.39557, 1098.0612]
2025-09-14 15:01:48,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 15:01:48,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 49 minutes, 5 seconds)
2025-09-14 15:05:02,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:05:14,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1145.36890 ± 252.420
2025-09-14 15:05:14,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [1153.387, 1077.7151, 622.0316, 1119.8308, 1046.9352, 1513.5347, 1004.79425, 1564.3455, 1087.1724, 1263.9426]
2025-09-14 15:05:14,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 15:05:14,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (1145.37) for latency 21
2025-09-14 15:05:14,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 46 minutes, 39 seconds)
2025-09-14 15:08:29,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:08:40,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1156.02234 ± 79.095
2025-09-14 15:08:40,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [1272.4204, 1244.9722, 1144.9469, 1098.3043, 1085.954, 1114.557, 1290.6862, 1151.0374, 1057.5581, 1099.7882]
2025-09-14 15:08:40,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 15:08:40,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (1156.02) for latency 21
2025-09-14 15:08:40,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 43 minutes, 11 seconds)
2025-09-14 15:11:54,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:12:05,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1210.03918 ± 217.165
2025-09-14 15:12:05,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [1650.8867, 1041.1365, 1203.0435, 1219.9838, 1089.8752, 1036.9095, 1232.5996, 1157.9237, 917.9505, 1550.0817]
2025-09-14 15:12:05,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 15:12:05,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (1210.04) for latency 21
2025-09-14 15:12:05,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 39 minutes, 39 seconds)
2025-09-14 15:15:15,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:15:26,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1198.70630 ± 554.478
2025-09-14 15:15:26,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [1527.3417, 1120.336, 1181.2053, 1093.4441, 1017.9303, 2312.6191, 1395.8015, -67.52928, 1074.0997, 1331.8158]
2025-09-14 15:15:26,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 15:15:26,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 35 minutes, 36 seconds)
2025-09-14 15:18:40,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:18:51,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1322.59167 ± 240.025
2025-09-14 15:18:51,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [1112.6073, 1075.8079, 1809.0787, 1287.1365, 1221.6409, 1457.0051, 1101.9485, 1657.9076, 1121.6249, 1381.1593]
2025-09-14 15:18:51,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 15:18:51,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (1322.59) for latency 21
2025-09-14 15:18:51,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 4 hours, 32 minutes, 54 seconds)
2025-09-14 15:22:05,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:22:17,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1313.68250 ± 226.022
2025-09-14 15:22:17,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [1113.7777, 1149.4938, 1190.4016, 984.8621, 1383.4392, 1779.1338, 1207.5981, 1447.3153, 1578.0245, 1302.7795]
2025-09-14 15:22:17,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 15:22:17,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 4 hours, 29 minutes, 25 seconds)
2025-09-14 15:25:31,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:25:42,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1323.32520 ± 283.825
2025-09-14 15:25:42,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [1887.655, 1074.5435, 1170.9647, 1400.5758, 1187.3832, 1091.424, 1039.2593, 1163.0242, 1773.3982, 1445.0238]
2025-09-14 15:25:42,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 15:25:42,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (1323.33) for latency 21
2025-09-14 15:25:42,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 4 hours, 25 minutes, 46 seconds)
2025-09-14 15:28:54,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:29:05,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1480.96387 ± 222.279
2025-09-14 15:29:05,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [1678.699, 1789.223, 1436.7538, 1664.6173, 1269.4519, 1654.9551, 1445.2411, 1116.7046, 1593.428, 1160.565]
2025-09-14 15:29:05,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 15:29:05,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (1480.96) for latency 21
2025-09-14 15:29:05,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 4 hours, 21 minutes, 43 seconds)
2025-09-14 15:32:16,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:32:27,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1343.13196 ± 228.551
2025-09-14 15:32:27,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1235.7224), np.float32(1322.1705), np.float32(1761.5742), np.float32(1084.9299), np.float32(1137.3425), np.float32(1124.3044), np.float32(1187.1442), np.float32(1344.9606), np.float32(1663.9974), np.float32(1569.173)]
2025-09-14 15:32:27,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:32:27,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 4 hours, 18 minutes, 37 seconds)
2025-09-14 15:35:41,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:35:52,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1620.34143 ± 459.459
2025-09-14 15:35:52,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1332.2281), np.float32(1631.1505), np.float32(1367.2241), np.float32(2335.7432), np.float32(1326.8402), np.float32(1090.4785), np.float32(963.434), np.float32(1881.5332), np.float32(1978.1716), np.float32(2296.6104)]
2025-09-14 15:35:52,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:35:52,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (1620.34) for latency 21
2025-09-14 15:35:52,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 15 minutes, 18 seconds)
2025-09-14 15:39:05,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:39:16,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1371.85327 ± 247.422
2025-09-14 15:39:16,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1315.6011), np.float32(1069.7384), np.float32(1198.0184), np.float32(1763.6063), np.float32(1052.489), np.float32(1599.9731), np.float32(1441.928), np.float32(1683.343), np.float32(1114.0889), np.float32(1479.7465)]
2025-09-14 15:39:16,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:39:16,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 4 hours, 11 minutes, 20 seconds)
2025-09-14 15:42:29,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:42:40,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1275.24792 ± 192.959
2025-09-14 15:42:40,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1057.8762), np.float32(1085.8439), np.float32(1377.9664), np.float32(1357.7434), np.float32(1218.2966), np.float32(1170.6477), np.float32(1391.8811), np.float32(1495.052), np.float32(1610.5643), np.float32(986.6077)]
2025-09-14 15:42:40,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:42:40,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 4 hours, 7 minutes, 48 seconds)
2025-09-14 15:45:56,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:46:07,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1343.24744 ± 376.716
2025-09-14 15:46:07,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1482.1482), np.float32(1289.4563), np.float32(2077.848), np.float32(1065.3724), np.float32(1287.5996), np.float32(755.0517), np.float32(1011.1457), np.float32(1388.9849), np.float32(1886.1744), np.float32(1188.692)]
2025-09-14 15:46:07,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:46:07,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 4 hours, 5 minutes, 30 seconds)
2025-09-14 15:49:20,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:49:32,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1267.44373 ± 107.636
2025-09-14 15:49:32,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1200.3685), np.float32(1389.2904), np.float32(1277.3274), np.float32(1194.731), np.float32(1428.5381), np.float32(1208.0238), np.float32(1101.9508), np.float32(1408.973), np.float32(1155.8115), np.float32(1309.423)]
2025-09-14 15:49:32,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
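Note on the `Total Reward: mean ± std` lines: the exact estimator is not shown in the log, but the logged `±` value is consistent with the population standard deviation (NumPy's default, `ddof=0`) over the ten per-episode returns. A minimal sketch, using the "All rewards" values from the iteration-29 evaluation above (variable names here are illustrative, not from the trainer):

```python
import numpy as np

# Per-episode returns copied from the "All rewards" line of the
# iteration-29 evaluation (latency 21).
rewards = np.array([
    1200.3685, 1389.2904, 1277.3274, 1194.731, 1428.5381,
    1208.0238, 1101.9508, 1408.973, 1155.8115, 1309.423,
], dtype=np.float32)

mean = rewards.mean()
std = rewards.std()  # ddof=0 (population std) reproduces the logged +/- value

print(f"Total Reward: {mean:.5f} +/- {std:.3f}")
```

With these inputs the printed mean and spread agree with the logged `1267.44373 ± 107.636` to display precision.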
2025-09-14 15:49:32,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 2 minutes, 29 seconds)
2025-09-14 15:52:44,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:52:55,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1494.49011 ± 365.902
2025-09-14 15:52:55,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1360.5217), np.float32(1597.1423), np.float32(1565.526), np.float32(1304.8245), np.float32(1305.2964), np.float32(1658.4042), np.float32(1201.5781), np.float32(2471.3108), np.float32(1103.5459), np.float32(1376.7512)]
2025-09-14 15:52:55,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:52:55,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 58 minutes, 42 seconds)
2025-09-14 15:56:07,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:56:19,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1352.19507 ± 306.220
2025-09-14 15:56:19,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1128.09), np.float32(2177.7637), np.float32(1313.1927), np.float32(1128.9268), np.float32(1345.7205), np.float32(1595.2107), np.float32(1246.8458), np.float32(1171.51), np.float32(1277.1383), np.float32(1137.5518)]
2025-09-14 15:56:19,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:56:19,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 55 minutes, 17 seconds)
2025-09-14 15:59:34,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:59:46,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1891.77283 ± 663.366
2025-09-14 15:59:46,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2851.121), np.float32(1974.9684), np.float32(2566.1287), np.float32(1351.5492), np.float32(1289.9658), np.float32(2396.9602), np.float32(1126.3809), np.float32(2772.4644), np.float32(1164.8594), np.float32(1423.3306)]
2025-09-14 15:59:46,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:59:46,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (1891.77) for latency 21
2025-09-14 15:59:46,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 52 minutes, 22 seconds)
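Note on the `New best (...) for latency 21` lines: they appear exactly when an evaluation mean exceeds the running maximum (1323.33 → 1480.96 → 1620.34 → 1891.77 in this stretch), i.e. plain best-so-far tracking. A hypothetical reconstruction over the logged means for iterations 22-33:

```python
# Evaluation means copied from the "Total Reward" lines for iterations 22-33.
means = [1323.33, 1480.96, 1343.13, 1620.34, 1371.85, 1275.25,
         1343.25, 1267.44, 1494.49, 1352.20, 1891.77]

best = float("-inf")
announcements = []  # means that would trigger a "New best" log line
for m in means:
    if m > best:
        best = m
        announcements.append(m)

print(announcements)  # -> [1323.33, 1480.96, 1620.34, 1891.77]
```

The four triggered values match the four "New best" announcements in this portion of the log; later means below 1891.77 (e.g. the 1803.97 evaluation) correctly produce no announcement.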
2025-09-14 16:03:01,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:03:13,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1598.44495 ± 314.120
2025-09-14 16:03:13,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2062.812), np.float32(1324.7993), np.float32(1294.602), np.float32(1746.834), np.float32(1315.199), np.float32(1738.3009), np.float32(1946.9308), np.float32(1165.9023), np.float32(1972.9807), np.float32(1416.088)]
2025-09-14 16:03:13,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:03:13,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 48 minutes, 57 seconds)
2025-09-14 16:06:25,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:06:36,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1486.00073 ± 580.131
2025-09-14 16:06:36,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(905.71387), np.float32(1094.7327), np.float32(2281.2537), np.float32(1139.1543), np.float32(1694.3617), np.float32(2770.6892), np.float32(1622.9585), np.float32(1108.5848), np.float32(1090.5758), np.float32(1151.9818)]
2025-09-14 16:06:36,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:06:36,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 45 minutes, 27 seconds)
2025-09-14 16:09:49,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:10:01,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1397.95911 ± 283.451
2025-09-14 16:10:01,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1159.0719), np.float32(1246.6914), np.float32(1193.6497), np.float32(1280.8129), np.float32(1188.8641), np.float32(1787.031), np.float32(1775.6808), np.float32(1912.825), np.float32(1221.2194), np.float32(1213.7444)]
2025-09-14 16:10:01,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:10:01,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 42 minutes, 8 seconds)
2025-09-14 16:13:16,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:13:27,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1470.50867 ± 186.608
2025-09-14 16:13:27,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1234.8785), np.float32(1649.2727), np.float32(1463.2753), np.float32(1500.5261), np.float32(1577.2015), np.float32(1710.1119), np.float32(1414.9762), np.float32(1136.1492), np.float32(1318.1609), np.float32(1700.5338)]
2025-09-14 16:13:27,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:13:27,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 39 minutes, 19 seconds)
2025-09-14 16:16:41,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:16:53,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1346.31775 ± 456.123
2025-09-14 16:16:53,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1120.9), np.float32(1296.1921), np.float32(1464.999), np.float32(2038.438), np.float32(1227.5302), np.float32(1154.5117), np.float32(1284.0747), np.float32(396.60013), np.float32(1374.324), np.float32(2105.6074)]
2025-09-14 16:16:53,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:16:53,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 35 minutes, 42 seconds)
2025-09-14 16:20:05,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:20:15,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1619.04858 ± 456.484
2025-09-14 16:20:15,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1168.6035), np.float32(1584.4254), np.float32(1702.0825), np.float32(1144.8665), np.float32(1839.6626), np.float32(2732.961), np.float32(1437.1404), np.float32(1311.7524), np.float32(1973.9539), np.float32(1295.0374)]
2025-09-14 16:20:15,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:20:15,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 31 minutes, 18 seconds)
2025-09-14 16:23:28,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:23:39,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1420.36938 ± 320.741
2025-09-14 16:23:39,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1303.5743), np.float32(1288.0138), np.float32(1198.7344), np.float32(1084.733), np.float32(1374.9457), np.float32(1767.6033), np.float32(1189.6083), np.float32(1862.0911), np.float32(2026.3765), np.float32(1108.0139)]
2025-09-14 16:23:39,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:23:39,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 27 minutes, 59 seconds)
2025-09-14 16:26:53,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:27:04,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1474.34497 ± 362.067
2025-09-14 16:27:04,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1718.2573), np.float32(1167.97), np.float32(1643.9745), np.float32(1145.8805), np.float32(1371.7666), np.float32(1536.8867), np.float32(1252.444), np.float32(1405.0801), np.float32(2382.973), np.float32(1118.2175)]
2025-09-14 16:27:04,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:27:04,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 24 minutes, 45 seconds)
2025-09-14 16:30:18,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:30:30,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1429.67554 ± 291.032
2025-09-14 16:30:30,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1953.2882), np.float32(1929.5062), np.float32(1405.0787), np.float32(1105.8699), np.float32(1227.4186), np.float32(1274.761), np.float32(1290.7903), np.float32(1279.1355), np.float32(1180.6381), np.float32(1650.2698)]
2025-09-14 16:30:30,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:30:30,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 21 minutes, 7 seconds)
2025-09-14 16:33:43,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:33:54,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1888.74353 ± 574.863
2025-09-14 16:33:54,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2627.554), np.float32(2153.3196), np.float32(1270.7551), np.float32(1201.0013), np.float32(1871.2048), np.float32(2214.3123), np.float32(2965.0005), np.float32(1716.732), np.float32(1684.5056), np.float32(1183.0511)]
2025-09-14 16:33:54,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:33:54,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 17 minutes, 30 seconds)
2025-09-14 16:37:09,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:37:20,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1448.07275 ± 584.684
2025-09-14 16:37:20,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1588.8507), np.float32(1901.7891), np.float32(1109.6152), np.float32(2213.8616), np.float32(1920.4099), np.float32(1470.9996), np.float32(1287.7334), np.float32(1185.7083), np.float32(13.502114), np.float32(1788.2578)]
2025-09-14 16:37:20,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:37:20,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 14 minutes, 45 seconds)
2025-09-14 16:40:33,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:40:44,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1489.64587 ± 428.614
2025-09-14 16:40:44,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1136.4412), np.float32(2070.919), np.float32(1224.7672), np.float32(1930.1831), np.float32(1253.89), np.float32(1191.7513), np.float32(2367.225), np.float32(1314.1293), np.float32(1228.9313), np.float32(1178.2205)]
2025-09-14 16:40:44,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:40:44,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 11 minutes, 17 seconds)
2025-09-14 16:43:57,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:44:08,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1384.90466 ± 612.028
2025-09-14 16:44:08,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1363.261), np.float32(2220.8718), np.float32(1573.7794), np.float32(-106.78103), np.float32(1305.3405), np.float32(1482.6212), np.float32(1864.9829), np.float32(1094.6271), np.float32(1069.2594), np.float32(1981.0845)]
2025-09-14 16:44:08,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:44:08,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 7 minutes, 42 seconds)
2025-09-14 16:47:21,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:47:33,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1380.07129 ± 289.221
2025-09-14 16:47:33,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1189.8383), np.float32(1402.3589), np.float32(1349.1421), np.float32(1847.9198), np.float32(1739.568), np.float32(1795.1456), np.float32(1124.4761), np.float32(1181.2512), np.float32(1080.711), np.float32(1090.3014)]
2025-09-14 16:47:33,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:47:33,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 4 minutes, 10 seconds)
2025-09-14 16:50:48,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:50:59,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1803.96606 ± 424.415
2025-09-14 16:50:59,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2104.5708), np.float32(1597.9004), np.float32(1895.5244), np.float32(1228.545), np.float32(2453.401), np.float32(1905.2755), np.float32(1304.5078), np.float32(2405.196), np.float32(1871.3066), np.float32(1273.434)]
2025-09-14 16:50:59,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:50:59,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 1 minute, 5 seconds)
2025-09-14 16:54:13,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:54:24,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1399.13745 ± 358.729
2025-09-14 16:54:24,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1132.6919), np.float32(1193.8712), np.float32(1474.4357), np.float32(1113.7864), np.float32(1279.0697), np.float32(1223.5358), np.float32(2186.7703), np.float32(950.32947), np.float32(1807.002), np.float32(1629.8827)]
2025-09-14 16:54:24,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:54:24,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 57 minutes, 31 seconds)
2025-09-14 16:57:35,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:57:46,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1515.24170 ± 449.295
2025-09-14 16:57:46,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1265.8456), np.float32(1500.3488), np.float32(2319.553), np.float32(1193.8983), np.float32(1125.9508), np.float32(1191.0425), np.float32(1014.181), np.float32(1840.0105), np.float32(1422.0275), np.float32(2279.5593)]
2025-09-14 16:57:46,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:57:46,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 53 minutes, 43 seconds)
2025-09-14 17:01:00,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:01:11,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1570.87903 ± 417.912
2025-09-14 17:01:11,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1283.7955), np.float32(1336.4896), np.float32(1350.1648), np.float32(1192.6237), np.float32(2327.8987), np.float32(1469.5098), np.float32(1750.8733), np.float32(2257.1765), np.float32(1016.5767), np.float32(1723.6815)]
2025-09-14 17:01:11,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:01:11,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 50 minutes, 30 seconds)
2025-09-14 17:04:26,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:04:37,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1517.81885 ± 575.620
2025-09-14 17:04:37,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1214.449), np.float32(1449.9049), np.float32(1420.0397), np.float32(3218.3499), np.float32(1163.52), np.float32(1361.777), np.float32(1467.5287), np.float32(1379.1377), np.float32(1284.4109), np.float32(1219.0702)]
2025-09-14 17:04:37,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:04:37,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 47 minutes, 21 seconds)
2025-09-14 17:07:51,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:08:02,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 2114.76221 ± 776.369
2025-09-14 17:08:02,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1269.4495), np.float32(1196.0011), np.float32(2106.4983), np.float32(2671.2017), np.float32(3461.7847), np.float32(1488.6078), np.float32(2912.2698), np.float32(1279.1376), np.float32(1893.3953), np.float32(2869.278)]
2025-09-14 17:08:02,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:08:02,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (2114.76) for latency 21
2025-09-14 17:08:02,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 43 minutes, 42 seconds)
2025-09-14 17:11:14,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:11:25,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1624.50635 ± 478.198
2025-09-14 17:11:25,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1232.8986), np.float32(1201.5845), np.float32(2423.766), np.float32(1185.1565), np.float32(2494.2383), np.float32(1463.6489), np.float32(1310.4664), np.float32(1261.8433), np.float32(1733.2949), np.float32(1938.1674)]
2025-09-14 17:11:25,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:11:25,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 39 minutes, 56 seconds)
2025-09-14 17:14:38,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:14:49,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 2247.10229 ± 862.726
2025-09-14 17:14:49,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1152.367), np.float32(3266.182), np.float32(1960.635), np.float32(2672.1958), np.float32(1393.5398), np.float32(2381.7031), np.float32(3461.1653), np.float32(1293.6669), np.float32(3376.289), np.float32(1513.2798)]
2025-09-14 17:14:49,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:14:49,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (2247.10) for latency 21
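The "New best" lines above come from a running best-score tracker keyed by latency. The trainer's actual implementation isn't shown in this log, so the following is a hypothetical sketch of the pattern; the function name, dict layout, and `on_new_best` callback are all assumptions:

```python
# Track the best mean evaluation reward per latency; fire a hook when it improves.
# Hypothetical sketch -- names and the callback mechanism are assumptions.
best_by_latency: dict[str, float] = {}

def update_best(latency: str, mean_reward: float, on_new_best=None) -> bool:
    """Return True (and optionally invoke a callback) when mean_reward beats the stored best."""
    if mean_reward > best_by_latency.get(latency, float("-inf")):
        best_by_latency[latency] = mean_reward
        if on_new_best is not None:
            on_new_best(latency, mean_reward)  # e.g. save a checkpoint, log "New best"
        return True
    return False
```

Note that in this excerpt the tracker fires at 2114.76 and again at 2247.10 but not for the intermediate 1624.51, consistent with a simple strictly-greater comparison against the running maximum.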
2025-09-14 17:14:49,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 36 minutes, 55 seconds)
2025-09-14 17:18:04,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:18:15,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1387.24243 ± 283.118
2025-09-14 17:18:15,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1279.907), np.float32(1387.6842), np.float32(1293.51), np.float32(1946.2098), np.float32(1905.2208), np.float32(1219.2782), np.float32(1360.225), np.float32(1108.7169), np.float32(1105.6649), np.float32(1266.0071)]
2025-09-14 17:18:15,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:18:15,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 33 minutes, 36 seconds)
2025-09-14 17:21:28,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:21:39,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1647.70605 ± 493.274
2025-09-14 17:21:39,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1147.2605), np.float32(1635.609), np.float32(1595.4178), np.float32(1163.7454), np.float32(1621.3007), np.float32(1390.837), np.float32(2318.1348), np.float32(1457.9851), np.float32(2791.4001), np.float32(1355.3685)]
2025-09-14 17:21:39,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:21:39,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 29 minutes, 50 seconds)
2025-09-14 17:24:53,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:25:05,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1615.68945 ± 450.426
2025-09-14 17:25:05,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1349.1927), np.float32(1445.8792), np.float32(1795.4617), np.float32(635.72327), np.float32(1764.3062), np.float32(2054.17), np.float32(2292.4587), np.float32(1717.8606), np.float32(1901.0138), np.float32(1200.8284)]
2025-09-14 17:25:05,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:25:05,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 26 minutes, 30 seconds)
2025-09-14 17:28:20,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:28:31,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1408.29858 ± 185.929
2025-09-14 17:28:31,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1362.2982), np.float32(1276.2201), np.float32(1642.8508), np.float32(1518.8967), np.float32(1162.444), np.float32(1407.3438), np.float32(1810.0483), np.float32(1314.2426), np.float32(1327.5631), np.float32(1261.0781)]
2025-09-14 17:28:31,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:28:31,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 23 minutes, 33 seconds)
2025-09-14 17:31:44,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:31:55,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1707.71033 ± 518.354
2025-09-14 17:31:55,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(954.5098), np.float32(2182.4038), np.float32(1574.7739), np.float32(1439.7643), np.float32(1910.8375), np.float32(2787.0447), np.float32(2107.7625), np.float32(1521.8168), np.float32(1486.2434), np.float32(1111.949)]
2025-09-14 17:31:55,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:31:55,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 20 minutes, 7 seconds)
2025-09-14 17:35:08,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:35:19,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1468.15881 ± 268.578
2025-09-14 17:35:19,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1298.4559), np.float32(1394.4352), np.float32(2011.8225), np.float32(1352.5222), np.float32(1186.4275), np.float32(1217.1622), np.float32(1465.4375), np.float32(1600.2493), np.float32(1889.3168), np.float32(1265.7593)]
2025-09-14 17:35:19,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:35:19,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 16 minutes, 25 seconds)
2025-09-14 17:38:33,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:38:44,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1461.00757 ± 279.757
2025-09-14 17:38:44,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1337.5723), np.float32(1692.4542), np.float32(1356.3865), np.float32(1197.963), np.float32(2008.833), np.float32(1683.8033), np.float32(1075.0955), np.float32(1490.0977), np.float32(1631.2141), np.float32(1136.6577)]
2025-09-14 17:38:44,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:38:44,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 13 minutes, 13 seconds)
2025-09-14 17:41:59,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:42:10,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1593.26514 ± 399.475
2025-09-14 17:42:10,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1575.642), np.float32(2047.3029), np.float32(2430.9102), np.float32(1271.9562), np.float32(2001.6443), np.float32(1438.1023), np.float32(1374.847), np.float32(1261.3273), np.float32(1163.4698), np.float32(1367.4495)]
2025-09-14 17:42:10,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:42:10,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 9 minutes, 50 seconds)
2025-09-14 17:45:23,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:45:34,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1550.70801 ± 393.968
2025-09-14 17:45:34,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2492.4956), np.float32(1481.8927), np.float32(1970.1055), np.float32(1362.3704), np.float32(1776.158), np.float32(1256.8042), np.float32(1197.5765), np.float32(1468.1638), np.float32(1236.9258), np.float32(1264.5883)]
2025-09-14 17:45:34,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:45:34,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 6 minutes, 14 seconds)
2025-09-14 17:48:46,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:48:56,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1317.37097 ± 168.025
2025-09-14 17:48:56,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1185.022), np.float32(1338.125), np.float32(1338.9362), np.float32(1363.052), np.float32(1177.1531), np.float32(1453.2393), np.float32(1183.762), np.float32(1026.126), np.float32(1487.4209), np.float32(1620.8733)]
2025-09-14 17:48:56,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:48:56,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 2 minutes, 34 seconds)
2025-09-14 17:52:11,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:52:22,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1557.66724 ± 347.062
2025-09-14 17:52:22,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1322.3652), np.float32(2026.4797), np.float32(1122.5171), np.float32(1308.4564), np.float32(1173.1708), np.float32(1672.2573), np.float32(1433.083), np.float32(2255.276), np.float32(1675.403), np.float32(1587.6632)]
2025-09-14 17:52:22,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:52:22,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 59 minutes, 26 seconds)
2025-09-14 17:55:38,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:55:49,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1587.04260 ± 361.308
2025-09-14 17:55:49,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1278.4779), np.float32(1587.8859), np.float32(1200.4653), np.float32(1377.1135), np.float32(1320.4012), np.float32(2476.949), np.float32(1772.3647), np.float32(1753.0116), np.float32(1763.206), np.float32(1340.5505)]
2025-09-14 17:55:49,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:55:49,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 56 minutes, 10 seconds)
2025-09-14 17:59:03,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:59:15,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1591.69763 ± 346.346
2025-09-14 17:59:15,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1479.8501), np.float32(1178.7697), np.float32(1524.6427), np.float32(1456.5466), np.float32(1192.1957), np.float32(1919.9292), np.float32(1235.4897), np.float32(2119.13), np.float32(1655.4252), np.float32(2154.998)]
2025-09-14 17:59:15,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:59:15,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 52 minutes, 43 seconds)
2025-09-14 18:02:25,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:02:37,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1549.62036 ± 285.159
2025-09-14 18:02:37,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1581.1323), np.float32(1833.7356), np.float32(1401.6641), np.float32(1110.1324), np.float32(1840.3635), np.float32(1172.454), np.float32(1997.939), np.float32(1643.3302), np.float32(1280.3113), np.float32(1635.1422)]
2025-09-14 18:02:37,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:02:37,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 49 minutes, 3 seconds)
2025-09-14 18:05:49,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:06:01,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1583.13818 ± 407.801
2025-09-14 18:06:01,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2106.3796), np.float32(1256.0614), np.float32(1170.9944), np.float32(2237.3157), np.float32(1539.2728), np.float32(2212.4988), np.float32(1268.7933), np.float32(1231.7848), np.float32(1410.7323), np.float32(1397.5485)]
2025-09-14 18:06:01,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:06:01,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 45 minutes, 52 seconds)
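The "estimated time remaining" figures shrink by roughly one iteration's duration (about 195–210 s here) per iteration, which is consistent with simple linear extrapolation from the mean iteration time so far. The actual estimator inside `training_loop` isn't shown, so treat this as an assumed sketch:

```python
import datetime

def eta(elapsed_seconds: float, completed: int, total: int) -> datetime.timedelta:
    """Estimate remaining wall time by extrapolating the mean iteration duration."""
    mean_iter = elapsed_seconds / completed  # average seconds per completed iteration
    return datetime.timedelta(seconds=round(mean_iter * (total - completed)))
```

At roughly 200 s per iteration with 30 of 100 iterations left, this gives about 1 h 40 min, the same order as the figure logged at iteration 70 above.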
2025-09-14 18:09:15,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:09:26,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1533.61755 ± 382.946
2025-09-14 18:09:26,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1132.3157), np.float32(1417.1461), np.float32(2161.584), np.float32(2001.1478), np.float32(1542.1802), np.float32(1171.103), np.float32(2043.649), np.float32(1550.5388), np.float32(1209.4279), np.float32(1107.0826)]
2025-09-14 18:09:26,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:09:26,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 42 minutes, 21 seconds)
2025-09-14 18:12:40,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:12:52,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1655.65332 ± 670.036
2025-09-14 18:12:52,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1301.1287), np.float32(1628.0868), np.float32(1837.3552), np.float32(1154.3163), np.float32(3490.354), np.float32(1262.514), np.float32(2006.2389), np.float32(1410.6162), np.float32(1327.0465), np.float32(1138.8762)]
2025-09-14 18:12:52,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:12:52,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 38 minutes, 51 seconds)
2025-09-14 18:16:05,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:16:16,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 2063.51392 ± 593.520
2025-09-14 18:16:16,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1388.8318), np.float32(2375.8523), np.float32(2852.4512), np.float32(2137.3503), np.float32(2970.6028), np.float32(1234.7178), np.float32(1435.9288), np.float32(2618.8433), np.float32(1840.3558), np.float32(1780.2052)]
2025-09-14 18:16:16,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:16:16,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 35 minutes, 19 seconds)
2025-09-14 18:19:30,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:19:41,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1265.72156 ± 93.645
2025-09-14 18:19:41,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1242.9398), np.float32(1190.9033), np.float32(1092.9784), np.float32(1194.3019), np.float32(1327.3694), np.float32(1295.8118), np.float32(1466.6779), np.float32(1284.6602), np.float32(1299.9502), np.float32(1261.6228)]
2025-09-14 18:19:41,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:19:41,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 32 minutes, 11 seconds)
2025-09-14 18:22:56,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:23:07,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1696.91699 ± 454.725
2025-09-14 18:23:07,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1820.706), np.float32(1936.3054), np.float32(1337.4442), np.float32(2078.0322), np.float32(1041.5089), np.float32(1637.8889), np.float32(1111.0753), np.float32(1643.435), np.float32(1695.5321), np.float32(2667.242)]
2025-09-14 18:23:07,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:23:07,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 28 minutes, 53 seconds)
2025-09-14 18:26:21,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:26:32,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1543.93237 ± 633.456
2025-09-14 18:26:32,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3340.8586), np.float32(1540.4346), np.float32(1267.9888), np.float32(1885.8932), np.float32(1181.1261), np.float32(1312.3385), np.float32(1278.2294), np.float32(1203.1719), np.float32(1184.8182), np.float32(1244.4637)]
2025-09-14 18:26:32,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:26:32,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 25 minutes, 30 seconds)
2025-09-14 18:29:45,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:29:56,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1583.21375 ± 455.574
2025-09-14 18:29:56,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1181.0853), np.float32(1276.4109), np.float32(1547.1184), np.float32(1256.61), np.float32(2400.561), np.float32(1521.463), np.float32(2333.6301), np.float32(1963.5527), np.float32(1183.9718), np.float32(1167.7344)]
2025-09-14 18:29:56,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:29:56,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 21 minutes, 58 seconds)
2025-09-14 18:33:11,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:33:23,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1735.92419 ± 608.409
2025-09-14 18:33:23,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2343.4941), np.float32(1519.6752), np.float32(1921.2411), np.float32(1658.2972), np.float32(1137.5713), np.float32(1530.7258), np.float32(1126.5564), np.float32(1627.8829), np.float32(3231.7886), np.float32(1262.0104)]
2025-09-14 18:33:23,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:33:23,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 18 minutes, 43 seconds)
2025-09-14 18:36:33,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:36:44,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1961.55542 ± 698.193
2025-09-14 18:36:44,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2412.6487), np.float32(1710.065), np.float32(1408.0548), np.float32(1702.3907), np.float32(3363.474), np.float32(1004.28485), np.float32(1702.3813), np.float32(1788.893), np.float32(2995.0286), np.float32(1528.3348)]
2025-09-14 18:36:44,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:36:44,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 15 minutes, 2 seconds)
2025-09-14 18:39:54,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:40:05,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1622.54871 ± 586.260
2025-09-14 18:40:05,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1307.1371), np.float32(1268.053), np.float32(1305.4852), np.float32(1898.0786), np.float32(1167.1859), np.float32(1451.5017), np.float32(1157.8723), np.float32(1722.4426), np.float32(3223.3352), np.float32(1724.3964)]
2025-09-14 18:40:05,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:40:05,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 11 minutes, 18 seconds)
2025-09-14 18:43:15,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:43:26,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1777.38147 ± 448.750
2025-09-14 18:43:26,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1059.704), np.float32(2020.7406), np.float32(1294.9683), np.float32(2183.6992), np.float32(1600.5831), np.float32(2260.894), np.float32(2366.8723), np.float32(1998.5605), np.float32(1825.1638), np.float32(1162.6285)]
2025-09-14 18:43:26,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:43:26,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 7 minutes, 36 seconds)
2025-09-14 18:46:38,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:46:48,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1950.49292 ± 666.852
2025-09-14 18:46:48,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2409.655), np.float32(2401.4355), np.float32(1139.455), np.float32(1178.2919), np.float32(1711.1377), np.float32(1377.1282), np.float32(3152.2297), np.float32(1337.7085), np.float32(2106.774), np.float32(2691.1128)]
2025-09-14 18:46:48,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:46:48,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 4 minutes, 4 seconds)
2025-09-14 18:49:58,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:50:08,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1982.35608 ± 585.051
2025-09-14 18:50:08,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1532.0916), np.float32(2250.6003), np.float32(1287.4633), np.float32(1440.1533), np.float32(2173.7966), np.float32(2565.4175), np.float32(1496.1926), np.float32(3217.0134), np.float32(2268.468), np.float32(1592.3641)]
2025-09-14 18:50:08,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:50:08,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 20 seconds)
2025-09-14 18:53:17,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:53:28,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 2127.60791 ± 495.763
2025-09-14 18:53:28,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2896.2588), np.float32(1460.2506), np.float32(1719.8091), np.float32(1582.9604), np.float32(2671.9727), np.float32(2218.416), np.float32(2628.2964), np.float32(2531.792), np.float32(1746.0322), np.float32(1820.2917)]
2025-09-14 18:53:28,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:53:28,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 56 minutes, 53 seconds)
2025-09-14 18:56:39,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:56:50,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1893.95862 ± 578.562
2025-09-14 18:56:50,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2015.3599), np.float32(2863.7104), np.float32(1675.304), np.float32(1266.6827), np.float32(1836.1125), np.float32(1227.571), np.float32(2325.365), np.float32(1742.602), np.float32(1195.1074), np.float32(2791.7725)]
2025-09-14 18:56:50,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:56:50,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 53 minutes, 34 seconds)
2025-09-14 19:00:02,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:00:13,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 2104.92334 ± 777.279
2025-09-14 19:00:13,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3001.834), np.float32(3088.564), np.float32(1221.8348), np.float32(1474.1814), np.float32(2688.2378), np.float32(1163.3087), np.float32(3115.5898), np.float32(1201.0281), np.float32(1947.4708), np.float32(2147.1858)]
2025-09-14 19:00:13,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:00:13,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 50 minutes, 20 seconds)
2025-09-14 19:03:26,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:03:37,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1726.77173 ± 534.821
2025-09-14 19:03:37,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1677.7969), np.float32(2094.8586), np.float32(2144.2556), np.float32(1186.3435), np.float32(1434.2058), np.float32(1070.8103), np.float32(2737.8254), np.float32(1409.4015), np.float32(2310.8792), np.float32(1201.339)]
2025-09-14 19:03:37,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:03:37,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 47 minutes, 5 seconds)
2025-09-14 19:06:48,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:06:59,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1717.73999 ± 427.441
2025-09-14 19:06:59,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1554.2539), np.float32(1336.8479), np.float32(1717.8416), np.float32(2570.1187), np.float32(1665.3579), np.float32(1859.7098), np.float32(1204.9618), np.float32(1355.9746), np.float32(2402.427), np.float32(1509.9066)]
2025-09-14 19:06:59,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:06:59,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 43 minutes, 47 seconds)
2025-09-14 19:10:13,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:10:24,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1602.39673 ± 401.255
2025-09-14 19:10:24,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1315.6034), np.float32(1963.3807), np.float32(1354.5576), np.float32(2006.6553), np.float32(1112.0817), np.float32(2433.3389), np.float32(1183.4875), np.float32(1705.326), np.float32(1367.2672), np.float32(1582.2676)]
2025-09-14 19:10:24,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:10:24,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 40 minutes, 38 seconds)
2025-09-14 19:13:39,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:13:50,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1747.19824 ± 421.112
2025-09-14 19:13:50,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1373.3741), np.float32(1123.4474), np.float32(2053.1497), np.float32(2166.5974), np.float32(1991.1416), np.float32(1834.9907), np.float32(1258.6349), np.float32(1952.8142), np.float32(2407.5002), np.float32(1310.3324)]
2025-09-14 19:13:50,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:13:50,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 37 minutes, 24 seconds)
2025-09-14 19:17:03,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:17:14,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1810.90039 ± 578.879
2025-09-14 19:17:14,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1695.0128), np.float32(1959.8005), np.float32(1444.2854), np.float32(1160.7257), np.float32(1227.9811), np.float32(2057.2341), np.float32(2437.494), np.float32(3026.3416), np.float32(1130.6499), np.float32(1969.4775)]
2025-09-14 19:17:14,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:17:14,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 34 minutes, 2 seconds)
2025-09-14 19:20:25,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:20:36,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1733.55762 ± 667.893
2025-09-14 19:20:36,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1265.7507), np.float32(2017.0325), np.float32(1300.6948), np.float32(2072.0981), np.float32(3487.9148), np.float32(1320.7517), np.float32(1230.2694), np.float32(1283.744), np.float32(1404.9872), np.float32(1952.3334)]
2025-09-14 19:20:36,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:20:36,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 30 minutes, 33 seconds)
2025-09-14 19:23:50,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:24:01,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1536.16614 ± 325.567
2025-09-14 19:24:01,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1389.1119), np.float32(1321.1837), np.float32(2403.3232), np.float32(1223.3552), np.float32(1601.7571), np.float32(1713.8884), np.float32(1590.4331), np.float32(1490.2178), np.float32(1276.7943), np.float32(1351.5963)]
2025-09-14 19:24:01,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:24:01,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 27 minutes, 15 seconds)
2025-09-14 19:27:15,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:27:26,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1784.78479 ± 733.215
2025-09-14 19:27:26,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1496.7625), np.float32(1641.4855), np.float32(2798.3792), np.float32(1118.5425), np.float32(2677.4055), np.float32(1130.7087), np.float32(1050.6044), np.float32(1731.1162), np.float32(3059.098), np.float32(1143.744)]
2025-09-14 19:27:26,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:27:26,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 23 minutes, 51 seconds)
2025-09-14 19:30:39,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:30:51,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1718.09924 ± 462.746
2025-09-14 19:30:51,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2422.0342), np.float32(1448.7446), np.float32(1181.2802), np.float32(1388.1898), np.float32(1276.6929), np.float32(1564.5709), np.float32(1676.6469), np.float32(1519.6826), np.float32(2590.5002), np.float32(2112.6494)]
2025-09-14 19:30:51,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:30:51,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 20 minutes, 25 seconds)
2025-09-14 19:34:04,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:34:15,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1593.12817 ± 551.171
2025-09-14 19:34:15,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1344.5836), np.float32(1396.1936), np.float32(1303.8185), np.float32(1960.4939), np.float32(1187.6497), np.float32(1413.0712), np.float32(1328.7229), np.float32(3138.517), np.float32(1499.5076), np.float32(1358.7231)]
2025-09-14 19:34:15,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:34:15,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 17 minutes, 1 second)
2025-09-14 19:37:27,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:37:38,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 2097.23633 ± 725.031
2025-09-14 19:37:38,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1627.139), np.float32(1360.4452), np.float32(1840.763), np.float32(1805.0048), np.float32(2632.1394), np.float32(2061.558), np.float32(3176.823), np.float32(1661.4471), np.float32(1282.979), np.float32(3524.0667)]
2025-09-14 19:37:38,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:37:38,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 13 minutes, 37 seconds)
2025-09-14 19:40:52,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:41:03,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1755.88403 ± 927.368
2025-09-14 19:41:03,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1535.4507), np.float32(1230.4957), np.float32(1404.567), np.float32(1792.3052), np.float32(3485.6726), np.float32(1762.9622), np.float32(-73.333984), np.float32(3094.685), np.float32(1700.7603), np.float32(1625.2747)]
2025-09-14 19:41:03,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:41:03,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 10 minutes, 13 seconds)
2025-09-14 19:44:17,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:44:28,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 1792.38928 ± 573.240
2025-09-14 19:44:28,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1579.0342), np.float32(1219.8418), np.float32(1254.3187), np.float32(1283.7191), np.float32(2166.6836), np.float32(2095.4258), np.float32(3001.635), np.float32(1451.9083), np.float32(2442.2505), np.float32(1429.0757)]
2025-09-14 19:44:28,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:44:28,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 6 minutes, 48 seconds)
2025-09-14 19:47:28,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:47:38,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 2517.84326 ± 875.475
2025-09-14 19:47:38,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1396.2311), np.float32(2238.1335), np.float32(2351.9956), np.float32(3549.7996), np.float32(1810.9304), np.float32(1098.5923), np.float32(3490.6304), np.float32(3389.8416), np.float32(3509.9897), np.float32(2342.2876)]
2025-09-14 19:47:38,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:47:38,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1226 [INFO]: New best (2517.84) for latency 21
2025-09-14 19:47:38,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 21 seconds)
2025-09-14 19:50:22,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:50:31,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1221 [DEBUG]: Total Reward: 2135.73584 ± 575.563
2025-09-14 19:50:31,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2577.2188), np.float32(2064.863), np.float32(2837.4814), np.float32(1910.2522), np.float32(3006.603), np.float32(1425.179), np.float32(1297.2726), np.float32(1535.8185), np.float32(2077.178), np.float32(2625.493)]
2025-09-14 19:50:31,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:50:31,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-halfcheetah):1251 [DEBUG]: Training session finished
