2025-09-14 13:39:36,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1108 [DEBUG]: logdir: _logs/noise-eval-v2/halfcheetah/bpql-noise_0.050-delay_21
2025-09-14 13:39:36,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1109 [DEBUG]: trainer_prefix: noise-eval-v2/halfcheetah/bpql-noise_0.050-delay_21
2025-09-14 13:39:36,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'21': <latency_env.delayed_mdp.ConstantDelay object at 0x7fcea29866f0>}
2025-09-14 13:39:36,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1111 [DEBUG]: using device: cpu
2025-09-14 13:39:36,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-14 13:39:36,844 baseline-bpql-noisepromille50-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=143, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-14 13:39:36,844 baseline-bpql-noisepromille50-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-14 13:39:37,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-14 13:39:37,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-14 13:41:48,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 13:41:56,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: -319.92960 ± 115.192
2025-09-14 13:41:56,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-266.8668), np.float32(-188.5337), np.float32(-269.41666), np.float32(-517.90576), np.float32(-274.4323), np.float32(-561.7553), np.float32(-254.93314), np.float32(-244.26523), np.float32(-320.405), np.float32(-300.782)]
2025-09-14 13:41:56,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:41:56,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (-319.93) for latency 21
2025-09-14 13:41:56,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 48 minutes, 15 seconds)
2025-09-14 13:44:07,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 13:44:16,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: -220.25334 ± 68.917
2025-09-14 13:44:16,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-238.91136), np.float32(-221.99223), np.float32(-70.75902), np.float32(-258.84167), np.float32(-280.28757), np.float32(-284.02124), np.float32(-279.1144), np.float32(-266.3927), np.float32(-166.61438), np.float32(-135.59879)]
2025-09-14 13:44:16,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:44:16,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (-220.25) for latency 21
2025-09-14 13:44:16,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 47 minutes, 10 seconds)
2025-09-14 13:46:26,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 13:46:34,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: -145.15852 ± 56.393
2025-09-14 13:46:34,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-103.6285), np.float32(-98.11913), np.float32(-220.32298), np.float32(-191.52245), np.float32(-63.611763), np.float32(-171.70049), np.float32(-131.07059), np.float32(-69.51135), np.float32(-184.58464), np.float32(-217.51334)]
2025-09-14 13:46:34,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:46:34,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (-145.16) for latency 21
2025-09-14 13:46:34,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 44 minutes, 22 seconds)
2025-09-14 13:48:44,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 13:48:52,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: -89.07787 ± 27.668
2025-09-14 13:48:52,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-109.08598), np.float32(-97.971214), np.float32(-124.26917), np.float32(-138.499), np.float32(-57.299507), np.float32(-63.813572), np.float32(-54.16343), np.float32(-96.61155), np.float32(-83.13502), np.float32(-65.93026)]
2025-09-14 13:48:52,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:48:52,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (-89.08) for latency 21
2025-09-14 13:48:52,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 41 minutes, 51 seconds)
2025-09-14 13:51:03,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 13:51:11,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 9.27741 ± 67.372
2025-09-14 13:51:11,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-40.69593), np.float32(68.549866), np.float32(18.265202), np.float32(52.24119), np.float32(-63.845955), np.float32(19.018604), np.float32(-78.27938), np.float32(42.838104), np.float32(-66.68688), np.float32(141.36925)]
2025-09-14 13:51:11,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:51:11,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (9.28) for latency 21
2025-09-14 13:51:11,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 39 minutes, 32 seconds)
2025-09-14 13:53:22,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 13:53:30,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 176.23694 ± 67.290
2025-09-14 13:53:30,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(232.20592), np.float32(198.20169), np.float32(14.789533), np.float32(271.0899), np.float32(237.44467), np.float32(158.0076), np.float32(164.89388), np.float32(131.49379), np.float32(163.12614), np.float32(191.11636)]
2025-09-14 13:53:30,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:53:30,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (176.24) for latency 21
2025-09-14 13:53:30,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 37 minutes, 36 seconds)
2025-09-14 13:55:45,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 13:55:53,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 365.40881 ± 143.904
2025-09-14 13:55:53,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(461.64627), np.float32(492.73462), np.float32(255.65173), np.float32(262.79263), np.float32(122.87185), np.float32(147.73038), np.float32(488.3534), np.float32(478.7077), np.float32(505.22266), np.float32(438.37698)]
2025-09-14 13:55:53,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:55:53,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (365.41) for latency 21
2025-09-14 13:55:53,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 36 minutes, 18 seconds)
2025-09-14 13:58:04,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 13:58:12,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 564.93585 ± 121.833
2025-09-14 13:58:12,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(525.2122), np.float32(632.5884), np.float32(660.41064), np.float32(369.45084), np.float32(629.1081), np.float32(710.5062), np.float32(367.59366), np.float32(538.3833), np.float32(722.6672), np.float32(493.43805)]
2025-09-14 13:58:12,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:58:12,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (564.94) for latency 21
2025-09-14 13:58:12,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 34 minutes, 7 seconds)
2025-09-14 14:00:22,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:00:30,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 791.38422 ± 108.306
2025-09-14 14:00:30,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(857.99493), np.float32(834.66986), np.float32(846.82733), np.float32(532.9029), np.float32(795.341), np.float32(879.4013), np.float32(829.102), np.float32(659.82825), np.float32(909.9461), np.float32(767.8284)]
2025-09-14 14:00:30,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:00:30,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (791.38) for latency 21
2025-09-14 14:00:30,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 31 minutes, 49 seconds)
2025-09-14 14:02:41,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:02:49,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 749.85754 ± 167.003
2025-09-14 14:02:49,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(741.5982), np.float32(464.3267), np.float32(934.3977), np.float32(974.62836), np.float32(849.41895), np.float32(633.48706), np.float32(649.0466), np.float32(521.05505), np.float32(872.4481), np.float32(858.1682)]
2025-09-14 14:02:49,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:02:49,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 3 hours, 29 minutes, 23 seconds)
2025-09-14 14:04:59,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:05:07,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 835.18799 ± 91.403
2025-09-14 14:05:07,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(999.69574), np.float32(776.17175), np.float32(938.9078), np.float32(866.21576), np.float32(760.48206), np.float32(833.518), np.float32(796.52124), np.float32(878.8981), np.float32(655.80005), np.float32(845.6699)]
2025-09-14 14:05:07,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:05:07,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (835.19) for latency 21
2025-09-14 14:05:07,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 3 hours, 26 minutes, 45 seconds)
2025-09-14 14:07:17,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:07:26,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 883.74915 ± 112.075
2025-09-14 14:07:26,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(930.1946), np.float32(885.59033), np.float32(894.93024), np.float32(792.30457), np.float32(988.44977), np.float32(601.832), np.float32(837.64465), np.float32(989.3189), np.float32(963.083), np.float32(954.14343)]
2025-09-14 14:07:26,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:07:26,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (883.75) for latency 21
2025-09-14 14:07:26,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 3 hours, 23 minutes, 3 seconds)
2025-09-14 14:09:36,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:09:44,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 856.12366 ± 368.224
2025-09-14 14:09:44,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1136.455), np.float32(1004.8137), np.float32(-200.13129), np.float32(800.45496), np.float32(1020.85297), np.float32(780.3437), np.float32(1059.8737), np.float32(901.5541), np.float32(1012.208), np.float32(1044.8113)]
2025-09-14 14:09:44,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:09:44,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 3 hours, 20 minutes, 41 seconds)
2025-09-14 14:11:54,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:12:03,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1026.35986 ± 112.829
2025-09-14 14:12:03,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(807.0426), np.float32(1086.1183), np.float32(1118.3284), np.float32(1036.8081), np.float32(946.0456), np.float32(1113.2241), np.float32(1086.0835), np.float32(1183.6783), np.float32(1016.9656), np.float32(869.30505)]
2025-09-14 14:12:03,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:12:03,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (1026.36) for latency 21
2025-09-14 14:12:03,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 3 hours, 18 minutes, 26 seconds)
2025-09-14 14:14:13,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:14:21,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1059.12561 ± 85.207
2025-09-14 14:14:21,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(927.8071), np.float32(1077.7552), np.float32(1095.494), np.float32(1182.5854), np.float32(1146.0907), np.float32(1127.352), np.float32(1059.9482), np.float32(1066.8883), np.float32(997.95776), np.float32(909.3757)]
2025-09-14 14:14:21,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:14:21,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (1059.13) for latency 21
2025-09-14 14:14:21,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 3 hours, 16 minutes, 8 seconds)
2025-09-14 14:16:31,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:16:40,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1065.67896 ± 105.112
2025-09-14 14:16:40,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1066.997), np.float32(899.91516), np.float32(952.42395), np.float32(1252.161), np.float32(1090.5813), np.float32(1219.0101), np.float32(998.9035), np.float32(1081.4789), np.float32(1102.0453), np.float32(993.2742)]
2025-09-14 14:16:40,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:16:40,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (1065.68) for latency 21
2025-09-14 14:16:40,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 3 hours, 13 minutes, 50 seconds)
2025-09-14 14:18:50,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:18:58,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1145.34583 ± 126.678
2025-09-14 14:18:58,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1293.8438), np.float32(1101.0104), np.float32(989.38916), np.float32(1071.2166), np.float32(1398.3593), np.float32(1208.3497), np.float32(999.882), np.float32(1026.1003), np.float32(1166.9982), np.float32(1198.3085)]
2025-09-14 14:18:58,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:18:58,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (1145.35) for latency 21
2025-09-14 14:18:58,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 3 hours, 11 minutes, 31 seconds)
2025-09-14 14:21:08,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:21:16,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1121.87402 ± 115.727
2025-09-14 14:21:16,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1359.423), np.float32(1084.5112), np.float32(1097.1637), np.float32(858.3028), np.float32(1179.62), np.float32(1149.0312), np.float32(1106.4904), np.float32(1095.5586), np.float32(1141.7533), np.float32(1146.8865)]
2025-09-14 14:21:16,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:21:16,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 3 hours, 9 minutes, 10 seconds)
2025-09-14 14:23:27,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:23:35,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1097.82739 ± 108.983
2025-09-14 14:23:35,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1016.83716), np.float32(1088.3806), np.float32(1345.783), np.float32(1223.594), np.float32(959.61865), np.float32(1058.1854), np.float32(1119.4335), np.float32(987.5739), np.float32(1063.0896), np.float32(1115.7786)]
2025-09-14 14:23:35,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:23:35,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 3 hours, 6 minutes, 51 seconds)
2025-09-14 14:25:46,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:25:55,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1118.58508 ± 156.985
2025-09-14 14:25:55,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1347.2325), np.float32(1131.6588), np.float32(1179.0132), np.float32(902.013), np.float32(1215.1292), np.float32(981.0627), np.float32(996.1766), np.float32(944.531), np.float32(1106.2983), np.float32(1382.735)]
2025-09-14 14:25:55,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:25:55,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 3 hours, 4 minutes, 56 seconds)
2025-09-14 14:28:05,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:28:13,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1302.63403 ± 165.101
2025-09-14 14:28:13,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1610.4841), np.float32(1452.6093), np.float32(1356.655), np.float32(1484.5411), np.float32(1138.4594), np.float32(1350.274), np.float32(1182.8429), np.float32(1207.027), np.float32(1145.59), np.float32(1097.8573)]
2025-09-14 14:28:13,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:28:13,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (1302.63) for latency 21
2025-09-14 14:28:13,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 3 hours, 2 minutes, 33 seconds)
2025-09-14 14:30:24,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:30:32,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1173.75415 ± 105.016
2025-09-14 14:30:32,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1163.0576), np.float32(1202.0757), np.float32(1200.0874), np.float32(1060.1492), np.float32(1130.9567), np.float32(1253.5736), np.float32(1178.0492), np.float32(1004.5493), np.float32(1132.1511), np.float32(1412.8904)]
2025-09-14 14:30:32,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:30:32,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 34 seconds)
2025-09-14 14:32:45,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:32:53,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1192.10718 ± 75.361
2025-09-14 14:32:53,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1164.943), np.float32(1183.6569), np.float32(1059.1534), np.float32(1121.3986), np.float32(1234.1405), np.float32(1361.6804), np.float32(1220.2716), np.float32(1178.0579), np.float32(1169.7096), np.float32(1228.0596)]
2025-09-14 14:32:53,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:32:53,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 58 minutes, 55 seconds)
2025-09-14 14:35:04,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:35:12,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1312.61328 ± 193.153
2025-09-14 14:35:12,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1274.5182), np.float32(1758.7058), np.float32(1176.7054), np.float32(1127.0139), np.float32(1288.0559), np.float32(1484.799), np.float32(1195.9025), np.float32(1078.2578), np.float32(1287.1799), np.float32(1454.9938)]
2025-09-14 14:35:12,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:35:12,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (1312.61) for latency 21
2025-09-14 14:35:12,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 56 minutes, 41 seconds)
2025-09-14 14:37:23,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:37:31,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1395.63147 ± 289.051
2025-09-14 14:37:31,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1118.7675), np.float32(1196.391), np.float32(1875.8574), np.float32(1338.735), np.float32(1725.0781), np.float32(1863.0339), np.float32(1300.1287), np.float32(1140.1046), np.float32(1256.7355), np.float32(1141.4819)]
2025-09-14 14:37:31,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:37:31,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (1395.63) for latency 21
2025-09-14 14:37:31,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 54 minutes, 6 seconds)
2025-09-14 14:39:42,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:39:50,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1422.56458 ± 254.000
2025-09-14 14:39:50,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1361.5773), np.float32(1054.0076), np.float32(1398.2832), np.float32(1402.3142), np.float32(1808.03), np.float32(1459.34), np.float32(1293.5421), np.float32(1326.1954), np.float32(1180.1859), np.float32(1942.1703)]
2025-09-14 14:39:50,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:39:50,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (1422.56) for latency 21
2025-09-14 14:39:50,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 51 minutes, 57 seconds)
2025-09-14 14:42:01,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:42:09,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1441.04761 ± 256.889
2025-09-14 14:42:09,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1382.5192), np.float32(1404.2987), np.float32(1522.451), np.float32(1307.8099), np.float32(1457.0498), np.float32(1207.8984), np.float32(1161.3173), np.float32(1188.1173), np.float32(1763.6229), np.float32(2015.3917)]
2025-09-14 14:42:09,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:42:09,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (1441.05) for latency 21
2025-09-14 14:42:09,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 49 minutes, 26 seconds)
2025-09-14 14:44:19,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:44:27,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1387.70667 ± 289.745
2025-09-14 14:44:27,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1074.7253), np.float32(1549.4796), np.float32(1290.4872), np.float32(1714.4811), np.float32(1976.2313), np.float32(1085.9429), np.float32(1585.1044), np.float32(1140.6161), np.float32(1152.6113), np.float32(1307.3872)]
2025-09-14 14:44:27,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:44:27,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 46 minutes, 32 seconds)
2025-09-14 14:46:38,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:46:46,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1392.31958 ± 229.588
2025-09-14 14:46:46,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1460.2245), np.float32(1221.5333), np.float32(1926.3435), np.float32(1196.2269), np.float32(1533.6675), np.float32(1221.8533), np.float32(1242.4146), np.float32(1626.6791), np.float32(1223.68), np.float32(1270.5737)]
2025-09-14 14:46:46,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:46:46,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 44 minutes, 11 seconds)
2025-09-14 14:48:57,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:49:05,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1470.06519 ± 275.239
2025-09-14 14:49:05,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1252.3044), np.float32(1230.2203), np.float32(1205.6976), np.float32(1190.542), np.float32(1576.3092), np.float32(1559.7986), np.float32(1315.8932), np.float32(1476.7147), np.float32(2006.9885), np.float32(1886.1835)]
2025-09-14 14:49:05,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:49:05,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (1470.07) for latency 21
2025-09-14 14:49:05,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 41 minutes, 52 seconds)
2025-09-14 14:51:15,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:51:23,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1540.11768 ± 269.915
2025-09-14 14:51:23,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1330.6609), np.float32(1702.1489), np.float32(1421.1947), np.float32(1212.8458), np.float32(1168.933), np.float32(1853.404), np.float32(1865.8207), np.float32(1329.1182), np.float32(1924.7072), np.float32(1592.3425)]
2025-09-14 14:51:23,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:51:23,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (1540.12) for latency 21
2025-09-14 14:51:23,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 39 minutes, 30 seconds)
2025-09-14 14:53:34,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:53:42,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1487.85840 ± 395.345
2025-09-14 14:53:42,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1427.5488), np.float32(1198.8585), np.float32(1268.8275), np.float32(1365.7225), np.float32(1278.6659), np.float32(1197.7416), np.float32(1416.7794), np.float32(1646.26), np.float32(1470.256), np.float32(2607.9238)]
2025-09-14 14:53:42,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:53:42,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 37 minutes, 6 seconds)
2025-09-14 14:55:52,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:56:01,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1624.21802 ± 501.179
2025-09-14 14:56:01,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1229.585), np.float32(1304.9368), np.float32(2452.4673), np.float32(1222.6554), np.float32(1909.7207), np.float32(1111.9307), np.float32(1250.3346), np.float32(1304.8821), np.float32(2442.84), np.float32(2012.8285)]
2025-09-14 14:56:01,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:56:01,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (1624.22) for latency 21
2025-09-14 14:56:01,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 34 minutes, 50 seconds)
2025-09-14 14:58:11,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:58:19,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1313.53540 ± 181.115
2025-09-14 14:58:19,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1096.3127), np.float32(1652.2714), np.float32(1534.7075), np.float32(1327.0757), np.float32(1302.6144), np.float32(1202.3479), np.float32(1143.5228), np.float32(1512.0331), np.float32(1152.8085), np.float32(1211.66)]
2025-09-14 14:58:19,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:58:19,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 32 minutes, 28 seconds)
2025-09-14 15:00:29,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:00:38,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1464.23315 ± 426.132
2025-09-14 15:00:38,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1313.7085), np.float32(1253.4102), np.float32(1402.2804), np.float32(1399.6195), np.float32(1519.0984), np.float32(1253.3938), np.float32(1179.3293), np.float32(1473.0056), np.float32(2695.2288), np.float32(1153.2566)]
2025-09-14 15:00:38,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:00:38,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 30 minutes, 6 seconds)
2025-09-14 15:02:48,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:02:56,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1411.32556 ± 304.904
2025-09-14 15:02:56,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1446.4214), np.float32(1456.3242), np.float32(2222.3596), np.float32(1328.142), np.float32(1592.4734), np.float32(1203.5859), np.float32(1342.3821), np.float32(1124.1562), np.float32(1262.3812), np.float32(1135.0306)]
2025-09-14 15:02:56,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:02:56,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 27 minutes, 47 seconds)
2025-09-14 15:05:06,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:05:15,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1895.32874 ± 363.552
2025-09-14 15:05:15,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1623.3318), np.float32(1228.013), np.float32(2168.041), np.float32(2069.8655), np.float32(1586.1721), np.float32(1942.8496), np.float32(1953.2947), np.float32(1831.4149), np.float32(1885.435), np.float32(2664.8691)]
2025-09-14 15:05:15,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:05:15,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (1895.33) for latency 21
2025-09-14 15:05:15,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 25 minutes, 29 seconds)
2025-09-14 15:07:25,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:07:33,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1491.95081 ± 335.891
2025-09-14 15:07:33,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2312.5771), np.float32(1702.667), np.float32(1179.1357), np.float32(1271.9436), np.float32(1111.3058), np.float32(1362.8138), np.float32(1395.8766), np.float32(1768.8809), np.float32(1440.8667), np.float32(1373.4419)]
2025-09-14 15:07:33,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:07:33,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 23 minutes, 8 seconds)
2025-09-14 15:09:44,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:09:52,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1536.75024 ± 177.507
2025-09-14 15:09:52,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1309.4806), np.float32(1576.2224), np.float32(1287.4615), np.float32(1514.5115), np.float32(1393.6389), np.float32(1801.6356), np.float32(1781.8911), np.float32(1740.424), np.float32(1460.7913), np.float32(1501.4459)]
2025-09-14 15:09:52,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:09:52,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 20 minutes, 51 seconds)
2025-09-14 15:12:06,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:12:14,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1597.42651 ± 474.705
2025-09-14 15:12:14,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1653.1173), np.float32(2689.68), np.float32(2190.4226), np.float32(1510.7052), np.float32(1077.8629), np.float32(1299.3278), np.float32(1649.1427), np.float32(1456.095), np.float32(1361.3403), np.float32(1086.5728)]
2025-09-14 15:12:14,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:12:14,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 19 minutes, 13 seconds)
2025-09-14 15:14:28,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:14:36,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1540.89771 ± 353.593
2025-09-14 15:14:36,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1172.3087), np.float32(1779.9456), np.float32(1277.7721), np.float32(1163.5532), np.float32(1718.9705), np.float32(1972.5148), np.float32(2214.5598), np.float32(1159.784), np.float32(1574.3883), np.float32(1375.1809)]
2025-09-14 15:14:36,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:14:36,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 17 minutes, 39 seconds)
2025-09-14 15:16:50,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:16:58,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1697.43530 ± 307.046
2025-09-14 15:16:58,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1333.7837), np.float32(1841.5867), np.float32(1621.5704), np.float32(1829.588), np.float32(1846.7158), np.float32(1323.9287), np.float32(1229.1404), np.float32(2283.41), np.float32(1885.6271), np.float32(1779.0035)]
2025-09-14 15:16:58,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:16:58,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 16 minutes, 3 seconds)
2025-09-14 15:19:12,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:19:20,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1386.63770 ± 186.085
2025-09-14 15:19:20,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1327.468), np.float32(1263.4789), np.float32(1623.6708), np.float32(1641.963), np.float32(1702.7078), np.float32(1384.891), np.float32(1157.0474), np.float32(1224.3181), np.float32(1259.166), np.float32(1281.6663)]
2025-09-14 15:19:20,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:19:20,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 14 minutes, 18 seconds)
2025-09-14 15:21:35,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:21:43,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1739.34241 ± 532.774
2025-09-14 15:21:43,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1321.8564), np.float32(1265.5298), np.float32(2648.998), np.float32(1392.3538), np.float32(2206.4182), np.float32(1706.6268), np.float32(2651.6536), np.float32(1197.1012), np.float32(1372.8751), np.float32(1630.01)]
2025-09-14 15:21:43,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:21:43,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 12 minutes, 46 seconds)
2025-09-14 15:24:11,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:24:20,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1472.27673 ± 355.749
2025-09-14 15:24:20,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2234.1218), np.float32(1353.4084), np.float32(942.5041), np.float32(1623.3644), np.float32(1808.7515), np.float32(1443.8225), np.float32(1360.3799), np.float32(1105.1481), np.float32(1199.9465), np.float32(1651.3215)]
2025-09-14 15:24:20,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:24:20,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 13 minutes, 11 seconds)
2025-09-14 15:26:46,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:26:54,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1455.87390 ± 354.336
2025-09-14 15:26:54,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1250.8397), np.float32(2328.7708), np.float32(1399.064), np.float32(1338.3998), np.float32(1665.2667), np.float32(1266.3967), np.float32(1273.9691), np.float32(1776.7183), np.float32(1184.0181), np.float32(1075.2958)]
2025-09-14 15:26:54,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:26:54,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 12 minutes, 49 seconds)
2025-09-14 15:29:11,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:29:19,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1516.29297 ± 670.459
2025-09-14 15:29:19,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-134.10342), np.float32(2445.8552), np.float32(1451.0442), np.float32(1357.4302), np.float32(1556.8), np.float32(2400.0442), np.float32(1458.704), np.float32(1776.6501), np.float32(1486.2113), np.float32(1364.2948)]
2025-09-14 15:29:19,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:29:19,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 10 minutes, 55 seconds)
2025-09-14 15:31:36,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:31:44,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1378.01147 ± 207.161
2025-09-14 15:31:44,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1169.7605), np.float32(1561.097), np.float32(1283.8593), np.float32(1168.7656), np.float32(1218.5376), np.float32(1255.5355), np.float32(1708.0834), np.float32(1209.606), np.float32(1705.2606), np.float32(1499.6094)]
2025-09-14 15:31:44,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:31:44,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 8 minutes, 58 seconds)
2025-09-14 15:34:02,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:34:10,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1669.96753 ± 403.024
2025-09-14 15:34:10,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1729.5122), np.float32(2282.5225), np.float32(1592.3374), np.float32(1844.2556), np.float32(1434.9314), np.float32(1163.1122), np.float32(1508.3107), np.float32(1403.8318), np.float32(2470.277), np.float32(1270.5836)]
2025-09-14 15:34:10,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:34:10,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 6 minutes, 56 seconds)
2025-09-14 15:36:28,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:36:37,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1681.96118 ± 376.781
2025-09-14 15:36:37,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1206.7108), np.float32(1653.7083), np.float32(2510.3472), np.float32(1691.0262), np.float32(1298.3414), np.float32(2034.6931), np.float32(1579.4539), np.float32(1623.6482), np.float32(1935.6952), np.float32(1285.9865)]
2025-09-14 15:36:37,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:36:37,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 2 minutes, 43 seconds)
2025-09-14 15:38:54,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:39:02,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1874.04199 ± 406.495
2025-09-14 15:39:02,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1606.8674), np.float32(2060.9102), np.float32(2545.8381), np.float32(1504.3076), np.float32(1646.1088), np.float32(2378.8447), np.float32(1488.8339), np.float32(2032.6388), np.float32(2207.4785), np.float32(1268.5912)]
2025-09-14 15:39:02,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:39:02,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 58 minutes, 52 seconds)
2025-09-14 15:41:19,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:41:27,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1832.84021 ± 650.999
2025-09-14 15:41:27,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2985.259), np.float32(2002.0254), np.float32(2058.6968), np.float32(1807.871), np.float32(1274.183), np.float32(1276.9276), np.float32(1341.5646), np.float32(1351.7899), np.float32(1227.212), np.float32(3002.875)]
2025-09-14 15:41:27,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:41:27,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 56 minutes, 28 seconds)
2025-09-14 15:43:44,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:43:52,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1447.69458 ± 615.540
2025-09-14 15:43:52,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1830.7357), np.float32(1190.8632), np.float32(1179.7079), np.float32(2068.6895), np.float32(1136.1907), np.float32(199.0771), np.float32(1228.1656), np.float32(1498.485), np.float32(2634.254), np.float32(1510.7782)]
2025-09-14 15:43:52,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:43:52,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 54 minutes, 2 seconds)
2025-09-14 15:46:09,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:46:18,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1602.05298 ± 489.665
2025-09-14 15:46:18,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1577.3925), np.float32(1403.0494), np.float32(1119.276), np.float32(1315.8572), np.float32(2534.227), np.float32(1557.0929), np.float32(1038.4641), np.float32(1402.3999), np.float32(2503.706), np.float32(1569.0643)]
2025-09-14 15:46:18,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:46:18,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 51 minutes, 35 seconds)
2025-09-14 15:48:34,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:48:42,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2301.05908 ± 749.326
2025-09-14 15:48:42,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3027.725), np.float32(3281.442), np.float32(1343.1039), np.float32(1150.4053), np.float32(3022.8547), np.float32(1705.7998), np.float32(2500.2336), np.float32(2897.2659), np.float32(2529.845), np.float32(1551.9156)]
2025-09-14 15:48:42,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:48:42,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (2301.06) for latency 21
2025-09-14 15:48:42,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 48 minutes, 53 seconds)
2025-09-14 15:51:00,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:51:08,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2099.62402 ± 816.314
2025-09-14 15:51:08,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1699.2771), np.float32(1845.8024), np.float32(1894.8002), np.float32(3358.9866), np.float32(1388.2035), np.float32(1410.7784), np.float32(1566.3171), np.float32(3147.1946), np.float32(3421.9465), np.float32(1262.9331)]
2025-09-14 15:51:08,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:51:08,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 46 minutes, 27 seconds)
2025-09-14 15:53:26,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:53:34,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2068.48071 ± 700.213
2025-09-14 15:53:34,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3003.5522), np.float32(2259.5347), np.float32(1188.7994), np.float32(3278.221), np.float32(2629.3413), np.float32(2042.7504), np.float32(1271.1797), np.float32(1250.3633), np.float32(2059.32), np.float32(1701.7448)]
2025-09-14 15:53:34,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:53:34,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 44 minutes, 6 seconds)
2025-09-14 15:55:51,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:55:59,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2185.92114 ± 778.805
2025-09-14 15:55:59,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3397.1436), np.float32(2538.6074), np.float32(3403.916), np.float32(1385.8263), np.float32(1607.5592), np.float32(2304.7979), np.float32(1672.1749), np.float32(1334.3527), np.float32(1400.2411), np.float32(2814.5918)]
2025-09-14 15:55:59,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:55:59,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 41 minutes, 46 seconds)
2025-09-14 15:58:16,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:58:24,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2078.61646 ± 610.492
2025-09-14 15:58:24,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3094.6787), np.float32(2079.0112), np.float32(1170.1946), np.float32(3055.929), np.float32(2277.4492), np.float32(1545.6016), np.float32(1396.1699), np.float32(2019.5747), np.float32(1827.4536), np.float32(2320.1008)]
2025-09-14 15:58:24,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:58:24,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 39 minutes, 19 seconds)
2025-09-14 16:00:42,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:00:50,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3322.31909 ± 921.441
2025-09-14 16:00:50,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3867.526), np.float32(3571.2754), np.float32(3777.813), np.float32(3750.5356), np.float32(3663.713), np.float32(3796.4272), np.float32(4024.6323), np.float32(1172.6409), np.float32(3731.4377), np.float32(1867.1897)]
2025-09-14 16:00:50,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:00:50,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (3322.32) for latency 21
2025-09-14 16:00:50,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 36 minutes, 58 seconds)
2025-09-14 16:03:07,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:03:15,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2289.19067 ± 757.336
2025-09-14 16:03:15,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2150.625), np.float32(1783.815), np.float32(3638.2605), np.float32(2577.9187), np.float32(1308.6411), np.float32(2066.2424), np.float32(2877.85), np.float32(1135.5135), np.float32(3209.855), np.float32(2143.1846)]
2025-09-14 16:03:15,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:03:15,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 34 minutes, 33 seconds)
2025-09-14 16:05:33,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:05:41,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3144.18213 ± 863.223
2025-09-14 16:05:41,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3364.675), np.float32(4027.8044), np.float32(3953.3962), np.float32(1950.629), np.float32(2887.8242), np.float32(3584.0986), np.float32(2097.1213), np.float32(4013.6528), np.float32(3822.421), np.float32(1740.1979)]
2025-09-14 16:05:41,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:05:41,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 32 minutes, 5 seconds)
2025-09-14 16:07:58,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:08:06,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3304.33325 ± 936.312
2025-09-14 16:08:06,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4168.709), np.float32(4363.8345), np.float32(2925.3875), np.float32(2462.1257), np.float32(2520.4863), np.float32(1820.0717), np.float32(4072.3062), np.float32(4425.1836), np.float32(3957.585), np.float32(2327.6401)]
2025-09-14 16:08:06,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:08:06,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 29 minutes, 40 seconds)
2025-09-14 16:10:24,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:10:32,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2874.60889 ± 1025.626
2025-09-14 16:10:32,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2006.5199), np.float32(3916.5994), np.float32(1183.2894), np.float32(4052.3418), np.float32(2573.7803), np.float32(3646.2495), np.float32(1172.1306), np.float32(3504.3022), np.float32(3282.4832), np.float32(3408.392)]
2025-09-14 16:10:32,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:10:32,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 27 minutes, 18 seconds)
2025-09-14 16:12:49,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:12:57,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2313.74219 ± 722.243
2025-09-14 16:12:57,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2193.423), np.float32(2131.0244), np.float32(3027.238), np.float32(1283.3871), np.float32(2058.9895), np.float32(1968.0044), np.float32(1920.6101), np.float32(4087.4058), np.float32(2457.3735), np.float32(2009.9675)]
2025-09-14 16:12:57,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:12:57,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 24 minutes, 51 seconds)
2025-09-14 16:15:14,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:15:23,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2634.96045 ± 1213.626
2025-09-14 16:15:23,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3967.9707), np.float32(4205.066), np.float32(1016.947), np.float32(1363.2114), np.float32(2278.9185), np.float32(2309.7217), np.float32(2053.835), np.float32(3522.612), np.float32(4369.3564), np.float32(1261.9645)]
2025-09-14 16:15:23,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:15:23,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 22 minutes, 26 seconds)
2025-09-14 16:17:40,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:17:48,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2720.70557 ± 1153.038
2025-09-14 16:17:48,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1984.9244), np.float32(4389.436), np.float32(1480.4625), np.float32(3494.8633), np.float32(2930.6233), np.float32(1247.8994), np.float32(1806.271), np.float32(4422.073), np.float32(1753.8315), np.float32(3696.6711)]
2025-09-14 16:17:48,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:17:48,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 19 minutes, 58 seconds)
2025-09-14 16:20:05,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:20:13,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2350.76514 ± 868.956
2025-09-14 16:20:13,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2008.2557), np.float32(2259.6484), np.float32(1708.1431), np.float32(2491.4268), np.float32(1685.2021), np.float32(1513.2035), np.float32(1280.257), np.float32(2906.9526), np.float32(3699.8564), np.float32(3954.706)]
2025-09-14 16:20:13,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:20:13,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 17 minutes, 33 seconds)
2025-09-14 16:22:31,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:22:39,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4534.96631 ± 118.538
2025-09-14 16:22:39,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4392.273), np.float32(4462.279), np.float32(4387.535), np.float32(4546.141), np.float32(4355.5645), np.float32(4663.2), np.float32(4616.0796), np.float32(4685.4233), np.float32(4636.383), np.float32(4604.7876)]
2025-09-14 16:22:39,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:22:39,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (4534.97) for latency 21
2025-09-14 16:22:39,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 15 minutes, 6 seconds)
2025-09-14 16:24:56,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:25:05,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3479.35474 ± 1283.809
2025-09-14 16:25:05,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4370.219), np.float32(1163.073), np.float32(4616.6724), np.float32(2488.0122), np.float32(2371.8342), np.float32(4353.0933), np.float32(1830.5365), np.float32(4521.486), np.float32(4589.037), np.float32(4489.582)]
2025-09-14 16:25:05,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:25:05,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 12 minutes, 44 seconds)
2025-09-14 16:27:22,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:27:30,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3394.31372 ± 1102.819
2025-09-14 16:27:30,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4271.3027), np.float32(4490.4316), np.float32(2941.4478), np.float32(1805.6201), np.float32(1218.5701), np.float32(3769.4312), np.float32(4557.1743), np.float32(3046.4163), np.float32(4450.0703), np.float32(3392.673)]
2025-09-14 16:27:30,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:27:30,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 10 minutes, 17 seconds)
2025-09-14 16:29:47,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:29:55,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3791.02686 ± 996.854
2025-09-14 16:29:55,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2882.0674), np.float32(1957.5105), np.float32(3632.19), np.float32(4733.975), np.float32(3471.9944), np.float32(2533.509), np.float32(4695.5415), np.float32(4538.625), np.float32(4765.233), np.float32(4699.6216)]
2025-09-14 16:29:55,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:29:55,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 7 minutes, 50 seconds)
2025-09-14 16:32:12,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:32:20,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3769.56299 ± 1239.891
2025-09-14 16:32:20,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1390.679), np.float32(4456.8696), np.float32(4058.6265), np.float32(4763.268), np.float32(4052.031), np.float32(4835.996), np.float32(3518.5884), np.float32(4616.834), np.float32(4581.3447), np.float32(1421.3921)]
2025-09-14 16:32:20,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:32:20,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 5 minutes, 25 seconds)
2025-09-14 16:34:38,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:34:46,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2850.20508 ± 1465.530
2025-09-14 16:34:46,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3088.9692), np.float32(1312.1198), np.float32(1526.2235), np.float32(2225.7234), np.float32(4637.7583), np.float32(4844.1323), np.float32(4092.2288), np.float32(1153.4604), np.float32(4441.1147), np.float32(1180.3206)]
2025-09-14 16:34:46,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:34:46,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 3 minutes, 1 second)
2025-09-14 16:37:03,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:37:11,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3751.58203 ± 943.111
2025-09-14 16:37:11,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3205.4702), np.float32(4298.077), np.float32(4898.3535), np.float32(4760.1914), np.float32(2826.7866), np.float32(2057.6072), np.float32(4360.3394), np.float32(4374.644), np.float32(2597.0535), np.float32(4137.299)]
2025-09-14 16:37:11,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:37:11,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 32 seconds)
2025-09-14 16:39:29,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:39:37,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3459.53784 ± 1144.534
2025-09-14 16:39:37,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4644.0625), np.float32(4408.2695), np.float32(2528.325), np.float32(2995.7844), np.float32(2277.3315), np.float32(4310.1343), np.float32(2914.8936), np.float32(4668.3545), np.float32(1302.107), np.float32(4546.1143)]
2025-09-14 16:39:37,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:39:37,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 58 minutes, 10 seconds)
2025-09-14 16:41:54,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:42:02,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4278.28809 ± 884.393
2025-09-14 16:42:02,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4440.874), np.float32(4439.844), np.float32(4742.5825), np.float32(4803.173), np.float32(1687.5333), np.float32(4948.2285), np.float32(4557.042), np.float32(4343.8022), np.float32(4424.6294), np.float32(4395.173)]
2025-09-14 16:42:02,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:42:02,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 55 minutes, 47 seconds)
2025-09-14 16:44:20,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:44:28,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4056.20898 ± 1205.938
2025-09-14 16:44:28,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4737.2295), np.float32(4762.712), np.float32(4595.3047), np.float32(3175.2925), np.float32(1561.4788), np.float32(4936.4194), np.float32(4775.3657), np.float32(4877.9014), np.float32(4957.6455), np.float32(2182.742)]
2025-09-14 16:44:28,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:44:28,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 53 minutes, 22 seconds)
2025-09-14 16:46:46,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:46:54,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3568.33984 ± 1092.484
2025-09-14 16:46:54,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4403.4106), np.float32(4249.1104), np.float32(4587.3447), np.float32(2945.9944), np.float32(1528.9382), np.float32(2505.9932), np.float32(2227.2998), np.float32(4212.049), np.float32(4696.9053), np.float32(4326.3525)]
2025-09-14 16:46:54,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:46:54,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 50 minutes, 57 seconds)
2025-09-14 16:49:11,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:49:19,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3635.03638 ± 987.829
2025-09-14 16:49:19,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3814.4006), np.float32(2665.0344), np.float32(3474.7017), np.float32(2883.796), np.float32(3414.5566), np.float32(4707.724), np.float32(4649.811), np.float32(4206.492), np.float32(4905.261), np.float32(1628.5853)]
2025-09-14 16:49:19,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:49:19,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 48 minutes, 32 seconds)
2025-09-14 16:51:37,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:51:45,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3336.24072 ± 1214.911
2025-09-14 16:51:45,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2211.7842), np.float32(4489.2236), np.float32(2090.1765), np.float32(4754.0615), np.float32(4754.4526), np.float32(4249.0693), np.float32(4380.704), np.float32(1689.0983), np.float32(2589.8982), np.float32(2153.9392)]
2025-09-14 16:51:45,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:51:45,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 46 minutes, 7 seconds)
2025-09-14 16:54:03,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:54:11,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3947.27490 ± 1382.442
2025-09-14 16:54:11,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4689.2583), np.float32(4532.357), np.float32(4709.4067), np.float32(4714.207), np.float32(4267.192), np.float32(4803.343), np.float32(1129.1172), np.float32(4653.215), np.float32(4709.391), np.float32(1265.2649)]
2025-09-14 16:54:11,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:54:11,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 43 minutes, 43 seconds)
2025-09-14 16:56:29,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:56:37,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3845.44067 ± 1231.678
2025-09-14 16:56:37,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1694.6263), np.float32(4921.7734), np.float32(3185.4998), np.float32(4161.2407), np.float32(4781.654), np.float32(4874.9717), np.float32(4887.0825), np.float32(3209.1624), np.float32(4945.688), np.float32(1792.7085)]
2025-09-14 16:56:37,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:56:37,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 41 minutes, 18 seconds)
2025-09-14 16:58:54,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:59:02,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2898.87500 ± 1533.262
2025-09-14 16:59:02,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4913.4155), np.float32(1188.1187), np.float32(1845.241), np.float32(2577.4888), np.float32(4077.1814), np.float32(2193.4656), np.float32(4886.003), np.float32(1213.5668), np.float32(1231.7128), np.float32(4862.556)]
2025-09-14 16:59:02,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:59:02,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 38 minutes, 50 seconds)
2025-09-14 17:01:20,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:01:28,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3392.28906 ± 1592.939
2025-09-14 17:01:28,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1711.1909), np.float32(1426.8171), np.float32(4899.534), np.float32(4894.2773), np.float32(5068.1216), np.float32(1187.5729), np.float32(5067.732), np.float32(1748.0205), np.float32(4368.1777), np.float32(3551.4465)]
2025-09-14 17:01:28,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:01:28,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 36 minutes, 25 seconds)
2025-09-14 17:03:45,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:03:53,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4778.81934 ± 419.504
2025-09-14 17:03:53,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4976.9453), np.float32(4785.0015), np.float32(4962.5483), np.float32(3556.2817), np.float32(5075.9946), np.float32(4738.075), np.float32(4998.9297), np.float32(4832.2197), np.float32(4883.4707), np.float32(4978.7285)]
2025-09-14 17:03:53,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:03:53,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (4778.82) for latency 21
2025-09-14 17:03:53,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 33 minutes, 59 seconds)
2025-09-14 17:06:10,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:06:19,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4721.04199 ± 222.922
2025-09-14 17:06:19,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4659.412), np.float32(4463.672), np.float32(5062.5728), np.float32(4818.897), np.float32(4290.1655), np.float32(4538.4614), np.float32(4907.062), np.float32(4729.6577), np.float32(4835.005), np.float32(4905.515)]
2025-09-14 17:06:19,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:06:19,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 31 minutes, 31 seconds)
2025-09-14 17:08:36,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:08:44,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3784.62256 ± 1705.841
2025-09-14 17:08:44,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4762.4707), np.float32(5106.6562), np.float32(1199.2809), np.float32(4794.8057), np.float32(4954.4), np.float32(4873.5103), np.float32(1227.528), np.float32(4844.1416), np.float32(1121.9095), np.float32(4961.5254)]
2025-09-14 17:08:44,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:08:44,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 29 minutes, 4 seconds)
2025-09-14 17:11:01,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:11:09,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4193.79492 ± 1104.800
2025-09-14 17:11:09,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4804.767), np.float32(4954.0703), np.float32(4194.8906), np.float32(4344.964), np.float32(1164.9271), np.float32(3419.426), np.float32(4967.998), np.float32(4533.533), np.float32(4598.3037), np.float32(4955.069)]
2025-09-14 17:11:09,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:11:09,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 26 minutes, 39 seconds)
2025-09-14 17:13:27,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:13:35,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3659.53394 ± 1473.039
2025-09-14 17:13:35,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4832.4053), np.float32(5070.297), np.float32(1435.3251), np.float32(4675.0273), np.float32(4782.957), np.float32(4894.687), np.float32(2423.897), np.float32(1863.4414), np.float32(1797.4321), np.float32(4819.871)]
2025-09-14 17:13:35,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:13:35,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 24 minutes, 14 seconds)
2025-09-14 17:15:52,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:16:00,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4101.40869 ± 931.798
2025-09-14 17:16:00,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2579.3547), np.float32(4661.069), np.float32(2877.3926), np.float32(4816.141), np.float32(4079.2412), np.float32(4609.611), np.float32(4911.073), np.float32(2735.707), np.float32(5066.8047), np.float32(4677.6904)]
2025-09-14 17:16:00,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:16:00,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 21 minutes, 47 seconds)
2025-09-14 17:18:17,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:18:25,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4538.85400 ± 1145.847
2025-09-14 17:18:25,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1163.3405), np.float32(5022.1816), np.float32(5025.997), np.float32(5015.289), np.float32(4955.711), np.float32(4970.758), np.float32(5055.4893), np.float32(4874.017), np.float32(5021.7275), np.float32(4284.0327)]
2025-09-14 17:18:25,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:18:25,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 19 minutes, 22 seconds)
2025-09-14 17:20:42,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:20:50,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3707.34961 ± 1370.237
2025-09-14 17:20:50,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2641.0784), np.float32(4850.9395), np.float32(1671.6553), np.float32(5133.758), np.float32(1718.1465), np.float32(5101.2476), np.float32(4936.7373), np.float32(3070.4802), np.float32(5021.6743), np.float32(2927.7783)]
2025-09-14 17:20:50,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:20:50,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 16 minutes, 56 seconds)
2025-09-14 17:23:07,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:23:15,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3665.29761 ± 1669.115
2025-09-14 17:23:15,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1386.1747), np.float32(4978.692), np.float32(5022.115), np.float32(1158.567), np.float32(1182.7394), np.float32(3201.3828), np.float32(5065.171), np.float32(4904.88), np.float32(4820.7417), np.float32(4932.511)]
2025-09-14 17:23:15,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:23:15,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 14 minutes, 31 seconds)
2025-09-14 17:25:33,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:25:42,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3459.13916 ± 1500.676
2025-09-14 17:25:42,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5114.3154), np.float32(1922.3845), np.float32(1978.1538), np.float32(2053.8901), np.float32(4588.434), np.float32(5058.184), np.float32(4812.6953), np.float32(2822.1907), np.float32(1256.2073), np.float32(4984.9355)]
2025-09-14 17:25:42,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:25:42,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 12 minutes, 6 seconds)
2025-09-14 17:28:03,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:28:11,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3446.27783 ± 1355.096
2025-09-14 17:28:11,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4829.0903), np.float32(2446.1125), np.float32(3685.1973), np.float32(1315.1178), np.float32(1398.3973), np.float32(4575.059), np.float32(4510.5693), np.float32(2348.628), np.float32(4764.1357), np.float32(4590.468)]
2025-09-14 17:28:11,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:28:11,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 9 minutes, 45 seconds)
2025-09-14 17:30:33,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:30:42,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3620.89307 ± 1284.832
2025-09-14 17:30:42,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3195.3452), np.float32(5081.8374), np.float32(3355.7559), np.float32(4789.468), np.float32(5127.591), np.float32(4001.5283), np.float32(4692.9688), np.float32(2790.0662), np.float32(1248.0527), np.float32(1926.3132)]
2025-09-14 17:30:42,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:30:42,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 7 minutes, 21 seconds)
2025-09-14 17:33:02,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:33:10,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3336.65479 ± 1380.339
2025-09-14 17:33:10,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4761.8384), np.float32(1300.6305), np.float32(4880.2344), np.float32(1656.3953), np.float32(1935.6895), np.float32(2839.8447), np.float32(3848.4473), np.float32(4844.058), np.float32(2481.1177), np.float32(4818.2905)]
2025-09-14 17:33:10,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:33:10,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 55 seconds)
2025-09-14 17:35:24,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:35:32,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4078.88818 ± 1368.628
2025-09-14 17:35:32,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5114.935), np.float32(4856.9224), np.float32(3489.3164), np.float32(4922.381), np.float32(1319.79), np.float32(4531.193), np.float32(4827.2793), np.float32(5043.224), np.float32(5015.2646), np.float32(1668.5779)]
2025-09-14 17:35:32,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:35:32,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 27 seconds)
2025-09-14 17:37:58,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:38:07,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4482.85449 ± 1126.982
2025-09-14 17:38:07,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4901.069), np.float32(4992.2363), np.float32(4946.239), np.float32(4845.2754), np.float32(1204.4307), np.float32(4887.8164), np.float32(5010.0864), np.float32(4997.603), np.float32(5002.7227), np.float32(4041.0647)]
2025-09-14 17:38:07,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:38:07,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1251 [DEBUG]: Training session finished
