2025-09-14 11:45:04,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1108 [DEBUG]: logdir: _logs/noise-eval-v2/halfcheetah/bpql-noise_0.075-delay_15
2025-09-14 11:45:04,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1109 [DEBUG]: trainer_prefix: noise-eval-v2/halfcheetah/bpql-noise_0.075-delay_15
2025-09-14 11:45:04,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'15': <latency_env.delayed_mdp.ConstantDelay object at 0x7f0783c37aa0>}
2025-09-14 11:45:04,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1111 [DEBUG]: using device: cpu
2025-09-14 11:45:04,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-14 11:45:04,527 baseline-bpql-noisepromille75-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=107, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-14 11:45:04,528 baseline-bpql-noisepromille75-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-14 11:45:06,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-14 11:45:06,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-14 11:48:06,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 11:48:15,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: -226.58418 ± 38.211
2025-09-14 11:48:15,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-209.72646), np.float32(-205.45828), np.float32(-217.29591), np.float32(-264.1555), np.float32(-177.64561), np.float32(-317.59543), np.float32(-238.49823), np.float32(-191.20044), np.float32(-210.7169), np.float32(-233.54884)]
2025-09-14 11:48:15,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:48:15,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (-226.58) for latency 15
2025-09-14 11:48:15,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 5 hours, 12 minutes, 15 seconds)
2025-09-14 11:51:08,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 11:51:16,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: -210.76250 ± 46.252
2025-09-14 11:51:16,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-228.16069), np.float32(-136.9642), np.float32(-199.90564), np.float32(-229.09949), np.float32(-305.31), np.float32(-193.5785), np.float32(-144.24783), np.float32(-246.9733), np.float32(-222.13802), np.float32(-201.2474)]
2025-09-14 11:51:16,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:51:16,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (-210.76) for latency 15
2025-09-14 11:51:16,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 5 hours, 2 minutes, 23 seconds)
2025-09-14 11:54:19,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 11:54:28,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: -31.87510 ± 103.201
2025-09-14 11:54:28,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-95.99244), np.float32(22.278738), np.float32(118.772064), np.float32(-95.36575), np.float32(-41.82592), np.float32(-166.37088), np.float32(-103.828735), np.float32(139.77145), np.float32(50.34591), np.float32(-146.53539)]
2025-09-14 11:54:28,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:54:28,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (-31.88) for latency 15
2025-09-14 11:54:28,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 5 hours, 2 minutes, 33 seconds)
2025-09-14 11:57:32,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 11:57:41,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 13.63834 ± 97.388
2025-09-14 11:57:41,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(125.49399), np.float32(-120.02715), np.float32(-31.900547), np.float32(-51.610443), np.float32(105.31627), np.float32(-115.019356), np.float32(-7.8508997), np.float32(85.992), np.float32(-28.084625), np.float32(174.07419)]
2025-09-14 11:57:41,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:57:41,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (13.64) for latency 15
2025-09-14 11:57:41,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 5 hours, 1 minute, 50 seconds)
2025-09-14 12:00:35,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 12:00:44,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 194.86166 ± 175.866
2025-09-14 12:00:44,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(27.20389), np.float32(144.30573), np.float32(28.97785), np.float32(152.10472), np.float32(375.43243), np.float32(-14.326672), np.float32(489.01843), np.float32(280.04233), np.float32(428.161), np.float32(37.69692)]
2025-09-14 12:00:44,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:00:44,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (194.86) for latency 15
2025-09-14 12:00:44,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 56 minutes, 53 seconds)
2025-09-14 12:03:47,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 12:03:56,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 284.77451 ± 77.222
2025-09-14 12:03:56,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(125.61319), np.float32(276.8327), np.float32(312.70114), np.float32(435.43524), np.float32(280.4936), np.float32(258.13828), np.float32(340.89407), np.float32(207.42728), np.float32(299.10892), np.float32(311.10077)]
2025-09-14 12:03:56,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:03:56,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (284.77) for latency 15
2025-09-14 12:03:56,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 54 minutes, 40 seconds)
2025-09-14 12:06:59,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 12:07:08,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 441.39813 ± 126.949
2025-09-14 12:07:08,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(685.52563), np.float32(450.5911), np.float32(414.42438), np.float32(605.9305), np.float32(398.6479), np.float32(188.48094), np.float32(398.98886), np.float32(447.40067), np.float32(458.47626), np.float32(365.5152)]
2025-09-14 12:07:08,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:07:08,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (441.40) for latency 15
2025-09-14 12:07:08,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 55 minutes, 4 seconds)
2025-09-14 12:10:00,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 12:10:09,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 595.66437 ± 150.958
2025-09-14 12:10:09,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(472.06766), np.float32(473.55347), np.float32(568.92126), np.float32(616.715), np.float32(570.1915), np.float32(864.8448), np.float32(863.6729), np.float32(612.87354), np.float32(536.8863), np.float32(376.91733)]
2025-09-14 12:10:09,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:10:09,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (595.66) for latency 15
2025-09-14 12:10:09,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 48 minutes, 35 seconds)
2025-09-14 12:13:13,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 12:13:21,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 608.14929 ± 82.413
2025-09-14 12:13:21,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(770.0569), np.float32(514.353), np.float32(653.4803), np.float32(532.683), np.float32(672.58264), np.float32(529.40125), np.float32(617.4951), np.float32(667.62866), np.float32(503.16177), np.float32(620.6508)]
2025-09-14 12:13:21,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:13:21,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (608.15) for latency 15
2025-09-14 12:13:21,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 45 minutes, 18 seconds)
2025-09-14 12:16:25,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 12:16:34,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 774.93024 ± 239.813
2025-09-14 12:16:34,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(684.36884), np.float32(510.97668), np.float32(1208.9846), np.float32(649.2765), np.float32(635.12), np.float32(665.462), np.float32(1230.9246), np.float32(579.1406), np.float32(878.04047), np.float32(707.0079)]
2025-09-14 12:16:34,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:16:34,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (774.93) for latency 15
2025-09-14 12:16:34,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 45 minutes, 2 seconds)
2025-09-14 12:19:26,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 12:19:35,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 824.30548 ± 87.339
2025-09-14 12:19:35,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(752.79663), np.float32(777.3956), np.float32(953.6134), np.float32(728.4047), np.float32(744.82214), np.float32(852.3801), np.float32(961.7413), np.float32(780.8541), np.float32(761.06226), np.float32(929.9846)]
2025-09-14 12:19:35,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:19:35,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (824.31) for latency 15
2025-09-14 12:19:35,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 38 minutes, 46 seconds)
2025-09-14 12:22:40,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 12:22:49,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 787.74939 ± 193.950
2025-09-14 12:22:49,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(802.0586), np.float32(291.5411), np.float32(973.8244), np.float32(730.9134), np.float32(657.30585), np.float32(979.835), np.float32(865.1614), np.float32(815.4816), np.float32(797.2178), np.float32(964.1549)]
2025-09-14 12:22:49,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:22:49,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 35 minutes, 56 seconds)
2025-09-14 12:25:55,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 12:26:04,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1176.70789 ± 241.946
2025-09-14 12:26:04,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1011.4862), np.float32(1267.004), np.float32(1658.1302), np.float32(832.57965), np.float32(1074.6077), np.float32(985.548), np.float32(1102.9187), np.float32(1207.6799), np.float32(1080.2097), np.float32(1546.9143)]
2025-09-14 12:26:04,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:26:04,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (1176.71) for latency 15
2025-09-14 12:26:04,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 37 minutes, 1 second)
2025-09-14 12:28:43,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 12:28:50,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1175.02148 ± 255.506
2025-09-14 12:28:50,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(937.0863), np.float32(1092.4913), np.float32(1314.212), np.float32(1075.3931), np.float32(1076.3237), np.float32(1348.1528), np.float32(1526.5784), np.float32(869.00555), np.float32(1636.4093), np.float32(874.56226)]
2025-09-14 12:28:50,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:28:50,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 26 minutes, 18 seconds)
2025-09-14 12:31:29,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 12:31:36,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 977.34961 ± 282.834
2025-09-14 12:31:36,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(965.6018), np.float32(1023.72656), np.float32(1006.98236), np.float32(1539.7675), np.float32(967.141), np.float32(1020.99817), np.float32(286.12274), np.float32(977.1673), np.float32(1002.4324), np.float32(983.55707)]
2025-09-14 12:31:36,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:31:36,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 15 minutes, 39 seconds)
2025-09-14 12:34:16,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 12:34:24,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1269.10059 ± 293.454
2025-09-14 12:34:24,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1213.1415), np.float32(905.6225), np.float32(1815.5006), np.float32(1171.0338), np.float32(1039.6665), np.float32(1142.2324), np.float32(1817.2056), np.float32(1238.9675), np.float32(1052.3348), np.float32(1295.3002)]
2025-09-14 12:34:24,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:34:24,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (1269.10) for latency 15
2025-09-14 12:34:24,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 8 minutes, 44 seconds)
2025-09-14 12:37:01,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 12:37:08,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1340.89722 ± 434.917
2025-09-14 12:37:08,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1138.8431), np.float32(2502.6147), np.float32(1086.4612), np.float32(979.51105), np.float32(1403.7336), np.float32(1588.3275), np.float32(1016.31537), np.float32(1187.6483), np.float32(1033.4022), np.float32(1472.1147)]
2025-09-14 12:37:08,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:37:08,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (1340.90) for latency 15
2025-09-14 12:37:08,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 3 hours, 57 minutes, 38 seconds)
2025-09-14 12:39:40,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 12:39:48,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1370.40210 ± 352.227
2025-09-14 12:39:48,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(963.57947), np.float32(976.5967), np.float32(1153.939), np.float32(1065.4247), np.float32(1635.0209), np.float32(1429.8248), np.float32(2099.177), np.float32(1625.198), np.float32(1612.6735), np.float32(1142.5864)]
2025-09-14 12:39:48,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:39:48,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (1370.40) for latency 15
2025-09-14 12:39:48,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 3 hours, 45 minutes, 11 seconds)
2025-09-14 12:42:27,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 12:42:34,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1302.49866 ± 254.098
2025-09-14 12:42:34,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1334.9738), np.float32(999.3223), np.float32(1036.2089), np.float32(1763.1509), np.float32(1146.8889), np.float32(1697.344), np.float32(1367.4739), np.float32(1400.677), np.float32(1033.5398), np.float32(1245.406)]
2025-09-14 12:42:34,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:42:34,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 3 hours, 42 minutes, 31 seconds)
2025-09-14 12:45:14,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 12:45:21,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1315.95776 ± 222.129
2025-09-14 12:45:21,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1041.9001), np.float32(1272.6559), np.float32(1492.2062), np.float32(1677.6665), np.float32(1465.4496), np.float32(1055.6155), np.float32(1097.7969), np.float32(1511.4861), np.float32(1465.7751), np.float32(1079.0269)]
2025-09-14 12:45:21,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:45:21,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 3 hours, 40 minutes, 6 seconds)
2025-09-14 12:47:53,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 12:48:01,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1676.99548 ± 613.153
2025-09-14 12:48:01,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2894.2864), np.float32(1077.7484), np.float32(1002.518), np.float32(1600.1271), np.float32(2061.6887), np.float32(1428.6779), np.float32(1118.124), np.float32(2589.5598), np.float32(1358.1774), np.float32(1639.0466)]
2025-09-14 12:48:01,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:48:01,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (1677.00) for latency 15
2025-09-14 12:48:01,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 3 hours, 35 minutes, 5 seconds)
2025-09-14 12:50:39,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 12:50:46,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1449.98462 ± 266.362
2025-09-14 12:50:46,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1406.1234), np.float32(1378.7621), np.float32(1803.3378), np.float32(1716.8523), np.float32(1181.841), np.float32(1452.0974), np.float32(1125.9155), np.float32(1284.7032), np.float32(1943.1133), np.float32(1207.0994)]
2025-09-14 12:50:46,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:50:46,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 32 minutes, 47 seconds)
2025-09-14 12:53:24,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 12:53:32,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1711.54883 ± 575.886
2025-09-14 12:53:32,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1182.6395), np.float32(1605.9971), np.float32(1444.3623), np.float32(2473.0032), np.float32(1044.0005), np.float32(2353.7825), np.float32(1072.9913), np.float32(2132.6604), np.float32(1248.0999), np.float32(2557.9526)]
2025-09-14 12:53:32,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:53:32,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (1711.55) for latency 15
2025-09-14 12:53:32,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 3 hours, 31 minutes, 32 seconds)
2025-09-14 12:56:01,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 12:56:08,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1619.41528 ± 514.650
2025-09-14 12:56:08,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1123.8295), np.float32(1770.1182), np.float32(2778.2864), np.float32(1037.137), np.float32(1820.1771), np.float32(1258.4021), np.float32(2210.0037), np.float32(1531.4175), np.float32(1276.0198), np.float32(1388.7617)]
2025-09-14 12:56:08,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:56:08,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 26 minutes, 1 second)
2025-09-14 12:58:48,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 12:58:55,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1370.81335 ± 233.477
2025-09-14 12:58:55,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1548.3153), np.float32(1371.8351), np.float32(975.72144), np.float32(1471.5917), np.float32(960.4676), np.float32(1284.6648), np.float32(1764.4786), np.float32(1461.4841), np.float32(1443.9769), np.float32(1425.5989)]
2025-09-14 12:58:55,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:58:55,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 23 minutes, 21 seconds)
2025-09-14 13:01:35,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:01:43,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1641.71936 ± 490.901
2025-09-14 13:01:43,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1525.2526), np.float32(1371.9899), np.float32(2255.3723), np.float32(1376.4691), np.float32(1863.454), np.float32(1300.4625), np.float32(1126.8312), np.float32(1214.9241), np.float32(2765.5906), np.float32(1616.8474)]
2025-09-14 13:01:43,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:01:43,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 22 minutes, 44 seconds)
2025-09-14 13:04:13,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:04:21,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1624.67578 ± 520.042
2025-09-14 13:04:21,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1056.8599), np.float32(1849.8057), np.float32(1127.1459), np.float32(1296.4441), np.float32(1109.0997), np.float32(1506.2783), np.float32(1392.2278), np.float32(2078.4065), np.float32(2110.0225), np.float32(2720.4678)]
2025-09-14 13:04:21,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:04:21,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 18 minutes, 7 seconds)
2025-09-14 13:06:57,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:07:05,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1549.87170 ± 405.655
2025-09-14 13:07:05,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1213.6339), np.float32(1283.5402), np.float32(1185.2782), np.float32(1408.3293), np.float32(2595.1638), np.float32(1927.6157), np.float32(1419.7062), np.float32(1338.8525), np.float32(1638.969), np.float32(1487.6287)]
2025-09-14 13:07:05,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:07:05,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 15 minutes, 2 seconds)
2025-09-14 13:09:43,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:09:51,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2106.81592 ± 560.955
2025-09-14 13:09:51,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2394.6804), np.float32(1503.5924), np.float32(2889.9446), np.float32(1279.4553), np.float32(1919.7909), np.float32(1499.0588), np.float32(2285.1428), np.float32(2092.5647), np.float32(3089.5332), np.float32(2114.3975)]
2025-09-14 13:09:51,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:09:51,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2106.82) for latency 15
2025-09-14 13:09:51,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 14 minutes, 48 seconds)
2025-09-14 13:12:25,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:12:32,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1438.31628 ± 193.202
2025-09-14 13:12:32,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1751.2994), np.float32(1181.1954), np.float32(1482.454), np.float32(1620.4044), np.float32(1221.7786), np.float32(1559.0281), np.float32(1289.8358), np.float32(1584.2688), np.float32(1184.5696), np.float32(1508.3289)]
2025-09-14 13:12:32,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:12:32,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 10 minutes, 35 seconds)
2025-09-14 13:14:58,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:15:05,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1554.44324 ± 307.164
2025-09-14 13:15:05,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1675.0345), np.float32(1939.7516), np.float32(1204.9103), np.float32(1699.3873), np.float32(1295.9576), np.float32(1269.8324), np.float32(1345.5243), np.float32(2063.794), np.float32(1221.2373), np.float32(1829.0033)]
2025-09-14 13:15:05,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:15:05,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 4 minutes, 33 seconds)
2025-09-14 13:17:29,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:17:36,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1575.05664 ± 395.081
2025-09-14 13:17:36,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1282.6232), np.float32(1570.245), np.float32(1338.3259), np.float32(2378.8083), np.float32(1286.5232), np.float32(1286.69), np.float32(1624.7952), np.float32(2272.8242), np.float32(1456.2668), np.float32(1253.4652)]
2025-09-14 13:17:36,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:17:36,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 21 seconds)
2025-09-14 13:20:01,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:20:08,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1614.57410 ± 243.103
2025-09-14 13:20:08,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1574.3864), np.float32(1798.551), np.float32(1471.2709), np.float32(1399.9653), np.float32(1455.5122), np.float32(2002.3684), np.float32(1445.8635), np.float32(1409.8842), np.float32(2092.7354), np.float32(1495.2037)]
2025-09-14 13:20:08,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:20:08,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 54 minutes, 58 seconds)
2025-09-14 13:22:33,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:22:40,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1782.25842 ± 596.056
2025-09-14 13:22:40,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1470.5128), np.float32(1150.7102), np.float32(1320.8903), np.float32(1945.8468), np.float32(2757.7744), np.float32(2850.1216), np.float32(1349.5421), np.float32(1512.2325), np.float32(1252.8485), np.float32(2212.1042)]
2025-09-14 13:22:40,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:22:40,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 49 minutes, 9 seconds)
2025-09-14 13:25:06,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:25:12,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1613.21863 ± 384.965
2025-09-14 13:25:12,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1256.2339), np.float32(1586.1444), np.float32(1423.8947), np.float32(2548.6113), np.float32(1342.1759), np.float32(1304.3457), np.float32(2090.2075), np.float32(1420.1681), np.float32(1520.3213), np.float32(1640.0839)]
2025-09-14 13:25:12,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:25:12,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 44 minutes, 46 seconds)
2025-09-14 13:27:38,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:27:44,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1607.38916 ± 352.824
2025-09-14 13:27:44,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1717.1925), np.float32(1569.2977), np.float32(1417.2152), np.float32(1949.943), np.float32(1074.9503), np.float32(1253.6443), np.float32(2356.2322), np.float32(1684.9224), np.float32(1736.1172), np.float32(1314.3773)]
2025-09-14 13:27:44,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:27:44,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 42 minutes)
2025-09-14 13:30:07,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:30:14,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1945.48792 ± 526.781
2025-09-14 13:30:14,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1321.926), np.float32(2781.3645), np.float32(1621.0515), np.float32(2185.7502), np.float32(2025.3882), np.float32(2312.8782), np.float32(2521.2227), np.float32(1280.1836), np.float32(1207.0853), np.float32(2198.027)]
2025-09-14 13:30:14,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:30:14,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 39 minutes, 9 seconds)
2025-09-14 13:32:38,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:32:45,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1507.47607 ± 282.014
2025-09-14 13:32:45,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1490.3643), np.float32(1682.8125), np.float32(1418.7216), np.float32(2108.9697), np.float32(1266.6099), np.float32(1192.844), np.float32(1754.4462), np.float32(1219.2561), np.float32(1257.3145), np.float32(1683.4216)]
2025-09-14 13:32:45,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:32:45,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 36 minutes, 21 seconds)
2025-09-14 13:35:09,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:35:16,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2067.58472 ± 661.240
2025-09-14 13:35:16,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1625.148), np.float32(2363.3347), np.float32(1942.1934), np.float32(2284.9314), np.float32(1580.251), np.float32(1117.9854), np.float32(3353.2297), np.float32(1272.0897), np.float32(2791.672), np.float32(2345.0122)]
2025-09-14 13:35:16,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:35:16,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 33 minutes, 48 seconds)
2025-09-14 14:01:02,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:01:08,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1886.47302 ± 481.493
2025-09-14 14:01:08,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2162.6042), np.float32(1186.074), np.float32(1331.4679), np.float32(2155.9731), np.float32(1701.8258), np.float32(2700.5884), np.float32(1516.1146), np.float32(2581.67), np.float32(1872.9338), np.float32(1655.4791)]
2025-09-14 14:01:08,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:01:08,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 7 hours, 11 minutes, 12 seconds)
2025-09-14 14:03:37,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:03:43,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1684.80139 ± 469.209
2025-09-14 14:03:43,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1737.4128), np.float32(478.57144), np.float32(1552.0404), np.float32(1796.0353), np.float32(1496.554), np.float32(2117.9749), np.float32(1668.556), np.float32(1752.4291), np.float32(1900.9286), np.float32(2347.5115)]
2025-09-14 14:03:43,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:03:43,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 7 hours, 4 minutes, 35 seconds)
2025-09-14 14:06:03,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:06:10,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2115.01416 ± 699.281
2025-09-14 14:06:10,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2181.45), np.float32(1617.4458), np.float32(3064.808), np.float32(3262.1501), np.float32(1393.6136), np.float32(1301.3594), np.float32(1834.5356), np.float32(2412.8604), np.float32(1313.2511), np.float32(2768.6667)]
2025-09-14 14:06:10,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:06:10,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2115.01) for latency 15
2025-09-14 14:06:10,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 6 hours, 56 minutes, 47 seconds)
2025-09-14 14:08:30,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:08:37,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2438.45166 ± 640.718
2025-09-14 14:08:37,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1425.3243), np.float32(2012.39), np.float32(1776.0856), np.float32(2070.0842), np.float32(3165.1091), np.float32(1863.4529), np.float32(2808.6462), np.float32(3176.0588), np.float32(3202.4893), np.float32(2884.8757)]
2025-09-14 14:08:37,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:08:37,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2438.45) for latency 15
2025-09-14 14:08:37,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 6 hours, 48 minutes, 57 seconds)
2025-09-14 14:10:57,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:11:04,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1706.91663 ± 214.500
2025-09-14 14:11:04,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1720.3695), np.float32(1734.8877), np.float32(1776.502), np.float32(1368.4073), np.float32(1865.8293), np.float32(1687.5333), np.float32(1594.8491), np.float32(2175.129), np.float32(1420.6113), np.float32(1725.0475)]
2025-09-14 14:11:04,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:11:04,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 6 hours, 40 minutes, 58 seconds)
2025-09-14 14:13:25,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:13:32,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2068.51440 ± 670.351
2025-09-14 14:13:32,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1310.6053), np.float32(1331.439), np.float32(2174.1511), np.float32(2109.0483), np.float32(1571.617), np.float32(2109.9927), np.float32(2950.6196), np.float32(3457.7913), np.float32(2220.46), np.float32(1449.4214)]
2025-09-14 14:13:32,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:13:32,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 16 minutes, 15 seconds)
2025-09-14 14:15:52,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:15:59,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2115.49170 ± 609.729
2025-09-14 14:15:59,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2060.2869), np.float32(2185.49), np.float32(2969.0017), np.float32(2420.1772), np.float32(1356.7395), np.float32(3195.3435), np.float32(1690.852), np.float32(2376.2458), np.float32(1532.9207), np.float32(1367.8591)]
2025-09-14 14:15:59,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:15:59,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 12 minutes, 21 seconds)
2025-09-14 14:18:19,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:18:26,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2137.54248 ± 1073.740
2025-09-14 14:18:26,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3636.1562), np.float32(3627.8352), np.float32(1109.2931), np.float32(1649.3357), np.float32(3984.203), np.float32(1392.1385), np.float32(1294.0645), np.float32(1456.3997), np.float32(1419.0372), np.float32(1806.9618)]
2025-09-14 14:18:26,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:18:26,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 10 minutes, 1 second)
2025-09-14 14:20:46,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:20:53,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1829.38147 ± 410.439
2025-09-14 14:20:53,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2286.1082), np.float32(2219.5574), np.float32(1624.6329), np.float32(1590.3467), np.float32(2071.0142), np.float32(2543.405), np.float32(1427.048), np.float32(1474.631), np.float32(1222.1449), np.float32(1834.9276)]
2025-09-14 14:20:53,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:20:53,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 7 minutes, 33 seconds)
2025-09-14 14:23:13,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:23:20,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2278.90112 ± 1032.809
2025-09-14 14:23:20,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2051.577), np.float32(1517.5245), np.float32(1648.8428), np.float32(1954.0479), np.float32(1364.3588), np.float32(1278.8209), np.float32(4136.0894), np.float32(3421.296), np.float32(1573.8405), np.float32(3842.6147)]
2025-09-14 14:23:20,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:23:20,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 5 minutes, 6 seconds)
2025-09-14 14:25:41,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:25:47,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2058.86694 ± 698.270
2025-09-14 14:25:47,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1930.1509), np.float32(1589.1627), np.float32(3039.5977), np.float32(1357.9232), np.float32(3091.9233), np.float32(1255.9839), np.float32(1455.1807), np.float32(3084.9526), np.float32(1909.0778), np.float32(1874.7175)]
2025-09-14 14:25:47,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:25:47,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 2 minutes, 37 seconds)
2025-09-14 14:28:08,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:28:15,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2358.76123 ± 742.686
2025-09-14 14:28:15,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1452.7129), np.float32(2472.629), np.float32(2210.172), np.float32(1592.1665), np.float32(3453.5881), np.float32(3777.9412), np.float32(2045.8735), np.float32(1604.1334), np.float32(2770.3352), np.float32(2208.0586)]
2025-09-14 14:28:15,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:28:15,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 11 seconds)
2025-09-14 14:30:35,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:30:42,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1676.63843 ± 579.943
2025-09-14 14:30:42,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1325.9381), np.float32(1290.9221), np.float32(1508.5657), np.float32(1642.4916), np.float32(1414.0789), np.float32(3358.4783), np.float32(1562.1129), np.float32(1689.9194), np.float32(1287.73), np.float32(1686.1478)]
2025-09-14 14:30:42,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:30:42,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 57 minutes, 43 seconds)
2025-09-14 14:33:02,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:33:09,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1856.85583 ± 412.152
2025-09-14 14:33:09,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2432.196), np.float32(1327.5839), np.float32(2122.5889), np.float32(2332.8591), np.float32(1685.7048), np.float32(1455.4785), np.float32(1369.1996), np.float32(2419.6892), np.float32(1751.5667), np.float32(1671.6912)]
2025-09-14 14:33:09,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:33:09,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 55 minutes, 15 seconds)
2025-09-14 14:35:29,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:35:36,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2775.96973 ± 1146.174
2025-09-14 14:35:36,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3860.7524), np.float32(1386.4618), np.float32(3639.7397), np.float32(4095.523), np.float32(1362.1633), np.float32(3256.7727), np.float32(1540.8959), np.float32(1304.0508), np.float32(3842.5532), np.float32(3470.7854)]
2025-09-14 14:35:36,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:35:36,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2775.97) for latency 15
2025-09-14 14:35:36,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 52 minutes, 47 seconds)
2025-09-14 14:37:53,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:37:59,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2484.81226 ± 783.675
2025-09-14 14:37:59,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3386.8684), np.float32(2598.6538), np.float32(3099.6746), np.float32(3599.0854), np.float32(1813.92), np.float32(1992.3872), np.float32(2108.641), np.float32(1621.4008), np.float32(3327.909), np.float32(1299.5848)]
2025-09-14 14:37:59,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:37:59,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 49 minutes, 48 seconds)
2025-09-14 14:40:17,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:40:24,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2285.37500 ± 878.892
2025-09-14 14:40:24,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2281.1494), np.float32(2057.7898), np.float32(1701.4316), np.float32(1745.98), np.float32(3175.6707), np.float32(1324.1708), np.float32(3922.7278), np.float32(1555.7516), np.float32(3538.676), np.float32(1550.4025)]
2025-09-14 14:40:24,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:40:24,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 46 minutes, 54 seconds)
2025-09-14 14:42:40,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:42:47,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1973.12244 ± 476.197
2025-09-14 14:42:47,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1833.3385), np.float32(2117.1157), np.float32(1272.3334), np.float32(1530.3794), np.float32(2966.7842), np.float32(2558.054), np.float32(1685.3042), np.float32(1639.5156), np.float32(2064.9258), np.float32(2063.4739)]
2025-09-14 14:42:47,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:42:47,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 43 minutes, 57 seconds)
2025-09-14 14:45:03,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:45:10,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2073.20581 ± 432.938
2025-09-14 14:45:10,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2178.1758), np.float32(2831.0142), np.float32(1880.2727), np.float32(1530.5032), np.float32(2190.6013), np.float32(2655.794), np.float32(1763.144), np.float32(1549.6565), np.float32(2418.3982), np.float32(1734.498)]
2025-09-14 14:45:10,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:45:10,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 40 minutes, 56 seconds)
2025-09-14 14:47:27,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:47:34,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2643.43408 ± 854.946
2025-09-14 14:47:34,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1599.2253), np.float32(1910.7599), np.float32(1932.6393), np.float32(3534.809), np.float32(2772.808), np.float32(3654.1099), np.float32(4161.4805), np.float32(1789.9313), np.float32(2919.771), np.float32(2158.804)]
2025-09-14 14:47:34,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:47:34,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 38 minutes, 9 seconds)
2025-09-14 14:49:52,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:49:58,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1731.46484 ± 796.498
2025-09-14 14:49:58,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1353.3577), np.float32(4089.512), np.float32(1313.7866), np.float32(1480.2231), np.float32(1579.3853), np.float32(1411.9264), np.float32(1468.2361), np.float32(1410.7373), np.float32(1412.5001), np.float32(1794.9834)]
2025-09-14 14:49:58,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:49:58,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 35 minutes, 52 seconds)
2025-09-14 14:52:16,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:52:23,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2711.86035 ± 1161.708
2025-09-14 14:52:23,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4158.5273), np.float32(1396.443), np.float32(3489.291), np.float32(1634.5204), np.float32(1826.932), np.float32(1592.3011), np.float32(3727.6545), np.float32(1390.5366), np.float32(3848.0564), np.float32(4054.339)]
2025-09-14 14:52:23,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:52:23,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 33 minutes, 32 seconds)
2025-09-14 14:54:41,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:54:48,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2712.83203 ± 1110.838
2025-09-14 14:54:48,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1937.4427), np.float32(4139.147), np.float32(1791.168), np.float32(2114.9287), np.float32(1466.8381), np.float32(4447.4497), np.float32(1759.2162), np.float32(1928.1626), np.float32(4060.9236), np.float32(3483.0425)]
2025-09-14 14:54:48,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:54:48,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 31 minutes, 16 seconds)
2025-09-14 14:57:05,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:57:12,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2168.68628 ± 795.765
2025-09-14 14:57:12,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2114.8845), np.float32(1743.1156), np.float32(3509.1855), np.float32(2642.8796), np.float32(1495.3903), np.float32(1791.065), np.float32(1481.0109), np.float32(1256.7504), np.float32(2001.2815), np.float32(3651.301)]
2025-09-14 14:57:12,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:57:12,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 29 minutes, 3 seconds)
2025-09-14 14:59:29,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:59:36,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2082.27637 ± 746.083
2025-09-14 14:59:36,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2028.5095), np.float32(1635.5105), np.float32(2703.124), np.float32(4067.9893), np.float32(1828.621), np.float32(1365.4912), np.float32(1883.2863), np.float32(1856.261), np.float32(1965.8806), np.float32(1488.0892)]
2025-09-14 14:59:36,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:59:36,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 26 minutes, 35 seconds)
2025-09-14 15:01:52,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:01:59,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2508.88843 ± 1027.491
2025-09-14 15:01:59,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1508.9036), np.float32(3122.3728), np.float32(3859.742), np.float32(2279.5073), np.float32(3987.9114), np.float32(3807.1367), np.float32(1657.9387), np.float32(1443.3044), np.float32(2097.0015), np.float32(1325.0671)]
2025-09-14 15:01:59,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:01:59,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 24 minutes, 5 seconds)
2025-09-14 15:04:16,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:04:22,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2851.85010 ± 1137.565
2025-09-14 15:04:22,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3999.0486), np.float32(1523.1195), np.float32(2216.131), np.float32(1580.6422), np.float32(4148.2915), np.float32(2226.065), np.float32(4093.2427), np.float32(2832.3318), np.float32(1499.667), np.float32(4399.9614)]
2025-09-14 15:04:22,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:04:22,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2851.85) for latency 15
2025-09-14 15:04:22,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 21 minutes, 31 seconds)
2025-09-14 15:06:39,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:06:46,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3082.94214 ± 1024.225
2025-09-14 15:06:46,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4091.6377), np.float32(3713.6318), np.float32(3156.629), np.float32(3918.0535), np.float32(1822.7449), np.float32(3789.2922), np.float32(3956.3357), np.float32(1317.46), np.float32(3479.9878), np.float32(1583.6489)]
2025-09-14 15:06:46,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:06:46,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (3082.94) for latency 15
2025-09-14 15:06:46,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 19 minutes, 1 second)
2025-09-14 15:09:02,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:09:08,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2982.17993 ± 965.532
2025-09-14 15:09:08,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2649.1086), np.float32(1424.8408), np.float32(4032.52), np.float32(3618.9583), np.float32(3645.6492), np.float32(3729.596), np.float32(3581.029), np.float32(3705.6902), np.float32(1403.8805), np.float32(2030.525)]
2025-09-14 15:09:08,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:09:08,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 16 minutes, 25 seconds)
2025-09-14 15:11:25,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:11:32,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2824.70361 ± 1078.718
2025-09-14 15:11:32,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3558.9343), np.float32(1946.3933), np.float32(3935.0764), np.float32(1462.0481), np.float32(3883.4766), np.float32(4245.917), np.float32(1662.4795), np.float32(3701.962), np.float32(1502.7214), np.float32(2348.0298)]
2025-09-14 15:11:32,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:11:32,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 14 minutes)
2025-09-14 15:13:49,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:13:56,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2873.41040 ± 1187.364
2025-09-14 15:13:56,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1317.5059), np.float32(1267.7808), np.float32(4448.752), np.float32(3221.3503), np.float32(1457.2656), np.float32(3824.1133), np.float32(3976.9917), np.float32(3915.9565), np.float32(1882.544), np.float32(3421.8452)]
2025-09-14 15:13:56,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:13:56,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 11 minutes, 41 seconds)
2025-09-14 15:16:08,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:16:14,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2196.94263 ± 849.627
2025-09-14 15:16:14,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1428.2426), np.float32(3562.5032), np.float32(1543.4802), np.float32(1367.7761), np.float32(1498.7146), np.float32(1462.8127), np.float32(2184.637), np.float32(3384.7166), np.float32(2239.964), np.float32(3296.5789)]
2025-09-14 15:16:14,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:16:14,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 8 minutes, 49 seconds)
2025-09-14 15:18:26,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:18:33,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2225.48364 ± 868.406
2025-09-14 15:18:33,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3157.3142), np.float32(2414.497), np.float32(1312.412), np.float32(1249.5724), np.float32(3784.7612), np.float32(1396.0359), np.float32(2006.0985), np.float32(3153.0276), np.float32(1343.287), np.float32(2437.829)]
2025-09-14 15:18:33,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:18:33,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 5 minutes, 57 seconds)
2025-09-14 15:20:44,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:20:51,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2968.02295 ± 1130.825
2025-09-14 15:20:51,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2116.9873), np.float32(4020.4612), np.float32(4045.869), np.float32(1225.9574), np.float32(4102.0576), np.float32(2138.7083), np.float32(1835.6493), np.float32(3922.2654), np.float32(4262.2046), np.float32(2010.0713)]
2025-09-14 15:20:51,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:20:51,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 3 minutes, 14 seconds)
2025-09-14 15:23:22,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:23:30,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2825.11670 ± 1090.974
2025-09-14 15:23:30,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1556.0388), np.float32(4472.6255), np.float32(3424.3247), np.float32(1341.1881), np.float32(4000.7666), np.float32(4062.768), np.float32(2759.9438), np.float32(2780.9888), np.float32(1436.2092), np.float32(2416.312)]
2025-09-14 15:23:30,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:23:30,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 2 minutes, 14 seconds)
2025-09-14 15:26:17,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:26:23,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3121.65112 ± 1177.691
2025-09-14 15:26:23,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1398.3325), np.float32(1636.8921), np.float32(4256.7827), np.float32(2353.488), np.float32(1748.99), np.float32(4207.068), np.float32(4083.1658), np.float32(4433.66), np.float32(2939.536), np.float32(4158.5986)]
2025-09-14 15:26:23,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:26:23,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (3121.65) for latency 15
2025-09-14 15:26:23,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 2 minutes, 17 seconds)
2025-09-14 15:29:01,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:29:08,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2294.22632 ± 1012.097
2025-09-14 15:29:08,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1862.4098), np.float32(4224.5356), np.float32(1935.5753), np.float32(1439.7783), np.float32(3948.2556), np.float32(2038.1777), np.float32(1273.214), np.float32(3067.4307), np.float32(1573.4834), np.float32(1579.4037)]
2025-09-14 15:29:08,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:29:08,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 1 minute, 51 seconds)
2025-09-14 15:31:44,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:31:51,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3096.13086 ± 641.275
2025-09-14 15:31:51,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3141.4985), np.float32(2166.6301), np.float32(2578.2598), np.float32(2552.713), np.float32(2707.3464), np.float32(3709.402), np.float32(4235.3594), np.float32(3419.022), np.float32(3825.527), np.float32(2625.5518)]
2025-09-14 15:31:51,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:31:51,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 1 minute, 12 seconds)
2025-09-14 15:34:26,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:34:33,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3418.32227 ± 1097.948
2025-09-14 15:34:33,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4363.0884), np.float32(2043.3636), np.float32(4572.157), np.float32(3949.1372), np.float32(2200.3792), np.float32(4247.644), np.float32(3077.8838), np.float32(4136.4824), np.float32(4225.298), np.float32(1367.79)]
2025-09-14 15:34:33,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:34:33,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (3418.32) for latency 15
2025-09-14 15:34:33,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 17 seconds)
2025-09-14 15:37:11,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:37:18,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2954.84229 ± 1077.759
2025-09-14 15:37:18,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4116.9834), np.float32(3822.8215), np.float32(3757.394), np.float32(3875.7527), np.float32(2229.6147), np.float32(1232.6177), np.float32(1721.1538), np.float32(1546.1384), np.float32(3369.7285), np.float32(3876.2183)]
2025-09-14 15:37:18,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:37:18,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 57 minutes, 59 seconds)
2025-09-14 15:39:56,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:40:03,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3222.39160 ± 1122.810
2025-09-14 15:40:03,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3311.3545), np.float32(3677.319), np.float32(2462.5574), np.float32(2317.5547), np.float32(4442.2295), np.float32(1762.9368), np.float32(4002.7275), np.float32(4589.356), np.float32(4344.022), np.float32(1313.8588)]
2025-09-14 15:40:03,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:40:03,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 54 minutes, 38 seconds)
2025-09-14 15:42:40,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:42:48,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3464.77856 ± 966.870
2025-09-14 15:42:48,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3891.4746), np.float32(2955.7031), np.float32(4278.9814), np.float32(4310.5845), np.float32(4681.687), np.float32(2395.7375), np.float32(4567.513), np.float32(3170.8438), np.float32(1755.2646), np.float32(2639.995)]
2025-09-14 15:42:48,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:42:48,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (3464.78) for latency 15
2025-09-14 15:42:48,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 51 minutes, 56 seconds)
2025-09-14 15:45:13,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:45:20,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3464.09253 ± 573.630
2025-09-14 15:45:20,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3892.1382), np.float32(4296.7866), np.float32(3385.9119), np.float32(3943.8882), np.float32(3010.2258), np.float32(4378.8267), np.float32(2872.92), np.float32(2910.2197), np.float32(2965.3882), np.float32(2984.6206)]
2025-09-14 15:45:20,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:45:20,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 48 minutes, 31 seconds)
2025-09-14 15:47:53,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:48:00,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2675.13770 ± 1064.477
2025-09-14 15:48:00,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2381.2463), np.float32(1396.4124), np.float32(4261.4316), np.float32(3866.4563), np.float32(1337.011), np.float32(4207.849), np.float32(2390.0972), np.float32(1476.9792), np.float32(2633.5288), np.float32(2800.3628)]
2025-09-14 15:48:00,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:48:00,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 45 minutes, 44 seconds)
2025-09-14 15:50:40,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:50:48,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3385.80859 ± 858.621
2025-09-14 15:50:48,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3731.5415), np.float32(3452.068), np.float32(2697.1096), np.float32(3061.6348), np.float32(3966.354), np.float32(3935.6526), np.float32(3005.4727), np.float32(4303.5576), np.float32(1356.7642), np.float32(4347.932)]
2025-09-14 15:50:48,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:50:48,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 43 minutes, 10 seconds)
2025-09-14 15:53:26,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:53:33,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2753.51831 ± 1034.740
2025-09-14 15:53:33,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4141.2314), np.float32(3871.3894), np.float32(1653.4491), np.float32(3171.201), np.float32(3955.0203), np.float32(2215.353), np.float32(1804.2003), np.float32(1558.2211), np.float32(1576.2682), np.float32(3588.85)]
2025-09-14 15:53:33,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:53:33,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 40 minutes, 30 seconds)
2025-09-14 15:56:00,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:56:07,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3498.94458 ± 1078.604
2025-09-14 15:56:07,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2388.2764), np.float32(2097.8296), np.float32(3911.45), np.float32(3943.369), np.float32(1305.0771), np.float32(4422.6377), np.float32(4153.4077), np.float32(3848.139), np.float32(4519.386), np.float32(4399.8716)]
2025-09-14 15:56:07,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:56:07,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (3498.94) for latency 15
2025-09-14 15:56:07,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 37 minutes, 17 seconds)
2025-09-14 15:58:40,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:58:47,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2967.99658 ± 1051.651
2025-09-14 15:58:47,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3910.7014), np.float32(4072.1023), np.float32(3793.583), np.float32(4332.5747), np.float32(1459.1436), np.float32(1973.7683), np.float32(2202.7358), np.float32(1600.8586), np.float32(2615.4016), np.float32(3719.096)]
2025-09-14 15:58:47,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:58:47,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 34 minutes, 58 seconds)
2025-09-14 16:01:25,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:01:32,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2971.14746 ± 1162.724
2025-09-14 16:01:32,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1874.166), np.float32(4533.962), np.float32(4371.378), np.float32(1907.4812), np.float32(3756.6907), np.float32(2119.119), np.float32(3135.222), np.float32(1855.6749), np.float32(4518.339), np.float32(1639.4401)]
2025-09-14 16:01:32,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:01:32,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 32 minutes, 29 seconds)
2025-09-14 16:04:10,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:04:18,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3699.35034 ± 616.745
2025-09-14 16:04:18,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4448.2627), np.float32(2928.2253), np.float32(3968.1133), np.float32(3966.2358), np.float32(4163.823), np.float32(4545.138), np.float32(3586.9102), np.float32(2643.2317), np.float32(3711.672), np.float32(3031.8914)]
2025-09-14 16:04:18,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:04:18,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (3699.35) for latency 15
2025-09-14 16:04:18,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 29 minutes, 41 seconds)
2025-09-14 16:06:51,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:06:57,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3140.36987 ± 1154.600
2025-09-14 16:06:57,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3052.3926), np.float32(4001.2673), np.float32(2622.3882), np.float32(4442.389), np.float32(1379.5826), np.float32(4135.182), np.float32(3652.8877), np.float32(4640.781), np.float32(1942.9462), np.float32(1533.8818)]
2025-09-14 16:06:57,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:06:57,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 26 minutes, 47 seconds)
2025-09-14 16:09:23,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:09:29,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2867.68994 ± 891.668
2025-09-14 16:09:29,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2581.8306), np.float32(4322.6616), np.float32(2959.819), np.float32(2364.378), np.float32(2737.3208), np.float32(3168.1653), np.float32(4462.039), np.float32(1667.7423), np.float32(1693.2833), np.float32(2719.6572)]
2025-09-14 16:09:29,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:09:29,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 24 minutes, 4 seconds)
2025-09-14 16:12:09,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:12:16,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3209.46338 ± 613.750
2025-09-14 16:12:16,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2675.4055), np.float32(3080.4497), np.float32(4407.351), np.float32(2805.6516), np.float32(3553.2083), np.float32(3576.9873), np.float32(3389.2993), np.float32(2097.5537), np.float32(2841.398), np.float32(3667.3271)]
2025-09-14 16:12:16,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:12:16,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 21 minutes, 34 seconds)
2025-09-14 16:14:54,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:15:01,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3893.17896 ± 914.633
2025-09-14 16:15:01,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4340.9297), np.float32(4184.8804), np.float32(4360.589), np.float32(4587.8945), np.float32(4489.68), np.float32(3200.591), np.float32(3921.2273), np.float32(1458.3354), np.float32(3728.358), np.float32(4659.3066)]
2025-09-14 16:15:01,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:15:01,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (3893.18) for latency 15
2025-09-14 16:15:01,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 18 minutes, 52 seconds)
2025-09-14 16:17:39,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:17:46,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2961.38159 ± 980.234
2025-09-14 16:17:46,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1928.4708), np.float32(2319.728), np.float32(4500.418), np.float32(4066.581), np.float32(4147.87), np.float32(1996.295), np.float32(3113.3926), np.float32(1636.7823), np.float32(3411.9626), np.float32(2492.3167)]
2025-09-14 16:17:46,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:17:46,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 16 minutes, 10 seconds)
2025-09-14 16:20:13,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:20:20,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2940.12085 ± 1291.505
2025-09-14 16:20:20,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1788.72), np.float32(1400.3008), np.float32(4231.237), np.float32(1432.3245), np.float32(1623.3231), np.float32(4679.5884), np.float32(2790.6084), np.float32(2694.5825), np.float32(4218.577), np.float32(4541.9478)]
2025-09-14 16:20:20,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:20:20,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 13 minutes, 22 seconds)
2025-09-14 16:22:52,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:22:59,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3130.35083 ± 1219.951
2025-09-14 16:22:59,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4094.5105), np.float32(4076.3557), np.float32(1317.1687), np.float32(4263.4414), np.float32(1361.0918), np.float32(3620.1921), np.float32(2357.273), np.float32(4374.661), np.float32(4119.732), np.float32(1719.0803)]
2025-09-14 16:22:59,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:22:59,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 10 minutes, 47 seconds)
2025-09-14 16:25:37,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:25:45,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3180.40381 ± 1179.045
2025-09-14 16:25:45,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4578.629), np.float32(1733.0168), np.float32(4272.528), np.float32(2015.4166), np.float32(3884.0986), np.float32(2743.2715), np.float32(1318.1428), np.float32(4301.203), np.float32(4404.324), np.float32(2553.4077)]
2025-09-14 16:25:45,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:25:45,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 8 minutes, 5 seconds)
2025-09-14 16:28:22,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:28:30,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3417.88159 ± 1257.637
2025-09-14 16:28:30,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3433.2568), np.float32(4466.9995), np.float32(1512.0266), np.float32(4426.7104), np.float32(4316.0083), np.float32(2507.8843), np.float32(2674.259), np.float32(4642.8345), np.float32(4856.4854), np.float32(1342.3517)]
2025-09-14 16:28:30,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:28:30,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 5 minutes, 23 seconds)
2025-09-14 16:31:05,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:31:12,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3089.19067 ± 1106.196
2025-09-14 16:31:12,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2614.1875), np.float32(4265.394), np.float32(1415.5981), np.float32(3540.9597), np.float32(1895.128), np.float32(4003.8967), np.float32(2833.8577), np.float32(4569.0796), np.float32(4104.653), np.float32(1649.1531)]
2025-09-14 16:31:12,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:31:12,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 41 seconds)
2025-09-14 16:33:37,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:33:44,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3594.22461 ± 1175.423
2025-09-14 16:33:44,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4419.4507), np.float32(3903.2407), np.float32(4774.7705), np.float32(1412.8115), np.float32(4283.9326), np.float32(4024.1663), np.float32(4627.242), np.float32(1418.2103), np.float32(3996.7483), np.float32(3081.6724)]
2025-09-14 16:33:44,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:33:44,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1251 [DEBUG]: Training session finished
