2025-09-16 09:05:37,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1108 [DEBUG]: logdir: _logs/noise-eval-v2/halfcheetah/bpql-noise_0.025-delay_15
2025-09-16 09:05:37,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1109 [DEBUG]: trainer_prefix: noise-eval-v2/halfcheetah/bpql-noise_0.025-delay_15
2025-09-16 09:05:37,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'15': <latency_env.delayed_mdp.ConstantDelay object at 0x1477de008910>}
2025-09-16 09:05:37,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1111 [DEBUG]: using device: cuda
2025-09-16 09:05:37,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-16 09:05:37,209 baseline-bpql-noisepromille25-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=107, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-16 09:05:37,209 baseline-bpql-noisepromille25-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 09:05:38,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-16 09:05:38,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-16 09:07:17,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:07:28,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: -466.26181 ± 61.621
2025-09-16 09:07:28,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [-447.52066, -433.8533, -407.5428, -427.56714, -426.3342, -478.35107, -519.2851, -404.2391, -504.01297, -613.9119]
2025-09-16 09:07:28,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:07:28,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (-466.26) for latency 15
2025-09-16 09:07:28,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 2 minutes, 1 second)
2025-09-16 09:09:12,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:09:23,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: -286.58051 ± 25.675
2025-09-16 09:09:23,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [-269.35233, -297.32166, -296.7906, -322.91168, -304.49847, -250.87523, -248.02608, -314.7975, -260.54083, -300.6908]
2025-09-16 09:09:23,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:09:23,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (-286.58) for latency 15
2025-09-16 09:09:23,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 4 minutes, 13 seconds)
2025-09-16 09:11:08,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:11:19,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: -137.79300 ± 111.098
2025-09-16 09:11:19,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [-186.1212, 25.94231, 7.383427, -356.81094, -140.97064, -25.954668, -187.32498, -168.85822, -113.9677, -231.2473]
2025-09-16 09:11:19,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:11:19,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (-137.79) for latency 15
2025-09-16 09:11:19,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 3 minutes, 44 seconds)
2025-09-16 09:13:03,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:13:14,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 133.38326 ± 119.210
2025-09-16 09:13:14,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [188.7962, 244.24458, 118.27653, 136.00717, -114.41545, -61.015163, 180.2869, 216.43915, 159.36955, 265.84314]
2025-09-16 09:13:14,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:13:14,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (133.38) for latency 15
2025-09-16 09:13:14,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 2 minutes, 37 seconds)
2025-09-16 09:14:59,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:15:09,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 8.67151 ± 184.635
2025-09-16 09:15:09,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [131.30563, -119.83354, 39.96926, -156.05852, 165.48, -232.92358, -237.5965, 334.9312, -21.260561, 182.70169]
2025-09-16 09:15:09,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:15:09,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 1 minute, 1 second)
2025-09-16 09:16:54,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:17:05,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 579.33777 ± 412.626
2025-09-16 09:17:05,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [911.09204, 849.15125, 311.66745, 138.79532, -92.97508, 483.35852, 1097.2653, 141.86641, 1070.3093, 882.847]
2025-09-16 09:17:05,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:17:05,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (579.34) for latency 15
2025-09-16 09:17:05,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 40 seconds)
2025-09-16 09:18:49,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:19:00,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 531.68933 ± 304.162
2025-09-16 09:19:00,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [1235.5417, 24.071415, 636.62103, 620.79224, 576.4786, 414.99548, 655.01624, 215.38766, 553.7672, 384.2218]
2025-09-16 09:19:00,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:19:00,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 58 minutes, 43 seconds)
2025-09-16 09:20:44,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:20:55,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 540.83691 ± 367.097
2025-09-16 09:20:55,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [208.29745, 190.1531, 1093.9111, 808.9287, 213.15648, 1020.1326, 515.5097, 258.66782, 949.0265, 150.58612]
2025-09-16 09:20:55,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:20:55,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 56 minutes, 40 seconds)
2025-09-16 09:22:39,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:22:50,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 985.30920 ± 363.893
2025-09-16 09:22:50,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [561.46045, 1231.401, 648.6577, 1162.3885, 1234.127, 1527.2128, 1502.5415, 668.38385, 690.7267, 626.1928]
2025-09-16 09:22:50,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:22:50,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (985.31) for latency 15
2025-09-16 09:22:50,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 54 minutes, 35 seconds)
2025-09-16 09:24:34,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:24:45,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1048.22192 ± 314.125
2025-09-16 09:24:45,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [1087.4258, 1100.7449, 1148.8462, 790.1145, 1693.7826, 822.4044, 660.8419, 947.8389, 748.4517, 1481.7665]
2025-09-16 09:24:45,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:24:45,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (1048.22) for latency 15
2025-09-16 09:24:45,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 52 minutes, 38 seconds)
2025-09-16 09:26:29,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:26:39,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1010.79749 ± 447.639
2025-09-16 09:26:39,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [914.8717, 1856.076, 951.2586, 691.0046, 1645.3582, 967.2911, 893.3007, 840.4816, 174.0241, 1174.3074]
2025-09-16 09:26:39,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:26:39,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 50 minutes, 33 seconds)
2025-09-16 09:28:23,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:28:34,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1266.58545 ± 359.184
2025-09-16 09:28:34,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [1001.3035, 1412.9363, 817.85626, 1091.4789, 1015.3879, 1569.3792, 880.8037, 2051.4883, 1457.2614, 1367.9579]
2025-09-16 09:28:34,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:28:34,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (1266.59) for latency 15
2025-09-16 09:28:34,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 48 minutes, 22 seconds)
2025-09-16 09:30:17,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:30:28,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1178.73792 ± 354.492
2025-09-16 09:30:28,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [791.6264, 967.0093, 1280.0476, 1451.6132, 1054.6503, 906.11475, 2059.1414, 1314.5455, 1070.7861, 891.845]
2025-09-16 09:30:28,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:30:28,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 46 minutes, 15 seconds)
2025-09-16 09:32:12,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:32:23,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1383.47778 ± 751.403
2025-09-16 09:32:23,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [1165.7534, 1030.9268, 2405.8384, 990.878, -44.449078, 2524.731, 877.9594, 1710.0392, 1092.1257, 2080.974]
2025-09-16 09:32:23,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:32:23,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (1383.48) for latency 15
2025-09-16 09:32:23,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 44 minutes, 13 seconds)
2025-09-16 09:34:07,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:34:18,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1960.22290 ± 654.446
2025-09-16 09:34:18,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [2493.1582, 1214.9652, 1756.4281, 1836.4679, 840.1238, 2716.177, 3023.271, 1509.8346, 1801.5414, 2410.2625]
2025-09-16 09:34:18,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:34:18,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (1960.22) for latency 15
2025-09-16 09:34:18,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 42 minutes, 29 seconds)
2025-09-16 09:36:02,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:36:13,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 2073.93896 ± 744.848
2025-09-16 09:36:13,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [1295.4916, 2452.014, 978.11383, 1394.1659, 2399.7886, 3119.6443, 3140.956, 2119.0337, 1308.3043, 2531.8796]
2025-09-16 09:36:13,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:36:13,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (2073.94) for latency 15
2025-09-16 09:36:13,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 40 minutes, 29 seconds)
2025-09-16 09:37:55,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:38:06,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1955.52405 ± 745.341
2025-09-16 09:38:06,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [3090.9844, 2872.346, 1152.4213, 1867.7831, 1603.0763, 2169.9573, 1211.354, 1555.312, 2985.8872, 1046.1183]
2025-09-16 09:38:06,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:38:06,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 38 minutes, 19 seconds)
2025-09-16 09:39:48,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:39:59,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 2827.95288 ± 821.554
2025-09-16 09:39:59,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [3638.862, 3461.5813, 1375.8562, 3367.9104, 1313.1687, 3509.542, 2458.7705, 3416.0862, 3019.9153, 2717.8367]
2025-09-16 09:39:59,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:39:59,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (2827.95) for latency 15
2025-09-16 09:39:59,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 36 minutes, 3 seconds)
2025-09-16 09:41:41,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:41:52,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 2589.23926 ± 1023.585
2025-09-16 09:41:52,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [2779.2864, 3075.984, 1278.1815, 2280.1492, 3866.5554, 3817.6665, 2455.2644, 1126.7095, 1362.9717, 3849.6238]
2025-09-16 09:41:52,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:41:52,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 33 minutes, 43 seconds)
2025-09-16 09:43:34,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:43:45,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 2019.19312 ± 740.141
2025-09-16 09:43:45,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [1117.0634, 1674.8073, 1627.9639, 1591.7693, 2853.0884, 1390.1222, 3397.8315, 1407.7966, 2214.3945, 2917.0957]
2025-09-16 09:43:45,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:43:45,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 31 minutes, 3 seconds)
2025-09-16 09:45:27,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:45:38,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3076.17383 ± 1132.483
2025-09-16 09:45:38,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [4061.0916, 3918.6147, 1428.9673, 1332.0702, 3973.1873, 3099.682, 3672.6704, 3821.7988, 1417.4049, 4036.251]
2025-09-16 09:45:38,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:45:38,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (3076.17) for latency 15
2025-09-16 09:45:38,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 28 minutes, 48 seconds)
2025-09-16 09:47:20,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:47:31,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 2615.28125 ± 1173.001
2025-09-16 09:47:31,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [1900.0615, 2021.7621, 3061.748, 3174.7139, 1364.4614, 1262.1777, 4177.851, 4005.8118, 1073.6418, 4110.5815]
2025-09-16 09:47:31,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:47:31,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 26 minutes, 52 seconds)
2025-09-16 09:49:13,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:49:24,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3285.12695 ± 1206.507
2025-09-16 09:49:24,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [3669.9937, 1761.9019, 4715.143, 1675.1921, 1641.5535, 2392.218, 4259.7983, 4391.0054, 4493.1167, 3851.348]
2025-09-16 09:49:24,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:49:24,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (3285.13) for latency 15
2025-09-16 09:49:24,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 24 minutes, 59 seconds)
2025-09-16 09:51:06,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:51:16,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4447.19189 ± 1089.436
2025-09-16 09:51:16,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [4916.0664, 4707.2314, 4874.394, 1212.1471, 4846.771, 4702.6846, 4423.1387, 4969.957, 4966.6445, 4852.883]
2025-09-16 09:51:16,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:51:16,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (4447.19) for latency 15
2025-09-16 09:51:16,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 22 minutes, 59 seconds)
2025-09-16 09:52:58,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:53:09,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4689.80762 ± 716.673
2025-09-16 09:53:09,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5058.2407, 5134.5674, 2720.4268, 4115.4287, 5014.8223, 5123.997, 5062.2285, 4832.7915, 4781.703, 5053.8677]
2025-09-16 09:53:09,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:53:09,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (4689.81) for latency 15
2025-09-16 09:53:09,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 21 minutes, 3 seconds)
2025-09-16 09:54:53,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:55:04,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3708.06006 ± 1245.495
2025-09-16 09:55:04,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [4981.8257, 3637.5132, 1369.8125, 4274.306, 4956.874, 5112.527, 4431.608, 2176.8418, 3733.6274, 2405.6694]
2025-09-16 09:55:04,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:55:04,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 19 minutes, 36 seconds)
2025-09-16 09:56:48,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:56:59,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3796.53638 ± 1037.993
2025-09-16 09:56:59,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [2748.4314, 3045.547, 4512.7334, 4617.9194, 3921.2188, 4531.759, 4861.8345, 4307.5815, 4052.1733, 1366.1647]
2025-09-16 09:56:59,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:56:59,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 18 minutes, 8 seconds)
2025-09-16 09:58:43,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:58:53,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3447.71948 ± 1083.415
2025-09-16 09:58:53,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [4561.498, 4215.8193, 2584.2527, 4106.599, 2477.6938, 1377.348, 2701.1565, 4971.5015, 3266.6272, 4214.6978]
2025-09-16 09:58:53,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:58:53,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 16 minutes, 39 seconds)
2025-09-16 10:00:37,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:00:48,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4367.94238 ± 939.483
2025-09-16 10:00:48,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [4887.9355, 4775.932, 2078.5408, 3928.8074, 5079.3613, 4892.2134, 4917.526, 3314.238, 5211.113, 4593.7544]
2025-09-16 10:00:48,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:00:48,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 15 minutes, 21 seconds)
2025-09-16 10:02:33,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:02:44,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4474.10107 ± 732.586
2025-09-16 10:02:44,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [4399.339, 4573.2275, 4744.096, 4774.0024, 4681.4385, 4816.6196, 5167.8213, 2352.3582, 4566.448, 4665.6597]
2025-09-16 10:02:44,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:02:44,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 14 minutes, 3 seconds)
2025-09-16 10:04:25,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:04:36,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3732.71533 ± 1181.459
2025-09-16 10:04:36,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [3347.0093, 3557.054, 5121.0923, 2398.235, 4469.0366, 4139.3843, 1235.2181, 4907.966, 4944.513, 3207.6455]
2025-09-16 10:04:36,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:04:36,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 11 minutes, 42 seconds)
2025-09-16 10:06:18,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:06:29,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3407.99756 ± 1334.299
2025-09-16 10:06:29,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [4445.363, 4854.2993, 3947.8105, 4090.8442, 3163.4932, 1880.437, 5205.7803, 1595.8143, 1197.9174, 3698.2158]
2025-09-16 10:06:29,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:06:29,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 9 minutes, 19 seconds)
2025-09-16 10:08:11,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:08:22,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4001.64258 ± 1152.738
2025-09-16 10:08:22,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [3271.5266, 2962.4075, 5337.631, 5129.3403, 2376.2998, 5073.9697, 5107.334, 3491.2825, 4872.762, 2393.875]
2025-09-16 10:08:22,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:08:22,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 6 minutes, 58 seconds)
2025-09-16 10:10:04,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:10:14,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4470.64404 ± 728.379
2025-09-16 10:10:14,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [4883.1074, 4467.0117, 4735.6924, 3829.5762, 5061.486, 2760.678, 4545.0303, 5314.063, 5124.9165, 3984.8777]
2025-09-16 10:10:14,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:10:14,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 4 minutes, 34 seconds)
2025-09-16 10:11:56,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:12:07,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4970.71729 ± 690.439
2025-09-16 10:12:07,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [2922.9011, 5257.021, 5253.173, 5032.0073, 5330.1367, 5317.0576, 5227.4136, 5004.9556, 5141.598, 5220.9087]
2025-09-16 10:12:07,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:12:07,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (4970.72) for latency 15
2025-09-16 10:12:07,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 2 minutes, 4 seconds)
2025-09-16 10:13:49,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:13:59,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4382.61035 ± 926.750
2025-09-16 10:13:59,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [4953.4634, 3255.1392, 4802.857, 5210.5366, 3152.7043, 5249.367, 5140.1436, 4955.467, 4444.212, 2662.2102]
2025-09-16 10:13:59,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:13:59,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 8 seconds)
2025-09-16 10:15:41,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:15:52,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4889.36621 ± 631.032
2025-09-16 10:15:52,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5289.2773, 5158.51, 5327.0674, 4782.225, 5139.827, 5381.233, 3226.004, 5210.9443, 4307.4053, 5071.1694]
2025-09-16 10:15:52,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:15:52,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 58 minutes, 11 seconds)
2025-09-16 10:17:34,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:17:44,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5168.91895 ± 140.614
2025-09-16 10:17:44,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5225.931, 5052.5054, 5221.024, 5208.598, 5270.229, 5051.2227, 5311.4688, 5354.3857, 5135.882, 4857.9443]
2025-09-16 10:17:44,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:17:44,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (5168.92) for latency 15
2025-09-16 10:17:44,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 56 minutes, 16 seconds)
2025-09-16 10:19:26,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:19:37,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5127.62109 ± 106.716
2025-09-16 10:19:37,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5071.519, 5072.633, 5245.825, 5276.894, 5230.3564, 4948.709, 5004.5845, 5089.3535, 5235.257, 5101.079]
2025-09-16 10:19:37,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:19:37,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 54 minutes, 24 seconds)
2025-09-16 10:21:19,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:21:30,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4561.65137 ± 1255.240
2025-09-16 10:21:30,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [2120.6199, 5310.251, 5193.227, 2003.7897, 5188.0605, 5243.556, 5199.8555, 4862.524, 5236.977, 5257.652]
2025-09-16 10:21:30,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:21:30,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 52 minutes, 34 seconds)
2025-09-16 10:23:11,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:23:22,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4593.75146 ± 1369.826
2025-09-16 10:23:22,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [2483.9863, 4885.206, 5367.816, 5306.969, 5251.5767, 5383.8794, 1347.0754, 5295.0054, 5236.162, 5379.8354]
2025-09-16 10:23:22,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:23:22,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 50 minutes, 39 seconds)
2025-09-16 10:25:04,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:25:15,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5187.90869 ± 659.154
2025-09-16 10:25:15,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [3215.2703, 5412.4834, 5466.414, 5412.5967, 5398.8047, 5322.809, 5500.2334, 5389.6963, 5377.4473, 5383.3286]
2025-09-16 10:25:15,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:25:15,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (5187.91) for latency 15
2025-09-16 10:25:15,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 48 minutes, 47 seconds)
2025-09-16 10:26:56,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:27:07,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4428.29004 ± 1186.785
2025-09-16 10:27:07,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [2888.0178, 5271.6914, 4565.062, 5069.876, 5212.726, 5127.9624, 4802.8755, 4789.6562, 1468.3516, 5086.679]
2025-09-16 10:27:07,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:27:07,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 46 minutes, 53 seconds)
2025-09-16 10:28:48,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:28:59,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4755.96875 ± 844.483
2025-09-16 10:28:59,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5063.757, 5172.193, 2260.3516, 5041.197, 5192.209, 5068.153, 4733.902, 5073.945, 4793.9404, 5160.045]
2025-09-16 10:28:59,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:28:59,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 44 minutes, 56 seconds)
2025-09-16 10:30:41,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:30:52,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4600.24219 ± 920.514
2025-09-16 10:30:52,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [4912.4453, 3793.2576, 4857.3115, 5074.836, 5213.793, 5290.9424, 4857.5864, 5157.7783, 2113.142, 4731.3267]
2025-09-16 10:30:52,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:30:52,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 43 minutes, 3 seconds)
2025-09-16 10:32:34,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:32:45,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5254.38965 ± 115.829
2025-09-16 10:32:45,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5283.685, 4950.5073, 5341.0156, 5225.703, 5317.551, 5230.151, 5357.9116, 5184.8394, 5293.3345, 5359.2]
2025-09-16 10:32:45,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:32:45,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (5254.39) for latency 15
2025-09-16 10:32:45,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 41 minutes, 15 seconds)
2025-09-16 10:34:27,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:34:38,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4910.70312 ± 1046.548
2025-09-16 10:34:38,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5397.937, 5188.9927, 4707.215, 1827.4569, 5395.2056, 5347.384, 5401.2817, 5260.494, 5210.0015, 5371.063]
2025-09-16 10:34:38,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:34:38,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 39 minutes, 25 seconds)
2025-09-16 10:36:22,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:36:33,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5297.85938 ± 101.085
2025-09-16 10:36:33,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5398.468, 5435.5337, 5221.29, 5299.108, 5376.125, 5348.542, 5320.1978, 5216.851, 5071.0464, 5291.4316]
2025-09-16 10:36:33,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:36:33,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (5297.86) for latency 15
2025-09-16 10:36:33,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 38 minutes, 3 seconds)
2025-09-16 10:38:17,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:38:28,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4320.40234 ± 1384.892
2025-09-16 10:38:28,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5110.1646, 1841.0508, 5301.9917, 5283.936, 1993.4408, 5150.3447, 5068.369, 2900.5886, 5336.8223, 5217.311]
2025-09-16 10:38:28,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:38:28,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 36 minutes, 40 seconds)
2025-09-16 10:40:12,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:40:23,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5302.47559 ± 48.719
2025-09-16 10:40:23,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5342.7617, 5365.3174, 5291.754, 5333.1187, 5353.3857, 5299.8574, 5212.1406, 5246.312, 5252.0605, 5328.046]
2025-09-16 10:40:23,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:40:23,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (5302.48) for latency 15
2025-09-16 10:40:23,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 35 minutes, 11 seconds)
2025-09-16 10:42:07,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:42:18,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5188.00586 ± 322.386
2025-09-16 10:42:18,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5457.127, 4631.926, 5378.92, 5386.671, 5165.8887, 5441.041, 5405.3867, 5088.888, 4538.9575, 5385.2485]
2025-09-16 10:42:18,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:42:18,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 33 minutes, 42 seconds)
2025-09-16 10:44:03,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:44:14,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5201.06738 ± 332.128
2025-09-16 10:44:14,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5075.604, 5292.688, 5403.4873, 5463.42, 5433.7246, 5328.473, 5278.605, 5229.0625, 4257.773, 5247.839]
2025-09-16 10:44:14,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:44:14,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 32 minutes, 11 seconds)
2025-09-16 10:45:56,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:46:06,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4835.90771 ± 1014.656
2025-09-16 10:46:06,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5132.2524, 5346.4595, 4981.939, 1819.9265, 5065.322, 5158.4155, 5330.92, 5276.8784, 5307.5684, 4939.3994]
2025-09-16 10:46:06,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:46:06,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 29 minutes, 52 seconds)
2025-09-16 10:47:48,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:47:59,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4834.73535 ± 969.319
2025-09-16 10:47:59,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5312.6855, 1943.2593, 5287.03, 5157.8555, 5110.274, 5070.9746, 5048.471, 5288.1733, 5132.5405, 4996.0947]
2025-09-16 10:47:59,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:47:59,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 27 minutes, 34 seconds)
2025-09-16 10:49:41,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:49:52,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4475.63232 ± 1572.888
2025-09-16 10:49:52,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5373.9717, 1385.1017, 5360.115, 5340.795, 5341.0767, 4647.282, 5397.4653, 5253.476, 5326.715, 1330.3257]
2025-09-16 10:49:52,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:49:52,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 25 minutes, 19 seconds)
2025-09-16 10:51:34,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:51:45,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5278.74951 ± 174.686
2025-09-16 10:51:45,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5175.9937, 5323.865, 5393.298, 5373.428, 5363.053, 4783.202, 5326.3926, 5332.642, 5349.9395, 5365.682]
2025-09-16 10:51:45,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:51:45,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 23 minutes, 2 seconds)
2025-09-16 10:53:26,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:53:37,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4321.25244 ± 1548.695
2025-09-16 10:53:37,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [1421.5184, 5254.8325, 5252.14, 5385.329, 5303.063, 2617.6929, 1938.7096, 5266.222, 5404.16, 5368.857]
2025-09-16 10:53:37,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:53:37,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 20 minutes, 43 seconds)
2025-09-16 10:55:19,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:55:30,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4896.50195 ± 1161.251
2025-09-16 10:55:30,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5159.353, 5319.687, 5553.842, 1525.5438, 5320.798, 4461.775, 5511.6187, 5455.6655, 5263.783, 5392.9585]
2025-09-16 10:55:30,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:55:30,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 18 minutes, 50 seconds)
2025-09-16 10:57:11,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:57:22,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5307.70410 ± 147.511
2025-09-16 10:57:22,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5354.879, 5315.6973, 5291.2056, 5358.2944, 5472.705, 5276.679, 5356.7993, 4893.894, 5396.1807, 5360.708]
2025-09-16 10:57:22,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:57:22,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (5307.70) for latency 15
2025-09-16 10:57:22,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 16 minutes, 55 seconds)
2025-09-16 10:59:03,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:59:14,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5397.34668 ± 53.486
2025-09-16 10:59:14,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5304.328, 5423.6963, 5386.519, 5373.1836, 5406.612, 5383.7725, 5367.085, 5446.741, 5515.475, 5366.055]
2025-09-16 10:59:14,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:59:14,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (5397.35) for latency 15
2025-09-16 10:59:14,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 15 minutes)
2025-09-16 11:00:56,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:01:07,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5467.19629 ± 53.726
2025-09-16 11:01:07,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5349.289, 5435.965, 5475.225, 5501.1133, 5516.6934, 5486.102, 5523.6777, 5479.21, 5507.973, 5396.7134]
2025-09-16 11:01:07,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:01:07,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (5467.20) for latency 15
2025-09-16 11:01:07,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 13 minutes, 4 seconds)
2025-09-16 11:02:48,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:02:59,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5107.93408 ± 584.487
2025-09-16 11:02:59,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5509.467, 5021.3486, 5448.3423, 5479.371, 5484.164, 5499.0713, 5226.628, 3990.5586, 3961.8188, 5458.5693]
2025-09-16 11:02:59,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:02:59,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 11 minutes, 13 seconds)
2025-09-16 11:04:41,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:04:52,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5521.03662 ± 63.605
2025-09-16 11:04:52,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5431.045, 5479.802, 5591.889, 5587.0674, 5430.2646, 5568.8516, 5579.3633, 5456.09, 5575.914, 5510.0786]
2025-09-16 11:04:52,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:04:52,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (5521.04) for latency 15
2025-09-16 11:04:52,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 9 minutes, 20 seconds)
2025-09-16 11:06:34,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:06:44,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5225.91699 ± 468.323
2025-09-16 11:06:44,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5112.496, 5099.764, 5504.008, 5283.515, 5645.5894, 5586.3315, 3995.0552, 5509.3853, 4961.5977, 5561.4277]
2025-09-16 11:06:44,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:06:44,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 7 minutes, 28 seconds)
2025-09-16 11:08:26,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:08:37,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5259.71240 ± 240.729
2025-09-16 11:08:37,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5056.821, 5449.292, 4936.0366, 5387.469, 5486.771, 4751.8057, 5309.1763, 5389.159, 5369.628, 5460.964]
2025-09-16 11:08:37,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:08:37,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 5 minutes, 38 seconds)
2025-09-16 11:10:19,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:10:30,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4972.15527 ± 1247.656
2025-09-16 11:10:30,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5377.252, 5576.467, 1357.3071, 5278.846, 5609.83, 5500.8823, 5557.016, 4448.7964, 5522.7593, 5492.396]
2025-09-16 11:10:30,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:10:30,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 3 minutes, 48 seconds)
2025-09-16 11:12:12,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:12:22,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5466.41309 ± 172.332
2025-09-16 11:12:22,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5514.5234, 5516.9775, 5348.615, 5477.766, 5601.92, 5001.612, 5595.532, 5581.8657, 5581.314, 5444.0024]
2025-09-16 11:12:22,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:12:22,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 1 minute, 56 seconds)
2025-09-16 11:14:05,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:14:15,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5497.89404 ± 140.631
2025-09-16 11:14:15,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5541.763, 5569.555, 5446.7095, 5504.115, 5561.2744, 5618.2134, 5101.438, 5485.0317, 5585.3164, 5565.523]
2025-09-16 11:14:15,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:14:15,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 7 seconds)
2025-09-16 11:15:58,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:16:08,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5412.99219 ± 231.476
2025-09-16 11:16:08,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5392.84, 5354.071, 5570.5894, 4806.014, 5564.5093, 5620.6436, 5541.623, 5618.903, 5330.9224, 5329.8096]
2025-09-16 11:16:08,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:16:08,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 58 minutes, 17 seconds)
2025-09-16 11:17:50,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:18:01,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5436.87695 ± 165.041
2025-09-16 11:18:01,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5688.1934, 5237.7554, 5512.1763, 5477.1016, 5576.359, 5138.533, 5483.913, 5502.8774, 5234.8145, 5517.0435]
2025-09-16 11:18:01,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:18:01,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 56 minutes, 21 seconds)
2025-09-16 11:19:42,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:19:53,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5032.30371 ± 1245.976
2025-09-16 11:19:53,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5627.1304, 5576.3257, 5584.9775, 4879.989, 1347.4783, 5567.4834, 5367.4556, 5528.6997, 5521.0, 5322.501]
2025-09-16 11:19:53,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:19:53,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 54 minutes, 27 seconds)
2025-09-16 11:21:35,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:21:46,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4966.10205 ± 1214.184
2025-09-16 11:21:46,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5414.0386, 5348.401, 5316.6533, 5284.95, 5496.341, 5342.126, 1327.2562, 5360.4727, 5395.3223, 5375.4556]
2025-09-16 11:21:46,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:21:46,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 52 minutes, 34 seconds)
2025-09-16 11:23:28,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:23:39,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5174.15234 ± 992.677
2025-09-16 11:23:39,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5462.153, 5483.301, 5549.352, 5494.9014, 5530.152, 5430.245, 5453.3467, 5553.6772, 2199.4517, 5584.947]
2025-09-16 11:23:39,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:23:39,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 50 minutes, 41 seconds)
2025-09-16 11:25:20,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:25:31,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4809.22900 ± 1296.316
2025-09-16 11:25:31,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5504.247, 1322.6329, 4703.95, 5508.418, 5596.728, 5509.5312, 5096.751, 5628.5693, 3682.931, 5538.533]
2025-09-16 11:25:31,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:25:31,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 48 minutes, 46 seconds)
2025-09-16 11:27:12,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:27:23,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4927.14111 ± 1184.129
2025-09-16 11:27:23,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5317.2764, 5306.298, 1375.7369, 5349.232, 5364.6245, 5301.0957, 5264.968, 5342.456, 5344.835, 5304.8877]
2025-09-16 11:27:23,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:27:23,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 46 minutes, 52 seconds)
2025-09-16 11:29:04,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:29:15,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5303.09863 ± 551.519
2025-09-16 11:29:15,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5448.4575, 3690.9895, 5517.6294, 5471.117, 5525.1377, 5530.547, 5580.916, 5132.645, 5542.924, 5590.625]
2025-09-16 11:29:15,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:29:15,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 44 minutes, 58 seconds)
2025-09-16 11:30:56,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:31:07,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5533.38184 ± 136.964
2025-09-16 11:31:07,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5164.743, 5634.9326, 5566.218, 5585.009, 5593.708, 5576.0176, 5685.023, 5574.463, 5453.2866, 5500.4175]
2025-09-16 11:31:07,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:31:07,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (5533.38) for latency 15
2025-09-16 11:31:07,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 43 minutes, 1 second)
2025-09-16 11:32:48,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:32:59,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4836.46045 ± 1301.094
2025-09-16 11:32:59,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5331.6743, 5589.802, 4814.978, 3013.8806, 5605.3354, 1679.3293, 5599.4517, 5553.556, 5525.5366, 5651.0625]
2025-09-16 11:32:59,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:32:59,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 41 minutes, 6 seconds)
2025-09-16 11:34:40,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:34:51,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5533.31836 ± 72.908
2025-09-16 11:34:51,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5590.4087, 5603.1, 5371.512, 5469.9385, 5486.289, 5506.4727, 5586.8994, 5623.17, 5531.2856, 5564.1045]
2025-09-16 11:34:51,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:34:51,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 39 minutes, 11 seconds)
2025-09-16 11:36:32,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:36:43,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5451.86230 ± 66.731
2025-09-16 11:36:43,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5270.8804, 5473.143, 5436.9272, 5486.381, 5489.194, 5537.0034, 5434.131, 5481.417, 5449.259, 5460.2886]
2025-09-16 11:36:43,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:36:43,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 37 minutes, 19 seconds)
2025-09-16 11:38:24,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:38:35,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5086.29199 ± 1238.163
2025-09-16 11:38:35,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [1414.255, 5334.881, 5561.06, 5683.1797, 5603.396, 5565.285, 4995.6724, 5586.5576, 5572.242, 5546.3896]
2025-09-16 11:38:35,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:38:35,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 35 minutes, 28 seconds)
2025-09-16 11:40:16,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:40:27,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5454.88916 ± 196.883
2025-09-16 11:40:27,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5579.7217, 5449.4, 5376.566, 5632.5063, 5472.0283, 5409.8257, 5540.178, 5550.768, 4917.9546, 5619.945]
2025-09-16 11:40:27,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:40:27,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 33 minutes, 37 seconds)
2025-09-16 11:42:09,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:42:20,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5519.49414 ± 63.102
2025-09-16 11:42:20,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5540.9897, 5608.807, 5620.7524, 5420.712, 5521.6255, 5457.9927, 5488.029, 5573.1396, 5501.676, 5461.2197]
2025-09-16 11:42:20,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:42:20,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 31 minutes, 45 seconds)
2025-09-16 11:44:01,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:44:12,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5430.31348 ± 130.118
2025-09-16 11:44:12,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5265.4165, 5544.7607, 5251.66, 5431.364, 5620.068, 5559.1265, 5335.875, 5425.355, 5301.4937, 5568.016]
2025-09-16 11:44:12,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:44:12,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 29 minutes, 53 seconds)
2025-09-16 11:45:52,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:46:03,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4695.27100 ± 1630.507
2025-09-16 11:46:03,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5512.9287, 5616.022, 5283.764, 1497.4651, 5601.699, 1383.0106, 5522.8496, 5496.1323, 5618.208, 5420.631]
2025-09-16 11:46:03,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:46:03,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 27 minutes, 58 seconds)
2025-09-16 11:47:43,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:47:53,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5374.97607 ± 163.005
2025-09-16 11:47:53,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5229.091, 5493.1865, 5442.345, 5478.8965, 5501.2305, 5476.481, 5007.4937, 5538.986, 5371.12, 5210.9307]
2025-09-16 11:47:53,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:47:53,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 26 minutes, 2 seconds)
2025-09-16 11:49:33,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:49:43,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5401.08398 ± 450.631
2025-09-16 11:49:43,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5570.6226, 5393.0107, 5565.4937, 5586.3223, 5542.1094, 5591.6777, 4059.484, 5589.5747, 5537.1846, 5575.361]
2025-09-16 11:49:43,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:49:43,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 24 minutes, 5 seconds)
2025-09-16 11:51:23,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:51:33,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5405.36719 ± 413.731
2025-09-16 11:51:33,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5555.0957, 5553.996, 5642.954, 5594.5894, 5687.763, 4304.445, 4991.5103, 5558.5254, 5695.929, 5468.8667]
2025-09-16 11:51:33,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:51:33,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 22 minutes, 8 seconds)
2025-09-16 11:53:12,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:53:23,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5013.78271 ± 1183.861
2025-09-16 11:53:23,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5551.66, 5513.1396, 5454.4473, 5537.1104, 5093.631, 5544.1157, 5436.152, 1498.9595, 5063.4844, 5445.13]
2025-09-16 11:53:23,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:53:23,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 20 minutes, 12 seconds)
2025-09-16 11:55:02,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:55:13,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5488.12402 ± 85.732
2025-09-16 11:55:13,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5536.5625, 5417.6313, 5457.605, 5499.301, 5540.604, 5531.505, 5470.468, 5594.1646, 5555.8584, 5277.536]
2025-09-16 11:55:13,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:55:13,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 18 minutes, 20 seconds)
2025-09-16 11:56:52,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:57:02,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5182.79590 ± 1040.569
2025-09-16 11:57:02,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5615.225, 5521.602, 2067.6702, 5525.7427, 5614.3994, 5486.1104, 5424.4814, 5503.7305, 5441.347, 5627.6504]
2025-09-16 11:57:02,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:57:02,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 16 minutes, 28 seconds)
2025-09-16 11:58:41,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:58:52,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5389.99658 ± 36.621
2025-09-16 11:58:52,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5406.0723, 5465.5913, 5343.6245, 5371.5444, 5392.028, 5379.847, 5437.7217, 5394.802, 5353.8027, 5354.9336]
2025-09-16 11:58:52,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:58:52,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 14 minutes, 37 seconds)
2025-09-16 12:00:31,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:00:41,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5268.27148 ± 656.980
2025-09-16 12:00:41,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [3797.8005, 5637.9185, 5507.673, 4135.57, 5596.9097, 5577.43, 5601.4536, 5576.609, 5548.754, 5702.596]
2025-09-16 12:00:41,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:00:41,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 12 minutes, 47 seconds)
2025-09-16 12:02:20,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:02:31,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5452.23730 ± 497.936
2025-09-16 12:02:31,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5579.408, 5655.714, 5609.1, 5549.451, 5625.242, 5578.676, 5680.8755, 3963.7183, 5680.7505, 5599.4385]
2025-09-16 12:02:31,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:02:31,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 57 seconds)
2025-09-16 12:04:09,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:04:20,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5501.64062 ± 98.129
2025-09-16 12:04:20,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5574.748, 5549.761, 5617.472, 5556.369, 5484.112, 5597.531, 5301.4062, 5429.5854, 5532.225, 5373.1963]
2025-09-16 12:04:20,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:04:20,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 7 seconds)
2025-09-16 12:05:58,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:06:09,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5488.68066 ± 304.538
2025-09-16 12:06:09,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5627.184, 5385.5415, 5771.4946, 5555.1597, 5556.0347, 5648.4795, 5704.831, 5576.408, 4637.469, 5424.204]
2025-09-16 12:06:09,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:06:09,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 17 seconds)
2025-09-16 12:07:47,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:07:58,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4855.44189 ± 1366.307
2025-09-16 12:07:58,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5705.9785, 5672.572, 1675.7827, 5592.278, 2692.7012, 5172.3745, 5588.511, 5201.9062, 5612.8735, 5639.436]
2025-09-16 12:07:58,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:07:58,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 27 seconds)
2025-09-16 12:09:36,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:09:47,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5072.83252 ± 1096.760
2025-09-16 12:09:47,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5453.595, 5430.207, 5428.839, 5515.812, 1787.5586, 5427.7466, 5282.0034, 5445.7275, 5512.8906, 5443.945]
2025-09-16 12:09:47,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:09:47,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 38 seconds)
2025-09-16 12:11:25,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:11:36,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5269.24268 ± 753.976
2025-09-16 12:11:36,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5485.852, 5505.845, 5474.478, 5459.1167, 5583.611, 5539.5054, 5581.029, 5479.377, 3011.235, 5572.375]
2025-09-16 12:11:36,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:11:36,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 49 seconds)
2025-09-16 12:13:14,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:13:25,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5544.09082 ± 78.044
2025-09-16 12:13:25,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [5614.9375, 5498.1265, 5577.639, 5509.8394, 5553.2676, 5597.5938, 5579.7334, 5335.766, 5577.5425, 5596.4585]
2025-09-16 12:13:25,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:13:25,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (5544.09) for latency 15
2025-09-16 12:13:25,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1251 [DEBUG]: Training session finished
