2025-08-07 06:07:00,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc25-halfcheetah/ExtremeClogL1U23-bpql-mem24
2025-08-07 06:07:00,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc25-halfcheetah/ExtremeClogL1U23-bpql-mem24
2025-08-07 06:07:00,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x14da7636ad90>}
2025-08-07 06:07:00,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1111 [DEBUG]: using device: cuda
2025-08-07 06:07:00,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1133 [INFO]: Creating new trainer
2025-08-07 06:07:00,469 baseline-bpql-noiseperc25-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=161, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 06:07:00,469 baseline-bpql-noiseperc25-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 06:07:01,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1194 [DEBUG]: Starting training session...
2025-08-07 06:07:01,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 1/100
2025-08-07 06:08:37,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:08:49,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: -313.70825 ± 48.769
2025-08-07 06:08:49,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [-311.93613, -279.6512, -256.09662, -367.3647, -279.18137, -358.6126, -231.9814, -367.62805, -373.86438, -310.7662]
2025-08-07 06:08:49,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:08:49,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (-313.71) for latency ExtremeClogL1U23
2025-08-07 06:08:49,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 58 minutes, 49 seconds)
2025-08-07 06:10:30,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:10:44,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: -301.23395 ± 39.572
2025-08-07 06:10:44,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [-240.2316, -303.91275, -365.889, -318.21255, -309.42847, -281.62662, -277.1466, -286.70056, -260.4571, -368.73453]
2025-08-07 06:10:44,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:10:44,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (-301.23) for latency ExtremeClogL1U23
2025-08-07 06:10:44,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 1 minute, 52 seconds)
2025-08-07 06:12:26,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:12:38,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: -208.12076 ± 100.519
2025-08-07 06:12:38,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [-233.17378, -196.10556, -202.3934, -167.30408, -243.57843, -359.92798, -323.81186, -274.21866, -50.771507, -29.922392]
2025-08-07 06:12:38,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:12:38,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (-208.12) for latency ExtremeClogL1U23
2025-08-07 06:12:38,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 1 minute, 31 seconds)
2025-08-07 06:14:20,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:14:32,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: -161.19583 ± 103.545
2025-08-07 06:14:32,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [-331.67004, -129.7961, -193.87852, -18.449972, -266.36603, -310.6897, -70.30686, -126.81716, -69.257416, -94.726555]
2025-08-07 06:14:32,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:14:32,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (-161.20) for latency ExtremeClogL1U23
2025-08-07 06:14:32,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 31 seconds)
2025-08-07 06:16:18,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:16:30,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: -211.77646 ± 58.573
2025-08-07 06:16:30,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [-95.03479, -183.17915, -297.1787, -239.9021, -193.32964, -151.3341, -220.15674, -248.66245, -290.38266, -198.60432]
2025-08-07 06:16:30,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:16:30,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 9 seconds)
2025-08-07 06:18:15,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:18:27,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: -171.00113 ± 110.217
2025-08-07 06:18:27,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [-238.3643, -221.27689, -282.3878, 24.64548, 41.09939, -208.12132, -249.31256, -170.58751, -135.02782, -270.6781]
2025-08-07 06:18:27,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:18:27,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 1 minute, 2 seconds)
2025-08-07 06:20:10,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:20:24,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: -139.04654 ± 152.307
2025-08-07 06:20:24,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [-176.39154, 22.520197, -72.94974, 53.364655, -5.062846, -139.41559, -137.40652, -472.75876, -139.55846, -322.80685]
2025-08-07 06:20:24,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:20:24,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (-139.05) for latency ExtremeClogL1U23
2025-08-07 06:20:24,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 59 minutes, 52 seconds)
2025-08-07 06:22:05,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:22:17,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: -115.11940 ± 115.946
2025-08-07 06:22:17,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [-101.6196, -99.18833, -121.36827, -11.383093, -425.32938, -117.25381, -8.760439, 2.1891751, -110.17926, -158.30096]
2025-08-07 06:22:17,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:22:17,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (-115.12) for latency ExtremeClogL1U23
2025-08-07 06:22:17,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 57 minutes, 35 seconds)
2025-08-07 06:23:57,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:24:10,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: -99.02593 ± 171.125
2025-08-07 06:24:10,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [69.35323, -134.33563, -438.67264, 141.0002, 5.753208, -45.50919, 5.0767446, -308.1055, -47.260452, -237.55934]
2025-08-07 06:24:10,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:24:10,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (-99.03) for latency ExtremeClogL1U23
2025-08-07 06:24:10,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 55 minutes, 10 seconds)
2025-08-07 06:25:54,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:26:06,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 172.66829 ± 128.765
2025-08-07 06:26:06,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [30.222359, 18.31166, 300.74246, 109.33427, 367.3953, 181.1244, 260.51068, 51.55614, 64.158775, 343.32672]
2025-08-07 06:26:06,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:26:06,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (172.67) for latency ExtremeClogL1U23
2025-08-07 06:26:06,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 52 minutes, 47 seconds)
2025-08-07 06:27:49,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:28:01,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 124.40580 ± 210.498
2025-08-07 06:28:01,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [116.585045, -45.79345, 350.49896, -345.5721, 69.0469, 365.90445, 266.60208, 351.25705, 65.38453, 50.14445]
2025-08-07 06:28:01,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:28:01,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 50 minutes, 15 seconds)
2025-08-07 06:29:45,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:29:59,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 143.94019 ± 161.756
2025-08-07 06:29:59,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [-15.314199, 296.48608, 315.74338, 96.193756, 176.034, 145.97298, 462.82556, -34.69749, -46.849712, 43.00757]
2025-08-07 06:29:59,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:29:59,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 48 minutes, 38 seconds)
2025-08-07 06:31:42,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:31:56,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 154.69769 ± 245.359
2025-08-07 06:31:56,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [-451.90704, 311.7121, 235.49776, 160.3835, 255.67845, 423.3852, 156.63582, 424.14944, 83.54374, -52.101887]
2025-08-07 06:31:56,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:31:56,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 47 minutes, 49 seconds)
2025-08-07 06:33:36,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:33:49,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 95.31455 ± 245.237
2025-08-07 06:33:49,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [308.55127, -16.92569, -320.96713, -0.2469569, 313.0333, 261.64975, 274.74026, 101.77209, 362.87292, -331.33438]
2025-08-07 06:33:49,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:33:49,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 45 minutes, 56 seconds)
2025-08-07 06:35:30,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:35:42,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 281.77081 ± 227.714
2025-08-07 06:35:42,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [248.32841, 17.12725, -121.263016, 323.87418, 517.6152, 471.87225, 496.8689, -15.334647, 499.4648, 379.1547]
2025-08-07 06:35:42,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:35:42,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (281.77) for latency ExtremeClogL1U23
2025-08-07 06:35:42,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 43 minutes, 14 seconds)
2025-08-07 06:37:28,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:37:41,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 350.90521 ± 271.312
2025-08-07 06:37:41,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [552.5501, -345.2141, 453.06573, 82.532715, 340.40503, 319.69458, 472.72266, 553.3583, 539.54083, 540.3963]
2025-08-07 06:37:41,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:37:41,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (350.91) for latency ExtremeClogL1U23
2025-08-07 06:37:41,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 42 minutes, 29 seconds)
2025-08-07 06:39:23,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:39:35,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 428.94913 ± 192.539
2025-08-07 06:39:35,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [-129.23232, 495.8261, 467.79297, 423.98398, 591.0007, 534.1664, 493.05237, 459.57263, 420.94443, 532.3843]
2025-08-07 06:39:35,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:39:35,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (428.95) for latency ExtremeClogL1U23
2025-08-07 06:39:35,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 39 minutes, 27 seconds)
2025-08-07 06:41:18,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:41:31,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 388.29379 ± 211.372
2025-08-07 06:41:31,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [135.0245, 469.04492, 493.21494, 235.62025, -94.6658, 548.90247, 575.2548, 520.26184, 448.3474, 551.9326]
2025-08-07 06:41:31,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:41:31,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 37 minutes, 10 seconds)
2025-08-07 06:43:16,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:43:28,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 477.27142 ± 107.011
2025-08-07 06:43:28,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [555.0773, 383.4152, 539.0905, 571.43097, 481.28955, 370.49307, 365.60855, 597.10706, 300.18448, 609.01746]
2025-08-07 06:43:28,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:43:28,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (477.27) for latency ExtremeClogL1U23
2025-08-07 06:43:28,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 36 minutes, 30 seconds)
2025-08-07 06:45:14,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:45:26,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 310.17133 ± 285.388
2025-08-07 06:45:26,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [59.820152, -87.56468, 624.23425, -129.59816, 110.186745, 438.63864, 398.00348, 466.7072, 737.6821, 483.6035]
2025-08-07 06:45:26,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:45:26,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 35 minutes, 45 seconds)
2025-08-07 06:47:12,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:47:24,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 327.91144 ± 186.331
2025-08-07 06:47:24,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [53.77223, 497.00375, 260.6867, 523.6473, 365.04834, 374.54883, 482.33246, 217.13173, 525.93384, -20.990747]
2025-08-07 06:47:24,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:47:24,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 33 minutes, 25 seconds)
2025-08-07 06:49:09,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:49:21,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 370.09909 ± 166.840
2025-08-07 06:49:21,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [643.1741, 287.58774, 226.03572, 585.96136, 302.02686, 112.936325, 516.9947, 416.73788, 189.50507, 420.03134]
2025-08-07 06:49:21,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:49:21,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 32 minutes, 20 seconds)
2025-08-07 06:51:06,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:51:18,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 232.50200 ± 237.161
2025-08-07 06:51:18,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [372.52164, -271.71057, 225.51587, 434.84982, 341.022, 113.63434, 536.23413, 44.757923, 49.366974, 478.82788]
2025-08-07 06:51:18,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:51:18,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 30 minutes, 50 seconds)
2025-08-07 06:53:03,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:53:15,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 401.56375 ± 181.924
2025-08-07 06:53:15,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [76.7312, 236.805, 265.92453, 534.84467, 685.73334, 567.3464, 395.03674, 230.79906, 493.08844, 529.3282]
2025-08-07 06:53:15,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:53:15,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 28 minutes, 42 seconds)
2025-08-07 06:54:57,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:55:09,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 323.67749 ± 230.929
2025-08-07 06:55:09,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [135.53415, 548.66614, -194.107, 517.3881, 544.8121, 358.62292, 109.1603, 278.39975, 492.8774, 445.4209]
2025-08-07 06:55:09,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:55:09,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 25 minutes, 38 seconds)
2025-08-07 06:56:48,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:57:00,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 481.34164 ± 132.026
2025-08-07 06:57:00,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [653.9371, 388.00818, 353.4901, 414.40475, 522.16864, 528.28906, 681.945, 624.8237, 334.343, 312.0068]
2025-08-07 06:57:00,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:57:00,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (481.34) for latency ExtremeClogL1U23
2025-08-07 06:57:00,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 22 minutes, 8 seconds)
2025-08-07 06:58:43,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:58:56,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 276.14465 ± 278.107
2025-08-07 06:58:56,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [-82.31227, 403.02173, 568.7934, 600.62134, 51.7367, -194.78796, 279.78476, 513.0004, 79.20575, 542.3825]
2025-08-07 06:58:56,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:58:56,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 19 minutes, 44 seconds)
2025-08-07 07:00:37,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:00:49,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 340.76315 ± 208.690
2025-08-07 07:00:49,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [111.783585, 364.79663, 571.4912, 619.51587, 334.90726, 398.26044, -48.595566, 103.98104, 521.14557, 430.34546]
2025-08-07 07:00:49,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:00:49,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 17 minutes, 1 second)
2025-08-07 07:02:30,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:02:42,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 296.02985 ± 214.769
2025-08-07 07:02:42,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [489.7694, 421.05035, 538.9065, 355.52023, -5.5619254, 90.497505, 159.73991, 476.43436, 489.68207, -55.73993]
2025-08-07 07:02:42,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:02:42,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 14 minutes, 2 seconds)
2025-08-07 07:04:23,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:04:36,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 397.94318 ± 183.900
2025-08-07 07:04:36,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [84.63663, 330.14536, 456.17844, 163.61725, 577.1508, 483.86786, 511.27844, 616.74963, 182.82443, 572.9828]
2025-08-07 07:04:36,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:04:36,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 12 minutes, 17 seconds)
2025-08-07 07:06:17,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:06:29,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 390.27734 ± 133.109
2025-08-07 07:06:29,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [390.93484, 462.52023, 184.48817, 426.43704, 404.12085, 440.53482, 610.821, 292.27426, 167.41647, 523.2257]
2025-08-07 07:06:29,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:06:29,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 10 minutes, 49 seconds)
2025-08-07 07:08:12,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:08:24,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 429.65033 ± 111.523
2025-08-07 07:08:24,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [463.98178, 467.8872, 236.81667, 445.56293, 327.55298, 597.88007, 511.4758, 509.9423, 478.95102, 256.45288]
2025-08-07 07:08:24,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:08:24,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 8 minutes, 57 seconds)
2025-08-07 07:10:05,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:10:17,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 482.54630 ± 136.001
2025-08-07 07:10:17,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [588.87366, 521.4173, 305.20877, 268.11774, 568.39514, 411.29025, 520.7108, 341.96796, 700.4586, 599.0222]
2025-08-07 07:10:17,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:10:17,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (482.55) for latency ExtremeClogL1U23
2025-08-07 07:10:17,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 6 minutes, 42 seconds)
2025-08-07 07:11:58,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:12:11,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 491.91293 ± 98.135
2025-08-07 07:12:11,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [575.5286, 267.41055, 446.19046, 558.5351, 481.85886, 523.49225, 396.55966, 491.5296, 629.85095, 548.1734]
2025-08-07 07:12:11,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:12:11,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (491.91) for latency ExtremeClogL1U23
2025-08-07 07:12:11,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 5 minutes, 20 seconds)
2025-08-07 07:13:51,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:14:03,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 434.88458 ± 125.828
2025-08-07 07:14:03,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [506.89612, 224.97908, 257.79614, 338.90097, 495.95724, 604.3775, 389.56543, 608.71246, 495.71243, 425.94873]
2025-08-07 07:14:03,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:14:03,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 2 minutes, 56 seconds)
2025-08-07 07:15:44,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:15:57,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 455.48175 ± 124.948
2025-08-07 07:15:57,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [591.5403, 567.979, 539.3842, 237.99245, 389.2045, 325.5823, 355.6898, 507.8138, 638.9185, 400.71237]
2025-08-07 07:15:57,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:15:57,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 1 minute, 12 seconds)
2025-08-07 07:17:39,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:17:51,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 498.05695 ± 124.852
2025-08-07 07:17:51,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [650.1582, 373.54553, 382.80927, 533.4772, 481.16794, 709.9794, 587.3864, 290.4392, 430.46747, 541.13855]
2025-08-07 07:17:51,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:17:51,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (498.06) for latency ExtremeClogL1U23
2025-08-07 07:17:51,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 59 minutes)
2025-08-07 07:19:32,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:19:44,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 612.25946 ± 72.932
2025-08-07 07:19:44,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [537.96704, 603.0562, 570.0622, 620.99335, 561.57806, 599.2795, 726.4189, 772.12024, 570.19434, 560.92523]
2025-08-07 07:19:44,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:19:44,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (612.26) for latency ExtremeClogL1U23
2025-08-07 07:19:44,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 57 minutes, 11 seconds)
2025-08-07 07:21:22,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:21:34,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 633.78497 ± 178.947
2025-08-07 07:21:34,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [988.63586, 385.53384, 386.35687, 523.14276, 738.2167, 748.499, 658.7313, 546.8009, 787.03973, 574.8927]
2025-08-07 07:21:34,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:21:34,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (633.78) for latency ExtremeClogL1U23
2025-08-07 07:21:34,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 54 minutes, 20 seconds)
2025-08-07 07:23:16,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:23:28,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 596.13062 ± 118.285
2025-08-07 07:23:28,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [574.5588, 430.32993, 698.8278, 781.45325, 388.51785, 739.31036, 601.42804, 544.4011, 620.0476, 582.4311]
2025-08-07 07:23:28,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:23:28,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 52 minutes, 56 seconds)
2025-08-07 07:25:07,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:25:20,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 549.57184 ± 119.117
2025-08-07 07:25:20,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [668.0836, 503.587, 320.42084, 565.63824, 603.3143, 524.54504, 742.6507, 396.7472, 529.4741, 641.25684]
2025-08-07 07:25:20,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:25:20,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 50 minutes, 34 seconds)
2025-08-07 07:27:01,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:27:13,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 597.48267 ± 79.149
2025-08-07 07:27:13,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [539.80505, 655.3801, 761.3831, 589.1068, 472.4695, 593.6437, 521.2314, 608.2572, 674.7229, 558.82684]
2025-08-07 07:27:13,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:27:13,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 48 minutes, 40 seconds)
2025-08-07 07:28:55,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:29:07,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 705.69531 ± 115.194
2025-08-07 07:29:07,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [882.8301, 815.94135, 633.0756, 616.4729, 788.27625, 585.24207, 803.63763, 623.5702, 525.6732, 782.23364]
2025-08-07 07:29:07,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:29:07,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (705.70) for latency ExtremeClogL1U23
2025-08-07 07:29:07,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 46 minutes, 56 seconds)
2025-08-07 07:30:45,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:30:57,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 615.37958 ± 101.318
2025-08-07 07:30:57,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [470.83365, 578.12103, 717.3422, 606.7501, 659.3227, 638.18805, 618.25446, 460.15637, 821.1498, 583.677]
2025-08-07 07:30:57,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:30:58,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 45 minutes, 13 seconds)
2025-08-07 07:32:35,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:32:48,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 658.19366 ± 129.569
2025-08-07 07:32:48,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [450.73495, 581.9929, 665.2311, 849.8693, 591.325, 656.4486, 665.6207, 611.7662, 923.3248, 585.6228]
2025-08-07 07:32:48,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:32:48,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 42 minutes, 38 seconds)
2025-08-07 07:34:25,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:34:38,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 543.34070 ± 148.364
2025-08-07 07:34:38,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [522.32043, 683.08405, 699.5689, 610.7445, 411.2451, 470.5704, 296.66867, 788.43835, 379.73785, 571.02826]
2025-08-07 07:34:38,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:34:38,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 40 minutes, 26 seconds)
2025-08-07 07:36:17,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:36:29,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 709.68286 ± 96.920
2025-08-07 07:36:29,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [701.4407, 675.7281, 690.54156, 761.6247, 757.6818, 741.0971, 551.6375, 587.3044, 925.04346, 704.7295]
2025-08-07 07:36:29,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:36:29,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (709.68) for latency ExtremeClogL1U23
2025-08-07 07:36:29,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 38 minutes, 14 seconds)
2025-08-07 07:38:08,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:38:20,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 637.99438 ± 126.863
2025-08-07 07:38:20,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [580.5062, 650.6394, 499.00067, 572.7074, 626.77423, 689.71674, 697.90656, 903.16583, 421.00616, 738.5203]
2025-08-07 07:38:20,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:38:20,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 35 minutes, 55 seconds)
2025-08-07 07:39:59,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:40:12,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 541.46265 ± 125.633
2025-08-07 07:40:12,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [535.0259, 769.33, 612.91833, 350.9948, 600.6355, 361.48782, 462.37357, 687.00433, 518.6275, 516.2286]
2025-08-07 07:40:12,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:40:12,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 34 minutes, 13 seconds)
2025-08-07 07:41:55,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:42:08,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 669.88300 ± 132.595
2025-08-07 07:42:08,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [859.8548, 629.1645, 788.49225, 807.1274, 594.2095, 426.36963, 624.93976, 507.10913, 776.6425, 684.92096]
2025-08-07 07:42:08,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:42:08,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 33 minutes, 25 seconds)
2025-08-07 07:43:50,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:44:02,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 554.37976 ± 73.579
2025-08-07 07:44:02,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [419.5653, 611.057, 508.9667, 692.45197, 538.525, 633.07404, 541.07764, 576.20624, 490.95776, 531.91583]
2025-08-07 07:44:02,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:44:02,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 32 minutes, 14 seconds)
2025-08-07 07:45:44,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:45:56,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 592.56653 ± 104.041
2025-08-07 07:45:56,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [716.7612, 590.40265, 638.97675, 664.1745, 546.3139, 670.0982, 446.19284, 547.80945, 711.74817, 393.18744]
2025-08-07 07:45:56,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:45:56,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 30 minutes, 39 seconds)
2025-08-07 07:47:39,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:47:53,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 644.18494 ± 175.318
2025-08-07 07:47:53,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [565.85364, 765.39154, 810.9755, 731.7008, 680.91693, 462.85294, 423.70993, 602.87115, 415.4621, 982.1148]
2025-08-07 07:47:53,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:47:53,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 29 minutes, 45 seconds)
2025-08-07 07:49:33,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:49:46,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 633.71552 ± 132.929
2025-08-07 07:49:46,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [737.01263, 840.63306, 494.3804, 607.6644, 355.0128, 719.7134, 539.2876, 713.97687, 645.1373, 684.3367]
2025-08-07 07:49:46,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:49:46,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 28 minutes)
2025-08-07 07:51:26,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:51:40,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 682.02356 ± 179.140
2025-08-07 07:51:40,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [785.5065, 495.10004, 638.8287, 1139.2386, 540.62604, 682.00726, 482.45956, 636.45874, 688.91626, 731.0944]
2025-08-07 07:51:40,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:51:40,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 25 minutes, 47 seconds)
2025-08-07 07:53:21,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:53:34,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 617.03076 ± 105.263
2025-08-07 07:53:34,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [616.93085, 738.01855, 591.5101, 860.7918, 601.61926, 605.09644, 589.11676, 469.59433, 592.91656, 504.7131]
2025-08-07 07:53:34,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:53:34,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 23 minutes, 48 seconds)
2025-08-07 07:55:14,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:55:28,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 798.61993 ± 79.174
2025-08-07 07:55:28,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [777.14307, 841.2071, 827.6716, 756.6817, 876.80884, 918.1199, 692.402, 853.1105, 646.44305, 796.612]
2025-08-07 07:55:28,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:55:28,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (798.62) for latency ExtremeClogL1U23
2025-08-07 07:55:28,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 22 minutes, 2 seconds)
2025-08-07 07:57:10,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:57:22,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 740.08099 ± 140.199
2025-08-07 07:57:22,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [763.74585, 978.8306, 926.0759, 662.14276, 595.95825, 689.20325, 558.07227, 822.034, 829.41626, 575.3315]
2025-08-07 07:57:22,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:57:22,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 19 minutes, 43 seconds)
2025-08-07 07:59:04,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:59:17,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 755.91522 ± 179.453
2025-08-07 07:59:17,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [560.1263, 994.93695, 846.5749, 654.5984, 492.64462, 823.4817, 621.6039, 658.0684, 834.25226, 1072.8647]
2025-08-07 07:59:17,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:59:17,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 18 minutes, 1 second)
2025-08-07 08:00:57,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:01:09,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 691.09174 ± 113.333
2025-08-07 08:01:09,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [793.5472, 687.0964, 853.54694, 810.80286, 578.79724, 628.57275, 664.4425, 541.25635, 812.6222, 540.233]
2025-08-07 08:01:09,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:01:09,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 15 minutes, 53 seconds)
2025-08-07 08:02:51,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:03:04,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 817.16754 ± 158.183
2025-08-07 08:03:04,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [839.9203, 625.12415, 973.7459, 1081.0013, 819.8648, 794.12, 983.732, 825.2789, 678.2903, 550.59814]
2025-08-07 08:03:04,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:03:04,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (817.17) for latency ExtremeClogL1U23
2025-08-07 08:03:04,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 14 minutes, 6 seconds)
2025-08-07 08:04:44,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:04:58,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 771.61536 ± 139.538
2025-08-07 08:04:58,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [715.4219, 780.8668, 655.7173, 960.6802, 943.34235, 597.5972, 731.2292, 895.1979, 891.07336, 545.028]
2025-08-07 08:04:58,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:04:58,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 12 minutes, 12 seconds)
2025-08-07 08:06:42,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:06:55,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 922.55798 ± 166.296
2025-08-07 08:06:55,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1263.0392, 731.3203, 837.2971, 984.6589, 861.57196, 1081.6777, 898.3981, 978.0879, 948.95776, 640.5708]
2025-08-07 08:06:55,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:06:55,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (922.56) for latency ExtremeClogL1U23
2025-08-07 08:06:55,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 10 minutes, 34 seconds)
2025-08-07 08:08:36,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:08:50,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 782.49158 ± 158.452
2025-08-07 08:08:50,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [798.82965, 894.21924, 639.45233, 518.8435, 946.7169, 949.4256, 631.8426, 790.14624, 645.14276, 1010.2969]
2025-08-07 08:08:50,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:08:50,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 8 minutes, 46 seconds)
2025-08-07 08:10:30,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:10:43,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 851.32581 ± 221.168
2025-08-07 08:10:43,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1125.5035, 1005.6204, 819.04486, 602.7876, 747.7791, 1034.6523, 529.1339, 555.8806, 1135.3191, 957.5366]
2025-08-07 08:10:43,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:10:43,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 6 minutes, 54 seconds)
2025-08-07 08:12:25,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:12:39,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 855.90540 ± 163.653
2025-08-07 08:12:39,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1064.6938, 996.58215, 973.111, 570.0631, 610.23535, 900.8199, 1007.9996, 788.54144, 915.3049, 731.7027]
2025-08-07 08:12:39,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:12:40,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 5 minutes, 14 seconds)
2025-08-07 08:14:20,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:14:32,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 865.40857 ± 123.423
2025-08-07 08:14:32,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [804.09796, 679.5077, 888.32184, 1034.2773, 810.9627, 722.39, 773.9405, 977.99524, 898.67395, 1063.9185]
2025-08-07 08:14:32,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:14:32,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 3 minutes, 6 seconds)
2025-08-07 08:16:15,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:16:28,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 859.96497 ± 176.489
2025-08-07 08:16:28,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1147.2179, 936.76776, 816.40955, 1010.5121, 977.95764, 847.11096, 500.10785, 888.9113, 627.7658, 846.8889]
2025-08-07 08:16:28,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:16:28,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 1 minute, 6 seconds)
2025-08-07 08:18:10,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:18:23,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 858.57971 ± 144.081
2025-08-07 08:18:23,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [812.60956, 942.7555, 904.7235, 534.019, 969.80725, 871.5163, 759.54315, 763.25635, 1088.9985, 938.5674]
2025-08-07 08:18:23,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:18:23,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 59 minutes, 13 seconds)
2025-08-07 08:20:04,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:20:17,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 837.27209 ± 125.952
2025-08-07 08:20:17,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [812.98926, 621.04474, 717.65656, 807.6241, 873.3816, 771.4823, 907.2503, 847.09973, 889.1947, 1124.998]
2025-08-07 08:20:17,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:20:17,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 57 minutes, 25 seconds)
2025-08-07 08:21:58,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:22:10,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 855.46838 ± 153.400
2025-08-07 08:22:10,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [944.9462, 607.9893, 796.1596, 876.6681, 552.0119, 1036.6611, 912.62695, 865.23676, 947.9936, 1014.38965]
2025-08-07 08:22:10,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:22:10,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 55 minutes, 10 seconds)
2025-08-07 08:23:52,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:24:04,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 710.95459 ± 164.175
2025-08-07 08:24:04,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [920.9863, 680.7279, 548.8177, 422.1303, 775.78925, 531.2097, 839.55286, 736.862, 957.6338, 695.8362]
2025-08-07 08:24:04,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:24:04,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 53 minutes, 25 seconds)
2025-08-07 08:25:47,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:26:00,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 920.14178 ± 184.704
2025-08-07 08:26:00,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1040.6848, 737.19867, 1016.6247, 1059.0288, 872.0366, 952.0155, 851.13794, 1177.6749, 492.0306, 1002.98566]
2025-08-07 08:26:00,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:26:00,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 51 minutes, 29 seconds)
2025-08-07 08:27:39,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:27:52,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 833.33252 ± 163.852
2025-08-07 08:27:52,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [579.0851, 898.87805, 865.9058, 541.7674, 890.60443, 937.01056, 1065.7898, 682.6553, 906.73175, 964.8966]
2025-08-07 08:27:52,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:27:52,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 49 minutes, 19 seconds)
2025-08-07 08:29:33,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:29:47,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 752.43250 ± 119.964
2025-08-07 08:29:47,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [988.59076, 688.7122, 768.1956, 721.1936, 677.0508, 804.42554, 529.8553, 660.8775, 831.54694, 853.8764]
2025-08-07 08:29:47,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:29:47,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 47 minutes, 31 seconds)
2025-08-07 08:31:30,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:31:43,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 921.00604 ± 141.740
2025-08-07 08:31:43,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [808.40515, 814.7444, 865.1414, 919.0001, 1083.6993, 1109.6997, 643.19324, 894.2455, 975.1233, 1096.8091]
2025-08-07 08:31:43,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:31:43,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 45 minutes, 50 seconds)
2025-08-07 08:33:23,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:33:35,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 847.09216 ± 206.837
2025-08-07 08:33:35,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [809.1424, 685.97394, 931.853, 494.71625, 1158.7307, 950.3437, 525.92725, 1052.3287, 899.74426, 962.1618]
2025-08-07 08:33:35,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:33:35,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 43 minutes, 46 seconds)
2025-08-07 08:35:15,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:35:28,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 971.98401 ± 170.935
2025-08-07 08:35:28,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [805.124, 1245.3763, 894.5039, 1133.147, 997.54926, 775.40466, 976.62134, 1133.7334, 685.6388, 1072.7411]
2025-08-07 08:35:28,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:35:28,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (971.98) for latency ExtremeClogL1U23
2025-08-07 08:35:28,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 41 minutes, 40 seconds)
2025-08-07 08:37:08,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:37:21,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 883.00305 ± 116.693
2025-08-07 08:37:21,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [691.39197, 923.12415, 924.1691, 995.4757, 763.2505, 869.6204, 1110.4827, 767.56287, 935.66187, 849.2909]
2025-08-07 08:37:21,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:37:21,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 39 minutes, 50 seconds)
2025-08-07 08:39:05,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:39:19,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 733.66461 ± 131.082
2025-08-07 08:39:19,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [905.4463, 851.7191, 549.2093, 898.54816, 638.80865, 615.3858, 778.8112, 623.039, 860.04877, 615.6294]
2025-08-07 08:39:19,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:39:19,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 38 minutes, 7 seconds)
2025-08-07 08:41:03,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:41:15,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 872.84631 ± 205.478
2025-08-07 08:41:15,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1124.8259, 898.5813, 488.5329, 995.86017, 946.62634, 814.52856, 497.31424, 1034.2386, 1000.7376, 927.2178]
2025-08-07 08:41:15,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:41:15,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 36 minutes, 13 seconds)
2025-08-07 08:42:56,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:43:08,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 892.66711 ± 182.807
2025-08-07 08:43:08,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [520.6177, 1009.36615, 1018.0357, 986.57837, 882.7836, 1111.2246, 595.19934, 1007.2041, 834.25653, 961.4046]
2025-08-07 08:43:08,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:43:08,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 34 minutes, 22 seconds)
2025-08-07 08:44:48,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:45:00,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 980.53601 ± 107.933
2025-08-07 08:45:00,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [967.0726, 1009.8574, 975.78937, 1129.7616, 770.2852, 1086.2668, 863.01324, 1117.553, 908.538, 977.2238]
2025-08-07 08:45:00,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:45:00,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (980.54) for latency ExtremeClogL1U23
2025-08-07 08:45:01,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 32 minutes, 26 seconds)
2025-08-07 08:46:41,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:46:54,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 985.59412 ± 128.376
2025-08-07 08:46:54,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1035.6758, 1206.0847, 1112.4524, 1043.3262, 859.10785, 933.1952, 859.5867, 997.67615, 752.2546, 1056.5819]
2025-08-07 08:46:54,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:46:54,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (985.59) for latency ExtremeClogL1U23
2025-08-07 08:46:54,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 30 minutes, 32 seconds)
2025-08-07 08:48:34,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:48:48,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 899.27899 ± 210.855
2025-08-07 08:48:48,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [639.2828, 1054.4072, 793.58093, 624.3073, 922.08496, 1077.1993, 1069.1234, 992.5916, 596.90405, 1223.3082]
2025-08-07 08:48:48,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:48:48,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 28 minutes, 26 seconds)
2025-08-07 08:50:29,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:50:42,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 871.71301 ± 124.696
2025-08-07 08:50:42,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1085.1415, 818.6225, 1018.5436, 828.64246, 924.94196, 667.14465, 810.6477, 745.71027, 817.6056, 1000.1297]
2025-08-07 08:50:42,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:50:42,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 26 minutes, 27 seconds)
2025-08-07 08:52:25,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:52:39,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 816.76385 ± 185.342
2025-08-07 08:52:39,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1241.7201, 881.96686, 915.0135, 605.59546, 883.5085, 557.1987, 872.56274, 818.4775, 706.34015, 685.2547]
2025-08-07 08:52:39,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:52:39,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 24 minutes, 43 seconds)
2025-08-07 08:54:20,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:54:33,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 789.11896 ± 136.267
2025-08-07 08:54:33,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [990.33936, 821.8092, 999.4092, 612.79254, 744.4953, 625.1211, 937.68024, 671.8487, 741.8906, 745.8034]
2025-08-07 08:54:33,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:54:33,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 22 minutes, 52 seconds)
2025-08-07 08:56:14,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:56:28,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 963.87695 ± 266.967
2025-08-07 08:56:28,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [949.9461, 1125.512, 943.0289, 1156.6805, 390.49243, 1140.5583, 1149.9436, 1170.4333, 521.7276, 1090.4459]
2025-08-07 08:56:28,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:56:28,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 21 minutes, 3 seconds)
2025-08-07 08:58:08,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:58:20,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 902.20361 ± 179.014
2025-08-07 08:58:20,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1056.9968, 903.7894, 1095.4445, 981.5301, 734.9046, 794.1253, 778.9577, 684.6786, 1253.8375, 737.7718]
2025-08-07 08:58:20,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:58:20,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 19 minutes, 3 seconds)
2025-08-07 09:00:00,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:00:12,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 892.12909 ± 210.841
2025-08-07 09:00:12,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [890.34326, 575.94556, 900.1916, 810.0321, 668.45703, 1290.7823, 998.0325, 1117.5878, 1008.48517, 661.4336]
2025-08-07 09:00:12,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:00:12,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 17 minutes, 6 seconds)
2025-08-07 09:01:54,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:02:06,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 919.46082 ± 166.767
2025-08-07 09:02:06,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1061.4337, 858.3619, 1003.45294, 653.7809, 1069.8958, 1219.296, 885.5388, 724.54877, 953.05145, 765.24774]
2025-08-07 09:02:06,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:02:06,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 15 minutes, 8 seconds)
2025-08-07 09:03:48,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:04:01,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1008.28162 ± 177.660
2025-08-07 09:04:01,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [645.41455, 1055.3945, 943.36884, 1251.6383, 1024.0236, 1179.9937, 1025.0503, 1059.6151, 1147.4792, 750.83826]
2025-08-07 09:04:01,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:04:01,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (1008.28) for latency ExtremeClogL1U23
2025-08-07 09:04:01,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 13 minutes, 15 seconds)
2025-08-07 09:05:43,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:05:56,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1035.85046 ± 161.359
2025-08-07 09:05:56,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1321.4028, 927.45544, 1046.2316, 1063.6022, 708.87897, 1016.22534, 1073.6123, 1237.7566, 1055.2921, 908.0473]
2025-08-07 09:05:56,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:05:56,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (1035.85) for latency ExtremeClogL1U23
2025-08-07 09:05:56,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 11 minutes, 21 seconds)
2025-08-07 09:07:37,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:07:50,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1021.97754 ± 156.734
2025-08-07 09:07:50,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [962.9491, 1194.2029, 1071.9452, 1100.3453, 1108.3888, 1312.4298, 776.5655, 902.9873, 843.0267, 946.93585]
2025-08-07 09:07:50,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:07:50,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 29 seconds)
2025-08-07 09:09:32,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:09:46,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1097.40942 ± 230.450
2025-08-07 09:09:46,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1465.0508, 1290.4963, 1271.1381, 1120.398, 766.2584, 1160.9996, 1111.2712, 1112.1827, 638.9341, 1037.3662]
2025-08-07 09:09:46,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:09:46,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (1097.41) for latency ExtremeClogL1U23
2025-08-07 09:09:46,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 39 seconds)
2025-08-07 09:11:28,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:11:41,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1206.78101 ± 154.015
2025-08-07 09:11:41,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1302.2631, 1069.1399, 1259.0057, 959.6935, 1130.6747, 1283.1748, 1211.0393, 1505.5729, 1318.9543, 1028.2921]
2025-08-07 09:11:41,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:11:41,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (1206.78) for latency ExtremeClogL1U23
2025-08-07 09:11:41,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 44 seconds)
2025-08-07 09:13:20,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:13:33,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 983.35559 ± 166.411
2025-08-07 09:13:33,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [972.5249, 994.094, 1132.935, 990.3706, 1032.2472, 1322.5857, 937.15436, 631.2751, 910.25195, 910.1159]
2025-08-07 09:13:33,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:13:33,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 48 seconds)
2025-08-07 09:15:15,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:15:29,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1156.22900 ± 112.327
2025-08-07 09:15:29,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1128.1619, 1083.1519, 1092.5547, 1127.4231, 1091.2183, 938.1907, 1272.888, 1303.9708, 1314.462, 1210.2697]
2025-08-07 09:15:29,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:15:29,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 54 seconds)
2025-08-07 09:17:12,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:17:26,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1129.32837 ± 125.791
2025-08-07 09:17:26,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1241.4778, 976.8676, 1164.5787, 1129.0365, 1044.8574, 1273.0927, 1253.058, 896.95856, 1050.4575, 1262.898]
2025-08-07 09:17:26,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:17:26,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1251 [DEBUG]: Training session finished
