2025-08-07 07:20:17,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc20-walker2d/ExtremeClogL1U23-bpql-mem24
2025-08-07 07:20:17,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc20-walker2d/ExtremeClogL1U23-bpql-mem24
2025-08-07 07:20:17,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x154569ec78d0>}
2025-08-07 07:20:17,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1111 [DEBUG]: using device: cuda
2025-08-07 07:20:17,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1133 [INFO]: Creating new trainer
2025-08-07 07:20:17,222 baseline-bpql-noiseperc20-walker2d:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=161, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 07:20:17,223 baseline-bpql-noiseperc20-walker2d:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 07:20:18,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1194 [DEBUG]: Starting training session...
2025-08-07 07:20:18,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 1/100
2025-08-07 07:21:47,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:21:48,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 39.92347 ± 101.439
2025-08-07 07:21:48,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [12.0852995, 2.672913, 4.5909514, 7.7675157, 6.610257, 8.422391, 3.6066523, 0.8921594, 344.09613, 8.490429]
2025-08-07 07:21:48,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [39.0, 14.0, 22.0, 68.0, 20.0, 22.0, 14.0, 17.0, 194.0, 45.0]
2025-08-07 07:21:48,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1226 [INFO]: New best (39.92) for latency ExtremeClogL1U23
2025-08-07 07:21:48,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 28 minutes, 35 seconds)
2025-08-07 07:23:26,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:23:27,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 34.16220 ± 38.506
2025-08-07 07:23:27,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [6.700968, 120.52886, 6.28145, 12.182016, 0.9557056, 4.4841213, 56.132362, 3.825915, 61.25227, 69.27837]
2025-08-07 07:23:27,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [23.0, 153.0, 119.0, 23.0, 23.0, 19.0, 72.0, 14.0, 95.0, 207.0]
2025-08-07 07:23:27,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 34 minutes, 27 seconds)
2025-08-07 07:25:05,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:25:06,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 40.55556 ± 67.687
2025-08-07 07:25:06,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1.0761575, 67.388626, 231.07481, 20.109013, 4.796711, 5.3924775, 5.725257, 61.605057, 2.5125515, 5.8749814]
2025-08-07 07:25:06,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [15.0, 89.0, 193.0, 75.0, 16.0, 19.0, 185.0, 126.0, 13.0, 25.0]
2025-08-07 07:25:06,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1226 [INFO]: New best (40.56) for latency ExtremeClogL1U23
2025-08-07 07:25:06,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 35 minutes, 28 seconds)
2025-08-07 07:26:43,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:26:44,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 7.50564 ± 16.584
2025-08-07 07:26:44,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [4.0092044, 4.330464, 3.1967878, 7.1057305, 5.5910654, 4.4698052, -8.901362, 0.9913989, -1.2801226, 55.543457]
2025-08-07 07:26:44,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [14.0, 14.0, 14.0, 21.0, 17.0, 17.0, 214.0, 25.0, 20.0, 87.0]
2025-08-07 07:26:44,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 34 minutes, 21 seconds)
2025-08-07 07:28:22,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:28:23,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 9.15488 ± 17.956
2025-08-07 07:28:23,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [6.696568, 33.77262, 2.0976686, 8.9438925, 2.9399827, -22.434671, 3.6987722, 47.181465, 5.6059966, 3.0464628]
2025-08-07 07:28:23,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [21.0, 58.0, 15.0, 22.0, 149.0, 85.0, 35.0, 273.0, 25.0, 16.0]
2025-08-07 07:28:23,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 33 minutes, 46 seconds)
2025-08-07 07:30:01,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:30:01,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 18.59867 ± 18.046
2025-08-07 07:30:01,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [31.101954, 9.188811, 26.08924, 10.448057, 53.16044, -0.93977624, 2.5472217, 1.7708434, 44.380238, 8.23969]
2025-08-07 07:30:01,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [183.0, 22.0, 52.0, 22.0, 75.0, 12.0, 17.0, 15.0, 121.0, 24.0]
2025-08-07 07:30:01,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 34 minutes, 40 seconds)
2025-08-07 07:31:39,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:31:40,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 28.23497 ± 49.388
2025-08-07 07:31:40,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [3.817205, 17.58142, 2.1062067, 4.0337405, 12.266447, 5.0581627, 170.35736, 51.12559, 14.3155, 1.6881149]
2025-08-07 07:31:40,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [15.0, 27.0, 14.0, 15.0, 23.0, 19.0, 130.0, 154.0, 68.0, 19.0]
2025-08-07 07:31:40,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 32 minutes, 47 seconds)
2025-08-07 07:33:17,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:33:17,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 10.73123 ± 12.593
2025-08-07 07:33:17,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [-0.7733984, 0.20047909, 1.8142494, 31.858482, 6.8685412, 9.020242, 35.31736, 17.35744, 6.3835716, -0.73470616]
2025-08-07 07:33:17,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [12.0, 14.0, 15.0, 173.0, 19.0, 22.0, 64.0, 36.0, 16.0, 22.0]
2025-08-07 07:33:17,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 30 minutes, 34 seconds)
2025-08-07 07:34:54,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:34:55,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 37.23300 ± 55.838
2025-08-07 07:34:55,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [-0.38987526, 27.067848, 4.619104, 79.02398, 6.201879, 29.629032, 190.77528, 1.1552126, 8.646313, 25.601252]
2025-08-07 07:34:55,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [19.0, 73.0, 19.0, 186.0, 20.0, 105.0, 148.0, 15.0, 23.0, 41.0]
2025-08-07 07:34:55,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 29 minutes, 6 seconds)
2025-08-07 07:36:32,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:36:33,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 15.28568 ± 10.544
2025-08-07 07:36:33,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [12.4884815, 9.464199, 22.401386, 11.868774, 13.346352, 2.1055121, 35.33258, 27.520868, 19.255325, -0.92670774]
2025-08-07 07:36:33,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [21.0, 22.0, 68.0, 38.0, 36.0, 23.0, 100.0, 48.0, 58.0, 22.0]
2025-08-07 07:36:33,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 26 minutes, 55 seconds)
2025-08-07 07:38:10,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:38:11,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 50.07320 ± 91.905
2025-08-07 07:38:11,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1.2397717, 243.34009, 15.982047, -1.5957788, 223.423, 10.366834, 1.3785335, 5.90191, 1.2719561, -0.5763232]
2025-08-07 07:38:11,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [19.0, 146.0, 41.0, 22.0, 140.0, 23.0, 13.0, 23.0, 12.0, 17.0]
2025-08-07 07:38:11,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1226 [INFO]: New best (50.07) for latency ExtremeClogL1U23
2025-08-07 07:38:11,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 25 minutes, 14 seconds)
2025-08-07 07:39:48,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:39:49,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 32.90703 ± 72.344
2025-08-07 07:39:49,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [7.3778324, 6.710515, 3.9236696, 4.326059, 249.47452, 11.564626, 21.172283, 6.988319, 10.951698, 6.580761]
2025-08-07 07:39:49,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [31.0, 18.0, 15.0, 21.0, 175.0, 20.0, 41.0, 18.0, 39.0, 19.0]
2025-08-07 07:39:49,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 23 minutes, 24 seconds)
2025-08-07 07:41:27,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:41:28,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 70.49046 ± 86.843
2025-08-07 07:41:28,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [3.6758096, 63.760212, 154.08624, 3.365662, 3.0239677, 49.431446, -0.9920241, 173.32631, 0.7116825, 254.5154]
2025-08-07 07:41:28,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [17.0, 134.0, 134.0, 19.0, 19.0, 209.0, 17.0, 118.0, 18.0, 167.0]
2025-08-07 07:41:28,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1226 [INFO]: New best (70.49) for latency ExtremeClogL1U23
2025-08-07 07:41:28,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 22 minutes, 15 seconds)
2025-08-07 07:43:05,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:43:07,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 98.32562 ± 120.782
2025-08-07 07:43:07,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [11.39439, 339.02164, 2.878742, -3.275166, 11.106599, 207.619, 226.8505, 180.96182, 0.32982528, 6.368907]
2025-08-07 07:43:07,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [25.0, 326.0, 22.0, 24.0, 22.0, 120.0, 134.0, 134.0, 31.0, 17.0]
2025-08-07 07:43:07,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1226 [INFO]: New best (98.33) for latency ExtremeClogL1U23
2025-08-07 07:43:07,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 20 minutes, 54 seconds)
2025-08-07 07:44:44,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:44:46,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 183.29074 ± 136.699
2025-08-07 07:44:46,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [127.06591, 7.688741, 304.73267, 273.15646, 263.6549, 141.18398, 334.70016, 370.97202, 5.5069838, 4.2457705]
2025-08-07 07:44:46,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [98.0, 33.0, 241.0, 439.0, 161.0, 97.0, 213.0, 291.0, 23.0, 23.0]
2025-08-07 07:44:46,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1226 [INFO]: New best (183.29) for latency ExtremeClogL1U23
2025-08-07 07:44:46,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 19 minutes, 46 seconds)
2025-08-07 07:46:25,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:46:26,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 68.17200 ± 87.719
2025-08-07 07:46:26,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [171.22545, 243.91957, 7.138812, 9.418813, 2.7800395, 2.931766, 0.53257734, 56.18004, 11.040468, 176.55244]
2025-08-07 07:46:26,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [119.0, 143.0, 33.0, 22.0, 22.0, 23.0, 15.0, 146.0, 24.0, 253.0]
2025-08-07 07:46:26,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 18 minutes, 35 seconds)
2025-08-07 07:48:05,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:48:06,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 83.89129 ± 159.639
2025-08-07 07:48:06,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [365.82666, 4.8052917, 1.3004594, 1.49383, 1.9017047, 4.4101815, 9.309912, 3.2498014, 437.2167, 9.398391]
2025-08-07 07:48:06,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [209.0, 18.0, 15.0, 18.0, 32.0, 25.0, 23.0, 18.0, 374.0, 20.0]
2025-08-07 07:48:06,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 17 minutes, 32 seconds)
2025-08-07 07:49:46,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:49:48,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 167.87779 ± 265.996
2025-08-07 07:49:48,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [0.9686086, 427.3442, 1.9430073, 2.025623, 1.3140385, 153.82037, 853.93475, 1.4141325, 3.9560335, 232.05722]
2025-08-07 07:49:48,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [17.0, 217.0, 14.0, 23.0, 19.0, 224.0, 663.0, 15.0, 19.0, 280.0]
2025-08-07 07:49:48,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 16 minutes, 37 seconds)
2025-08-07 07:51:26,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:51:28,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 112.16447 ± 164.892
2025-08-07 07:51:28,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [11.466504, 11.3215275, 9.393405, 261.48206, 366.07608, -0.74728847, 12.068749, 5.5064516, 441.33887, 3.7383614]
2025-08-07 07:51:28,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [24.0, 25.0, 24.0, 213.0, 188.0, 19.0, 24.0, 17.0, 346.0, 21.0]
2025-08-07 07:51:28,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 15 minutes, 15 seconds)
2025-08-07 07:53:07,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:53:08,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 89.18716 ± 147.520
2025-08-07 07:53:08,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [5.315589, 69.59939, 1.6127762, 27.441874, 13.456097, 412.4801, 3.076081, 5.9214435, 4.904858, 348.0634]
2025-08-07 07:53:08,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [30.0, 84.0, 13.0, 57.0, 24.0, 208.0, 19.0, 19.0, 21.0, 283.0]
2025-08-07 07:53:08,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 13 minutes, 46 seconds)
2025-08-07 07:54:45,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:54:47,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 199.90527 ± 346.878
2025-08-07 07:54:47,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [9.98392, 8.509391, 1.7369465, 1085.5491, 61.77713, 3.245198, 3.6207433, 612.32465, -0.983276, 213.28894]
2025-08-07 07:54:47,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [20.0, 18.0, 20.0, 657.0, 156.0, 32.0, 15.0, 264.0, 15.0, 117.0]
2025-08-07 07:54:47,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1226 [INFO]: New best (199.91) for latency ExtremeClogL1U23
2025-08-07 07:54:47,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 11 minutes, 58 seconds)
2025-08-07 07:56:25,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:56:26,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 56.85609 ± 103.548
2025-08-07 07:56:26,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [4.623277, 5.6841745, 287.60428, 4.5958548, 237.86284, 6.0981326, 2.8781066, 3.952966, 8.071989, 7.189353]
2025-08-07 07:56:26,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [16.0, 18.0, 129.0, 18.0, 138.0, 20.0, 19.0, 17.0, 19.0, 18.0]
2025-08-07 07:56:26,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 10 minutes, 6 seconds)
2025-08-07 07:58:04,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:58:05,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 86.37875 ± 112.083
2025-08-07 07:58:05,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [157.1423, 5.3926983, 8.106671, -0.27346373, 1.9733715, 0.11629077, 311.89212, 6.5061355, 251.99774, 120.93362]
2025-08-07 07:58:05,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [183.0, 19.0, 20.0, 25.0, 14.0, 11.0, 242.0, 16.0, 128.0, 77.0]
2025-08-07 07:58:05,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 7 minutes, 46 seconds)
2025-08-07 07:59:47,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:59:48,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 88.28159 ± 139.653
2025-08-07 07:59:48,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [6.293951, 4.2404914, 5.8052244, 144.46251, 11.237311, 36.056976, 450.3272, 218.50447, 6.3191843, -0.43147424]
2025-08-07 07:59:48,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [32.0, 17.0, 24.0, 138.0, 23.0, 69.0, 271.0, 180.0, 22.0, 19.0]
2025-08-07 07:59:48,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 6 minutes, 46 seconds)
2025-08-07 08:01:25,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:01:25,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 13.67302 ± 28.876
2025-08-07 08:01:25,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [0.77102333, 10.264891, 5.282125, 5.9471827, 7.1269574, 5.674981, 99.65742, -1.2953962, 4.5476193, -1.2465992]
2025-08-07 08:01:25,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [12.0, 23.0, 16.0, 18.0, 20.0, 22.0, 93.0, 20.0, 16.0, 21.0]
2025-08-07 08:01:25,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 4 minutes, 22 seconds)
2025-08-07 08:03:04,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:03:05,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 77.54439 ± 100.766
2025-08-07 08:03:05,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [47.279877, 275.21127, 242.83025, 8.738155, -1.6201917, 42.56089, 150.96695, 5.8719187, 0.40540403, 3.199392]
2025-08-07 08:03:05,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [72.0, 212.0, 145.0, 25.0, 24.0, 56.0, 137.0, 18.0, 15.0, 15.0]
2025-08-07 08:03:05,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 2 minutes, 43 seconds)
2025-08-07 08:04:43,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:04:45,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 86.67923 ± 109.990
2025-08-07 08:04:45,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [-0.4632243, 0.8982267, 6.011995, 326.04727, 9.098378, 2.9974086, 152.07166, 6.046316, 192.8712, 171.21307]
2025-08-07 08:04:45,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [25.0, 18.0, 19.0, 148.0, 19.0, 22.0, 156.0, 23.0, 304.0, 287.0]
2025-08-07 08:04:45,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 1 minute, 17 seconds)
2025-08-07 08:06:26,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:06:27,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 38.63516 ± 75.528
2025-08-07 08:06:27,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [2.1135597, 244.28285, 6.7545753, -1.3276776, 3.7098322, 109.882256, 9.723638, 3.1410718, 7.2613482, 0.8101846]
2025-08-07 08:06:27,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [21.0, 145.0, 18.0, 21.0, 16.0, 176.0, 24.0, 17.0, 23.0, 16.0]
2025-08-07 08:06:27,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 23 seconds)
2025-08-07 08:08:02,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:08:03,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 84.94817 ± 130.390
2025-08-07 08:08:03,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [-2.332206, 289.36447, 5.935386, -0.006180885, 3.5065494, -0.33956313, 345.1122, 4.13583, 198.08563, 6.01959]
2025-08-07 08:08:03,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [12.0, 145.0, 18.0, 19.0, 17.0, 64.0, 197.0, 15.0, 109.0, 18.0]
2025-08-07 08:08:03,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 1 hour, 57 minutes, 5 seconds)
2025-08-07 08:09:43,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:09:46,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 209.57108 ± 363.323
2025-08-07 08:09:46,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [3.5263662, 6.743332, 238.82875, 488.5969, 131.26059, 4.848966, 6.743231, 6.5414925, 1202.5111, 6.109763]
2025-08-07 08:09:46,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [17.0, 19.0, 170.0, 289.0, 208.0, 15.0, 19.0, 18.0, 1000.0, 17.0]
2025-08-07 08:09:46,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1226 [INFO]: New best (209.57) for latency ExtremeClogL1U23
2025-08-07 08:09:46,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 56 minutes, 43 seconds)
2025-08-07 08:11:24,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:11:25,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 90.41600 ± 231.219
2025-08-07 08:11:25,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [5.0226297, 6.193384, 782.9048, 4.313243, 13.683862, 47.870724, 27.817156, 6.3184357, 2.942108, 7.093669]
2025-08-07 08:11:25,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [19.0, 18.0, 481.0, 16.0, 24.0, 114.0, 122.0, 19.0, 18.0, 23.0]
2025-08-07 08:11:25,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 55 minutes, 11 seconds)
2025-08-07 08:13:04,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:13:06,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 133.43904 ± 184.774
2025-08-07 08:13:06,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [128.10858, 619.60254, 1.4485707, 8.7175665, 0.070872344, 56.953835, 8.285814, 42.882484, 187.98738, 280.3327]
2025-08-07 08:13:06,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [72.0, 385.0, 11.0, 24.0, 18.0, 147.0, 21.0, 160.0, 142.0, 164.0]
2025-08-07 08:13:06,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 53 minutes, 35 seconds)
2025-08-07 08:14:44,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:14:45,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 75.66389 ± 106.596
2025-08-07 08:14:45,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [13.1311455, 3.930751, 5.4695277, 226.98796, 0.26868922, 257.41904, 229.6165, 5.8875713, 4.083615, 9.844088]
2025-08-07 08:14:45,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [23.0, 20.0, 22.0, 145.0, 25.0, 129.0, 350.0, 15.0, 24.0, 23.0]
2025-08-07 08:14:45,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 51 minutes, 8 seconds)
2025-08-07 08:16:25,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:16:27,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 230.69949 ± 317.331
2025-08-07 08:16:27,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [10.227682, 419.09253, 471.07538, -0.94488627, 402.132, 10.340299, 989.80176, -2.0925078, 3.9366477, 3.4257793]
2025-08-07 08:16:27,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [21.0, 227.0, 255.0, 17.0, 196.0, 24.0, 628.0, 10.0, 25.0, 14.0]
2025-08-07 08:16:27,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1226 [INFO]: New best (230.70) for latency ExtremeClogL1U23
2025-08-07 08:16:27,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 50 minutes, 57 seconds)
2025-08-07 08:18:05,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:18:07,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 169.35207 ± 206.642
2025-08-07 08:18:07,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [402.7231, 490.57886, 491.4923, 262.462, 10.87197, 10.767089, 0.31476197, 3.5763729, 5.6578307, 15.076261]
2025-08-07 08:18:07,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [201.0, 293.0, 547.0, 114.0, 25.0, 22.0, 10.0, 14.0, 16.0, 24.0]
2025-08-07 08:18:07,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 48 minutes, 30 seconds)
2025-08-07 08:19:46,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:19:48,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 269.71039 ± 269.079
2025-08-07 08:19:48,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [-0.51464486, 517.10724, 576.33264, 1.1350158, 273.3837, 207.33203, 317.31876, 8.670852, 795.8762, 0.46223205]
2025-08-07 08:19:48,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [20.0, 297.0, 409.0, 12.0, 167.0, 132.0, 143.0, 23.0, 574.0, 14.0]
2025-08-07 08:19:48,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1226 [INFO]: New best (269.71) for latency ExtremeClogL1U23
2025-08-07 08:19:48,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 47 minutes, 17 seconds)
2025-08-07 08:21:27,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:21:28,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 61.72761 ± 96.889
2025-08-07 08:21:28,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1.0476518, 6.5674834, 3.3971124, 4.9530616, 208.74841, 4.5502887, 2.2634087, -1.1930957, 111.51698, 275.42477]
2025-08-07 08:21:28,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [12.0, 19.0, 18.0, 18.0, 125.0, 16.0, 24.0, 12.0, 167.0, 170.0]
2025-08-07 08:21:28,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 45 minutes, 26 seconds)
2025-08-07 08:23:07,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:23:09,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 124.12842 ± 139.236
2025-08-07 08:23:09,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [242.39047, 2.883009, 103.62772, 3.624329, 292.67828, 404.21066, 170.7689, 13.891301, 5.1883454, 2.0211766]
2025-08-07 08:23:09,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [133.0, 22.0, 188.0, 18.0, 192.0, 222.0, 96.0, 23.0, 17.0, 25.0]
2025-08-07 08:23:09,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 44 minutes, 9 seconds)
2025-08-07 08:24:49,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:24:49,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 18.38649 ± 43.881
2025-08-07 08:24:49,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [149.83478, 2.0641468, 3.8060868, 2.9656858, 5.965212, 0.24744123, 2.0094926, 6.7705708, 1.9234965, 8.278017]
2025-08-07 08:24:49,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [100.0, 18.0, 16.0, 14.0, 19.0, 18.0, 17.0, 18.0, 20.0, 20.0]
2025-08-07 08:24:49,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 42 minutes, 7 seconds)
2025-08-07 08:26:27,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:26:28,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 196.54700 ± 242.099
2025-08-07 08:26:28,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [5.5174313, 617.8141, 5.713813, 477.47668, -2.3274267, 447.04074, 2.2610033, 4.903016, 3.5965188, 403.47418]
2025-08-07 08:26:28,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [16.0, 355.0, 15.0, 221.0, 11.0, 188.0, 13.0, 14.0, 24.0, 213.0]
2025-08-07 08:26:28,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 40 minutes, 20 seconds)
2025-08-07 08:28:08,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:28:10,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 258.12918 ± 187.301
2025-08-07 08:28:10,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [452.48154, 417.3424, 5.8873596, 6.6003313, 209.16202, 310.50702, 283.4912, 11.137448, 328.00638, 556.67596]
2025-08-07 08:28:10,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [202.0, 237.0, 16.0, 19.0, 104.0, 179.0, 217.0, 22.0, 182.0, 310.0]
2025-08-07 08:28:10,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 38 minutes, 42 seconds)
2025-08-07 08:29:52,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:29:53,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 113.30467 ± 185.184
2025-08-07 08:29:53,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [11.450686, 7.052331, 299.51968, 194.24257, 585.996, 3.645619, 5.060793, 7.3515778, 5.740561, 12.986939]
2025-08-07 08:29:53,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [25.0, 22.0, 148.0, 109.0, 239.0, 17.0, 23.0, 19.0, 21.0, 21.0]
2025-08-07 08:29:53,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 37 minutes, 42 seconds)
2025-08-07 08:31:28,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:31:30,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 212.34671 ± 255.589
2025-08-07 08:31:30,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [3.7957225, 459.53683, 6.309987, 6.627086, 493.6991, 526.70013, 609.57214, 11.365331, -0.15981111, 6.020673]
2025-08-07 08:31:30,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [21.0, 195.0, 18.0, 25.0, 257.0, 298.0, 348.0, 22.0, 12.0, 19.0]
2025-08-07 08:31:30,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 35 minutes, 15 seconds)
2025-08-07 08:33:09,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:33:09,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 20.59189 ± 50.804
2025-08-07 08:33:09,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [5.6415105, 1.1651393, 5.0566077, 4.560222, 172.64888, 6.0757284, -4.227393, 5.9773393, 0.4714375, 8.549422]
2025-08-07 08:33:09,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [24.0, 13.0, 23.0, 16.0, 103.0, 18.0, 15.0, 18.0, 24.0, 25.0]
2025-08-07 08:33:09,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 33 minutes, 15 seconds)
2025-08-07 08:34:48,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:34:49,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 129.37886 ± 376.302
2025-08-07 08:34:49,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [-0.9874519, 5.4430027, 1.1456295, 7.036788, 9.562435, 3.4839914, 2.681282, 6.885168, 1258.2452, 0.2925293]
2025-08-07 08:34:49,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [19.0, 18.0, 12.0, 18.0, 24.0, 22.0, 22.0, 17.0, 708.0, 15.0]
2025-08-07 08:34:49,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 31 minutes, 49 seconds)
2025-08-07 08:36:28,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:36:30,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 270.33157 ± 215.004
2025-08-07 08:36:30,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [387.8851, 363.596, 312.7677, 402.28506, 707.1928, 175.76945, 3.3985577, 345.9847, 1.0486478, 3.387713]
2025-08-07 08:36:30,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [204.0, 289.0, 169.0, 245.0, 353.0, 189.0, 13.0, 183.0, 16.0, 21.0]
2025-08-07 08:36:30,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1226 [INFO]: New best (270.33) for latency ExtremeClogL1U23
2025-08-07 08:36:30,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 29 minutes, 54 seconds)
2025-08-07 08:38:09,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:38:10,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 98.29520 ± 183.118
2025-08-07 08:38:10,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [29.477795, 5.1115246, 430.0001, 7.9507275, 2.6259477, 495.99667, 1.5250492, 5.7203965, -0.86697984, 5.4108286]
2025-08-07 08:38:10,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [125.0, 16.0, 233.0, 24.0, 14.0, 344.0, 12.0, 15.0, 9.0, 20.0]
2025-08-07 08:38:10,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 27 minutes, 46 seconds)
2025-08-07 08:39:49,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:39:52,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 358.73767 ± 368.630
2025-08-07 08:39:52,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [329.82846, 637.8337, 239.88371, 3.5778131, 4.1881204, 6.150219, 7.7129884, 1139.2888, 483.63007, 735.2827]
2025-08-07 08:39:52,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [166.0, 477.0, 260.0, 14.0, 24.0, 14.0, 23.0, 539.0, 386.0, 417.0]
2025-08-07 08:39:52,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1226 [INFO]: New best (358.74) for latency ExtremeClogL1U23
2025-08-07 08:39:52,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 27 minutes, 1 second)
2025-08-07 08:41:32,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:41:33,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 146.56940 ± 211.381
2025-08-07 08:41:33,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [5.377567, 617.82654, -4.5963316, -4.174311, 421.05255, 140.60732, 5.802355, -7.0151677, 287.60767, 3.20585]
2025-08-07 08:41:33,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [21.0, 249.0, 23.0, 15.0, 192.0, 106.0, 22.0, 16.0, 152.0, 21.0]
2025-08-07 08:41:33,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 25 minutes, 39 seconds)
2025-08-07 08:43:12,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:43:14,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 189.71458 ± 232.999
2025-08-07 08:43:14,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [399.52823, 7.5074205, 198.64647, 5.598772, 1.8228197, 1.8502241, 5.38106, 635.4795, 103.74597, 537.58545]
2025-08-07 08:43:14,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [175.0, 17.0, 111.0, 20.0, 19.0, 22.0, 22.0, 382.0, 141.0, 259.0]
2025-08-07 08:43:14,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 24 minutes, 8 seconds)
2025-08-07 08:44:57,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:44:59,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 178.71489 ± 301.663
2025-08-07 08:44:59,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [4.7165256, 1034.5774, 156.16528, 7.4767766, 223.11487, 51.99056, 8.431315, 8.37025, 286.21497, 6.0909386]
2025-08-07 08:44:59,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [22.0, 581.0, 94.0, 20.0, 116.0, 125.0, 21.0, 19.0, 228.0, 21.0]
2025-08-07 08:44:59,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 23 minutes, 8 seconds)
2025-08-07 08:46:33,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:46:34,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 98.43943 ± 155.093
2025-08-07 08:46:34,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [0.03569344, 352.10834, -0.38593376, 429.9107, 3.509838, 1.1443353, 167.1821, 2.0909967, 3.0867124, 25.711489]
2025-08-07 08:46:34,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [16.0, 184.0, 11.0, 207.0, 21.0, 14.0, 97.0, 15.0, 17.0, 116.0]
2025-08-07 08:46:34,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 20 minutes, 37 seconds)
2025-08-07 08:48:14,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:48:14,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 20.45825 ± 43.012
2025-08-07 08:48:14,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [10.170491, 5.1338096, 8.856995, -0.26151964, 3.357848, 9.562878, 10.216293, 6.8093514, 1.6690744, 149.0673]
2025-08-07 08:48:14,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [19.0, 21.0, 25.0, 13.0, 17.0, 23.0, 20.0, 20.0, 14.0, 98.0]
2025-08-07 08:48:14,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 18 minutes, 37 seconds)
2025-08-07 08:49:52,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:49:53,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 186.37546 ± 224.337
2025-08-07 08:49:53,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [2.5967336, 4.704634, 470.38144, 385.72226, 4.630838, 526.3247, 9.584516, 7.160683, 450.97318, 1.675632]
2025-08-07 08:49:53,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [16.0, 16.0, 193.0, 186.0, 15.0, 249.0, 22.0, 17.0, 194.0, 17.0]
2025-08-07 08:49:53,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 16 minutes, 45 seconds)
2025-08-07 08:51:33,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:51:34,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 230.88611 ± 352.248
2025-08-07 08:51:34,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [5.8683224, 2.4074974, 12.153871, 779.60114, 592.30646, 7.663713, 902.77466, 1.2421519, 2.7634072, 2.079793]
2025-08-07 08:51:34,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [18.0, 23.0, 25.0, 348.0, 255.0, 22.0, 383.0, 16.0, 25.0, 15.0]
2025-08-07 08:51:34,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 15 minutes, 4 seconds)
2025-08-07 08:53:17,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:53:18,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 135.39636 ± 199.002
2025-08-07 08:53:18,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [10.814907, 1.6076192, 7.6613297, 485.5321, 2.4344344, 4.272818, 375.54605, 6.176576, 10.229832, 449.68805]
2025-08-07 08:53:18,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [24.0, 13.0, 21.0, 279.0, 17.0, 21.0, 204.0, 20.0, 22.0, 253.0]
2025-08-07 08:53:18,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 13 minutes, 13 seconds)
2025-08-07 08:54:56,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:54:59,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 471.32455 ± 398.863
2025-08-07 08:54:59,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [394.78604, 690.85736, 0.77153313, 479.06177, 801.06506, 572.3292, 1339.422, 431.7267, 5.689881, -2.4638274]
2025-08-07 08:54:59,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [334.0, 494.0, 20.0, 186.0, 341.0, 299.0, 554.0, 214.0, 22.0, 10.0]
2025-08-07 08:54:59,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1226 [INFO]: New best (471.32) for latency ExtremeClogL1U23
2025-08-07 08:54:59,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 12 minutes, 26 seconds)
2025-08-07 08:56:37,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:56:37,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 38.02965 ± 66.062
2025-08-07 08:56:37,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [-1.8586981, 1.5336165, 27.560387, 155.15166, 182.16193, -0.6163828, 3.5063941, 3.7534184, 7.349156, 1.7549994]
2025-08-07 08:56:37,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [13.0, 18.0, 109.0, 92.0, 111.0, 25.0, 19.0, 16.0, 22.0, 34.0]
2025-08-07 08:56:37,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 10 minutes, 27 seconds)
2025-08-07 08:58:16,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:58:18,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 153.30965 ± 222.137
2025-08-07 08:58:18,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [491.63483, 37.67239, 1.8654937, 526.0605, -0.7955969, 2.188148, 0.43245283, 456.56805, 10.605638, 6.864632]
2025-08-07 08:58:18,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [317.0, 111.0, 25.0, 245.0, 18.0, 18.0, 11.0, 181.0, 21.0, 17.0]
2025-08-07 08:58:18,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 8 minutes, 55 seconds)
2025-08-07 09:00:00,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:00:02,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 210.00916 ± 214.098
2025-08-07 09:00:02,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [535.8816, 9.025237, 426.53763, 153.7815, 0.7817931, -0.22729126, 287.9773, -3.3708446, 137.73369, 551.9711]
2025-08-07 09:00:02,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [245.0, 23.0, 223.0, 102.0, 11.0, 23.0, 128.0, 24.0, 85.0, 268.0]
2025-08-07 09:00:02,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 7 minutes, 39 seconds)
2025-08-07 09:01:39,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:01:40,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 243.07935 ± 212.801
2025-08-07 09:01:40,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [169.69695, 160.31194, 475.07538, 7.102826, 548.17944, 359.50983, 562.5104, 130.97818, 8.927339, 8.501428]
2025-08-07 09:01:40,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [98.0, 97.0, 200.0, 25.0, 265.0, 143.0, 245.0, 75.0, 18.0, 23.0]
2025-08-07 09:01:40,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 5 minutes, 15 seconds)
2025-08-07 09:03:17,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:03:18,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 303.62192 ± 342.264
2025-08-07 09:03:18,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [879.92957, 861.7257, -2.0283878, 9.854612, 341.7904, 578.59357, 8.177941, -0.17570972, 5.2604504, 353.0911]
2025-08-07 09:03:18,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [357.0, 322.0, 18.0, 21.0, 144.0, 249.0, 20.0, 16.0, 19.0, 149.0]
2025-08-07 09:03:18,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 3 minutes, 11 seconds)
2025-08-07 09:04:58,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:04:59,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 121.61123 ± 199.343
2025-08-07 09:04:59,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [7.019811, -0.45503357, 186.48718, -1.8350655, 4.4614744, 3.2909446, 3.797574, 481.11774, 4.1142015, 528.1134]
2025-08-07 09:04:59,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [24.0, 20.0, 112.0, 24.0, 19.0, 23.0, 15.0, 208.0, 18.0, 231.0]
2025-08-07 09:04:59,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 1 minute, 54 seconds)
2025-08-07 09:06:39,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:06:40,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 177.24593 ± 309.307
2025-08-07 09:06:40,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [8.737096, 1.2680248, 502.93225, 0.29261008, 180.86407, 75.030876, 12.855845, 1.317549, 989.61597, -0.4550461]
2025-08-07 09:06:40,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [24.0, 24.0, 234.0, 23.0, 98.0, 131.0, 22.0, 14.0, 457.0, 19.0]
2025-08-07 09:06:40,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 15 seconds)
2025-08-07 09:08:24,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:08:26,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 278.96890 ± 431.801
2025-08-07 09:08:26,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [-1.094617, 7.4060144, 661.78613, 570.07074, -0.9078819, 176.86337, 1359.5442, 6.5128703, 8.656341, 0.8519374]
2025-08-07 09:08:26,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [24.0, 18.0, 292.0, 283.0, 20.0, 96.0, 584.0, 17.0, 19.0, 19.0]
2025-08-07 09:08:26,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 58 minutes, 49 seconds)
2025-08-07 09:10:00,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:10:04,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 595.48181 ± 668.530
2025-08-07 09:10:04,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [-0.39507377, 497.97913, 1252.385, 1432.0825, 383.64505, 164.85205, 242.02422, 1978.6484, -0.056273296, 3.6531487]
2025-08-07 09:10:04,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [10.0, 203.0, 614.0, 645.0, 163.0, 104.0, 114.0, 1000.0, 14.0, 22.0]
2025-08-07 09:10:04,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1226 [INFO]: New best (595.48) for latency ExtremeClogL1U23
2025-08-07 09:10:04,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 57 minutes, 8 seconds)
2025-08-07 09:11:48,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:11:49,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 153.28268 ± 196.931
2025-08-07 09:11:49,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [65.25462, 0.14133948, 10.916215, 13.077569, 565.01324, 128.09576, 304.23044, 8.36031, 1.9083571, 435.82913]
2025-08-07 09:11:49,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [147.0, 21.0, 20.0, 24.0, 256.0, 170.0, 134.0, 25.0, 17.0, 205.0]
2025-08-07 09:11:49,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 56 minutes, 13 seconds)
2025-08-07 09:13:29,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:13:31,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 263.85730 ± 606.083
2025-08-07 09:13:31,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [476.9344, 1.6767528, 2.7387226, 110.72379, 1.5769004, 6.583826, 7.102488, 2032.1727, -3.5402899, 2.6035736]
2025-08-07 09:13:31,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [192.0, 12.0, 20.0, 74.0, 24.0, 21.0, 18.0, 869.0, 21.0, 14.0]
2025-08-07 09:13:31,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 54 minutes, 37 seconds)
2025-08-07 09:15:08,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:15:10,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 325.10187 ± 360.268
2025-08-07 09:15:10,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [7.4520097, 13.255469, 324.58286, 436.24808, 418.9001, 1251.412, 419.71417, 6.1077933, 375.80566, -2.4593692]
2025-08-07 09:15:10,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [23.0, 82.0, 159.0, 198.0, 176.0, 498.0, 169.0, 17.0, 150.0, 12.0]
2025-08-07 09:15:10,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 52 minutes, 45 seconds)
2025-08-07 09:16:48,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:16:50,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 244.97940 ± 469.743
2025-08-07 09:16:50,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1577.2644, 335.38736, 1.3032249, -2.2855484, 2.4866128, 11.043194, 449.42822, 6.681931, 2.7884226, 65.69617]
2025-08-07 09:16:50,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [668.0, 269.0, 21.0, 15.0, 18.0, 19.0, 181.0, 25.0, 23.0, 125.0]
2025-08-07 09:16:50,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 50 minutes, 22 seconds)
2025-08-07 09:18:33,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:18:35,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 195.72995 ± 344.915
2025-08-07 09:18:35,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [349.5607, 1161.0171, 0.25008532, 0.9685428, 6.0953455, 6.607399, 285.69376, 1.7418592, 144.58191, 0.78275716]
2025-08-07 09:18:35,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [143.0, 501.0, 14.0, 12.0, 18.0, 20.0, 128.0, 16.0, 87.0, 14.0]
2025-08-07 09:18:35,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 49 minutes, 20 seconds)
2025-08-07 09:20:15,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:20:18,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 476.75146 ± 621.678
2025-08-07 09:20:18,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [254.03227, 0.9237983, -0.06653467, 367.82648, -0.1673706, 1328.4028, 973.0685, 1795.7478, 7.1848497, 40.56196]
2025-08-07 09:20:18,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [114.0, 16.0, 15.0, 151.0, 23.0, 577.0, 387.0, 889.0, 22.0, 121.0]
2025-08-07 09:20:18,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 47 minutes, 26 seconds)
2025-08-07 09:21:52,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:21:56,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 475.58356 ± 432.384
2025-08-07 09:21:56,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [974.6561, 214.81334, 3.680009, 807.92914, 1236.869, 4.439859, 4.438848, 572.83026, 162.91725, 773.26184]
2025-08-07 09:21:56,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [461.0, 179.0, 14.0, 307.0, 863.0, 16.0, 21.0, 285.0, 232.0, 312.0]
2025-08-07 09:21:56,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 45 minutes, 25 seconds)
2025-08-07 09:23:38,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:23:40,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 405.22345 ± 529.712
2025-08-07 09:23:40,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [395.30154, 12.010742, 326.86365, 76.012566, -0.1606207, 5.7121496, 1270.4546, 412.59314, 5.9892163, 1547.4578]
2025-08-07 09:23:40,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [166.0, 23.0, 139.0, 123.0, 17.0, 19.0, 601.0, 174.0, 23.0, 631.0]
2025-08-07 09:23:40,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 44 minutes, 11 seconds)
2025-08-07 09:25:17,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:25:20,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 375.34436 ± 578.279
2025-08-07 09:25:20,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [7.136214, -1.2348009, -0.23249091, 290.54025, 9.815405, 978.79785, 1816.2822, 4.283156, 641.3597, 6.696055]
2025-08-07 09:25:20,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [23.0, 14.0, 24.0, 126.0, 24.0, 444.0, 1000.0, 17.0, 327.0, 19.0]
2025-08-07 09:25:20,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 42 minutes, 29 seconds)
2025-08-07 09:27:00,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:27:01,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 147.89003 ± 230.698
2025-08-07 09:27:01,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [0.29902405, 4.7939286, 0.40867937, 110.70904, 707.87286, 2.30042, 3.8175557, 451.78708, 14.644707, 182.26698]
2025-08-07 09:27:01,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [15.0, 17.0, 13.0, 156.0, 313.0, 17.0, 17.0, 205.0, 66.0, 105.0]
2025-08-07 09:27:01,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 40 minutes, 32 seconds)
2025-08-07 09:28:40,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:28:41,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 146.20120 ± 152.865
2025-08-07 09:28:41,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [294.28497, 142.74371, 8.6599865, 1.7554636, 251.21187, 364.58978, 4.374818, 379.74576, 8.115044, 6.5307813]
2025-08-07 09:28:41,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [144.0, 86.0, 24.0, 14.0, 155.0, 155.0, 14.0, 152.0, 18.0, 18.0]
2025-08-07 09:28:41,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 38 minutes, 36 seconds)
2025-08-07 09:30:21,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:30:23,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 321.79474 ± 501.588
2025-08-07 09:30:23,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [702.7629, 3.212355, 879.32983, 3.0913641, -1.3869065, 106.045616, 10.551517, 1.3322634, 1507.6207, 5.387884]
2025-08-07 09:30:23,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [293.0, 23.0, 402.0, 14.0, 15.0, 173.0, 25.0, 21.0, 565.0, 16.0]
2025-08-07 09:30:23,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 37 minutes, 9 seconds)
2025-08-07 09:32:02,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:32:03,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 104.53334 ± 250.657
2025-08-07 09:32:03,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [150.61346, -2.3185072, 3.074373, 8.915918, 8.240069, 9.293329, 2.7007034, 845.2407, 9.6398, 9.93358]
2025-08-07 09:32:03,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [212.0, 21.0, 13.0, 21.0, 21.0, 23.0, 15.0, 411.0, 24.0, 25.0]
2025-08-07 09:32:03,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 35 minutes, 11 seconds)
2025-08-07 09:33:45,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:33:46,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 143.89047 ± 284.796
2025-08-07 09:33:46,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [6.5065417, 7.9730625, -1.2650131, 1.9938303, 1.131808, 56.700615, 12.532879, 4.27199, 445.7869, 903.27203]
2025-08-07 09:33:46,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [18.0, 26.0, 22.0, 17.0, 25.0, 138.0, 23.0, 20.0, 179.0, 344.0]
2025-08-07 09:33:46,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 33 minutes, 44 seconds)
2025-08-07 09:35:23,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:35:25,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 280.66379 ± 395.617
2025-08-07 09:35:25,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [4.7341003, 6.8468375, 427.77493, -0.34708416, 1126.8647, 4.8498116, 883.9788, 7.100398, 344.1956, 0.6396266]
2025-08-07 09:35:25,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [19.0, 18.0, 173.0, 24.0, 437.0, 14.0, 327.0, 16.0, 142.0, 12.0]
2025-08-07 09:35:25,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 31 minutes, 53 seconds)
2025-08-07 09:37:06,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:37:08,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 261.80301 ± 319.372
2025-08-07 09:37:08,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [776.74646, 160.75157, 5.764825, 8.603849, 379.64923, 7.843591, 395.4323, 880.6457, 2.9833632, -0.39075422]
2025-08-07 09:37:08,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [313.0, 100.0, 21.0, 19.0, 140.0, 18.0, 167.0, 340.0, 15.0, 21.0]
2025-08-07 09:37:08,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 30 minutes, 23 seconds)
2025-08-07 09:38:46,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:38:47,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 201.45224 ± 229.096
2025-08-07 09:38:47,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [-0.9186333, -0.003113162, 504.6568, 549.077, 12.551647, 407.97504, 4.2694182, 3.701681, 85.32915, 447.8833]
2025-08-07 09:38:47,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [17.0, 16.0, 207.0, 269.0, 22.0, 169.0, 15.0, 20.0, 156.0, 179.0]
2025-08-07 09:38:47,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 28 minutes, 34 seconds)
2025-08-07 09:40:27,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:40:29,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 257.97757 ± 395.899
2025-08-07 09:40:29,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1233.1672, 194.49687, 0.78641874, 4.7300553, 7.220127, 747.9164, 9.182339, 336.21603, 4.1555705, 41.904846]
2025-08-07 09:40:29,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [509.0, 102.0, 24.0, 23.0, 19.0, 449.0, 20.0, 143.0, 14.0, 76.0]
2025-08-07 09:40:29,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 26 minutes, 58 seconds)
2025-08-07 09:42:09,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:42:11,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 307.85748 ± 434.648
2025-08-07 09:42:11,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [998.67615, 349.74036, 11.790543, 1220.9055, 475.77756, 4.635278, 11.099894, 3.3364053, 2.7595074, -0.14635319]
2025-08-07 09:42:11,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [382.0, 146.0, 24.0, 467.0, 208.0, 16.0, 23.0, 15.0, 14.0, 18.0]
2025-08-07 09:42:11,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 25 minutes, 13 seconds)
2025-08-07 09:43:49,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:43:50,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 187.37315 ± 371.501
2025-08-07 09:43:50,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [5.0101643, 1074.5651, 4.2316394, 4.102469, 10.968294, 3.0993721, 9.424819, 759.09064, -1.7968276, 5.0359697]
2025-08-07 09:43:50,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [18.0, 430.0, 15.0, 16.0, 22.0, 16.0, 20.0, 293.0, 10.0, 21.0]
2025-08-07 09:43:50,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 23 minutes, 34 seconds)
2025-08-07 09:45:30,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:45:32,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 243.71191 ± 307.163
2025-08-07 09:45:32,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [7.4489503, 577.5759, 5.3598313, -1.8977294, 811.954, 629.0684, 385.46912, 8.6151495, 3.727668, 9.797509]
2025-08-07 09:45:32,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [19.0, 258.0, 20.0, 16.0, 308.0, 307.0, 150.0, 25.0, 16.0, 21.0]
2025-08-07 09:45:32,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 21 minutes, 50 seconds)
2025-08-07 09:47:13,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:47:15,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 430.75400 ± 420.886
2025-08-07 09:47:15,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [805.6253, 168.82732, 520.815, 6.2788115, 690.91296, 954.1014, 4.632859, 8.2821245, 1135.875, 12.189673]
2025-08-07 09:47:15,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [303.0, 98.0, 196.0, 21.0, 259.0, 393.0, 14.0, 24.0, 477.0, 25.0]
2025-08-07 09:47:15,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 20 minutes, 19 seconds)
2025-08-07 09:48:54,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:48:57,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 407.69110 ± 425.673
2025-08-07 09:48:57,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1146.0818, 991.7251, 417.3628, 199.32027, 428.57135, 0.30546474, 882.8557, 4.8664827, 2.5607526, 3.261177]
2025-08-07 09:48:57,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [462.0, 447.0, 183.0, 103.0, 181.0, 13.0, 347.0, 15.0, 15.0, 16.0]
2025-08-07 09:48:57,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 18 minutes, 37 seconds)
2025-08-07 09:50:35,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:50:37,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 232.73099 ± 299.974
2025-08-07 09:50:37,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [72.23552, 704.95, 2.4178033, 10.237819, 2.158143, 466.61356, 241.1551, 814.3143, 14.198021, -0.9704167]
2025-08-07 09:50:37,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [140.0, 260.0, 18.0, 32.0, 13.0, 187.0, 132.0, 314.0, 24.0, 11.0]
2025-08-07 09:50:37,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 16 minutes, 52 seconds)
2025-08-07 09:52:17,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:52:19,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 408.35858 ± 451.480
2025-08-07 09:52:19,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [5.202818, 284.16464, 468.93777, 2.7046661, -1.0825928, 571.28143, 1510.3925, 745.2278, 4.633465, 492.1236]
2025-08-07 09:52:19,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [16.0, 138.0, 199.0, 17.0, 10.0, 242.0, 564.0, 295.0, 14.0, 218.0]
2025-08-07 09:52:19,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 15 minutes, 16 seconds)
2025-08-07 09:54:00,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:54:02,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 237.23540 ± 227.071
2025-08-07 09:54:02,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1.4732395, 145.38446, 3.306683, 498.82956, 540.8159, 120.1799, 596.20233, 90.66981, 4.2565594, 371.23566]
2025-08-07 09:54:02,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [19.0, 87.0, 14.0, 203.0, 264.0, 85.0, 243.0, 146.0, 17.0, 163.0]
2025-08-07 09:54:02,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 35 seconds)
2025-08-07 09:55:40,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:55:43,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 463.85977 ± 554.379
2025-08-07 09:55:43,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [14.409595, 158.64261, 1179.2814, 718.3213, 1128.6068, 3.919543, 1420.9291, 10.88632, -1.7541391, 5.3550043]
2025-08-07 09:55:43,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [23.0, 97.0, 465.0, 268.0, 499.0, 15.0, 549.0, 25.0, 14.0, 17.0]
2025-08-07 09:55:43,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 11 minutes, 50 seconds)
2025-08-07 09:57:21,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:57:22,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 241.99344 ± 314.829
2025-08-07 09:57:22,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [232.44174, 10.737262, 167.05165, 333.5761, 719.4892, 4.302509, 3.8965547, 11.460252, 930.3691, 6.6099133]
2025-08-07 09:57:22,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [125.0, 21.0, 98.0, 140.0, 272.0, 16.0, 15.0, 22.0, 361.0, 17.0]
2025-08-07 09:57:22,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 6 seconds)
2025-08-07 09:59:04,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:59:05,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 285.64981 ± 374.443
2025-08-07 09:59:05,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [-2.4480586, -0.68670535, 1076.6774, 8.939717, 688.7259, 0.07189826, 156.56606, 211.39058, -0.494928, 717.75616]
2025-08-07 09:59:05,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [14.0, 24.0, 414.0, 21.0, 276.0, 18.0, 82.0, 109.0, 21.0, 254.0]
2025-08-07 09:59:05,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 28 seconds)
2025-08-07 10:00:43,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:00:45,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 396.85004 ± 524.665
2025-08-07 10:00:45,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [-0.26279435, 175.48035, -0.53755826, 5.0015793, 1671.2908, 786.27893, 588.997, 733.7212, -1.5424385, 10.073171]
2025-08-07 10:00:45,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [11.0, 91.0, 22.0, 21.0, 575.0, 325.0, 261.0, 264.0, 9.0, 20.0]
2025-08-07 10:00:45,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 44 seconds)
2025-08-07 10:02:26,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:02:27,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 196.31778 ± 305.758
2025-08-07 10:02:27,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1.7128042, 3.7683296, 433.4302, 791.374, -0.05297215, 5.301855, 711.3234, 0.81573516, 9.161303, 6.343088]
2025-08-07 10:02:27,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [15.0, 18.0, 177.0, 303.0, 12.0, 16.0, 273.0, 22.0, 20.0, 19.0]
2025-08-07 10:02:27,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 3 seconds)
2025-08-07 10:04:07,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:04:08,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 339.73599 ± 423.191
2025-08-07 10:04:08,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [7.831203, -1.3966445, 5.7620726, 5.793838, 261.5684, 138.8365, 1362.8168, 247.24608, 715.58813, 653.3135]
2025-08-07 10:04:08,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [20.0, 13.0, 20.0, 16.0, 129.0, 86.0, 435.0, 144.0, 286.0, 253.0]
2025-08-07 10:04:08,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 22 seconds)
2025-08-07 10:05:48,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:05:50,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 319.80295 ± 343.681
2025-08-07 10:05:50,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [663.9266, 0.08778158, 695.9374, 7.6792407, 10.444097, 6.9207287, 412.54413, 444.3313, 954.42816, 1.7301475]
2025-08-07 10:05:50,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [234.0, 18.0, 255.0, 24.0, 22.0, 19.0, 165.0, 180.0, 357.0, 20.0]
2025-08-07 10:05:50,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 41 seconds)
2025-08-07 10:07:30,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:07:31,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 182.33749 ± 356.497
2025-08-07 10:07:31,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [8.050159, 8.43817, 5.501408, -0.6470082, 0.9590984, 976.2939, 1.9794989, 10.475739, 6.157576, 806.16644]
2025-08-07 10:07:31,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [23.0, 18.0, 19.0, 12.0, 15.0, 363.0, 15.0, 24.0, 18.0, 318.0]
2025-08-07 10:07:31,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1251 [DEBUG]: Training session finished
