2025-08-07 07:17:16,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc10-walker2d/ExtremeClogL1U23-bpql-mem24
2025-08-07 07:17:16,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc10-walker2d/ExtremeClogL1U23-bpql-mem24
2025-08-07 07:17:16,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x152432877990>}
2025-08-07 07:17:16,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1111 [DEBUG]: using device: cuda
2025-08-07 07:17:16,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1133 [INFO]: Creating new trainer
2025-08-07 07:17:16,557 baseline-bpql-noiseperc10-walker2d:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=161, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 07:17:16,557 baseline-bpql-noiseperc10-walker2d:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 07:17:17,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1194 [DEBUG]: Starting training session...
2025-08-07 07:17:17,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 1/100
2025-08-07 07:18:48,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:18:49,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 14.27054 ± 3.938
2025-08-07 07:18:49,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [17.818338, 14.440024, 17.324781, 12.229419, 13.950114, 7.600594, 16.334454, 7.9145865, 20.565094, 14.528035]
2025-08-07 07:18:49,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [43.0, 24.0, 58.0, 23.0, 61.0, 24.0, 60.0, 57.0, 40.0, 37.0]
2025-08-07 07:18:49,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (14.27) for latency ExtremeClogL1U23
2025-08-07 07:18:49,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 31 minutes, 46 seconds)
2025-08-07 07:20:28,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:20:28,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 47.03395 ± 41.825
2025-08-07 07:20:28,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [60.873196, 12.325408, 47.57811, 29.454601, 28.380098, 8.615223, 22.47395, 25.99847, 156.29439, 78.34602]
2025-08-07 07:20:28,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [61.0, 23.0, 101.0, 46.0, 48.0, 19.0, 41.0, 45.0, 123.0, 140.0]
2025-08-07 07:20:28,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (47.03) for latency ExtremeClogL1U23
2025-08-07 07:20:29,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 36 minutes, 22 seconds)
2025-08-07 07:22:08,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:22:09,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 37.10311 ± 26.417
2025-08-07 07:22:09,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [87.27008, 2.761209, 31.61644, 84.04726, 33.39069, 35.165146, 13.009193, 38.993298, 25.035376, 19.742413]
2025-08-07 07:22:09,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [232.0, 18.0, 169.0, 135.0, 50.0, 92.0, 25.0, 56.0, 42.0, 123.0]
2025-08-07 07:22:09,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 37 minutes, 22 seconds)
2025-08-07 07:23:48,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:23:50,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 40.31846 ± 35.811
2025-08-07 07:23:50,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [138.04097, 56.024143, 16.65578, 27.60003, 40.343822, 46.58541, 20.158564, 9.793179, 38.060394, 9.92233]
2025-08-07 07:23:50,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [138.0, 109.0, 138.0, 62.0, 95.0, 162.0, 56.0, 93.0, 118.0, 25.0]
2025-08-07 07:23:50,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 37 minutes, 1 second)
2025-08-07 07:25:28,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:25:28,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 27.67096 ± 26.055
2025-08-07 07:25:28,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [14.458248, -3.4575815, 23.923433, 74.37268, 28.35029, 9.421574, 7.689565, 8.803112, 37.991467, 75.15685]
2025-08-07 07:25:28,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [27.0, 104.0, 50.0, 72.0, 60.0, 24.0, 20.0, 19.0, 49.0, 83.0]
2025-08-07 07:25:28,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 35 minutes, 36 seconds)
2025-08-07 07:27:08,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:27:09,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 45.39279 ± 29.158
2025-08-07 07:27:09,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [13.757196, 36.17235, 13.583016, 53.26077, 83.4243, 96.8947, 5.6607976, 57.569218, 62.815746, 30.789839]
2025-08-07 07:27:09,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [91.0, 41.0, 23.0, 74.0, 102.0, 134.0, 19.0, 78.0, 129.0, 66.0]
2025-08-07 07:27:09,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 36 minutes, 43 seconds)
2025-08-07 07:28:47,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:28:49,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 54.63204 ± 54.722
2025-08-07 07:28:49,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [34.951183, 10.904808, 6.8207345, 204.1555, 66.16764, 75.4586, 47.02586, 23.959063, 59.803898, 17.07315]
2025-08-07 07:28:49,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [89.0, 25.0, 17.0, 163.0, 115.0, 96.0, 60.0, 41.0, 276.0, 39.0]
2025-08-07 07:28:49,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (54.63) for latency ExtremeClogL1U23
2025-08-07 07:28:49,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 35 minutes, 2 seconds)
2025-08-07 07:30:31,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:30:32,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 51.49340 ± 47.825
2025-08-07 07:30:32,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [11.842758, 127.813225, 14.190013, 10.911647, 13.150629, 44.461067, 51.397617, 147.4972, 75.61702, 18.052834]
2025-08-07 07:30:32,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [37.0, 153.0, 33.0, 24.0, 121.0, 128.0, 83.0, 222.0, 98.0, 40.0]
2025-08-07 07:30:32,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 34 minutes, 13 seconds)
2025-08-07 07:32:09,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:32:10,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 57.53053 ± 55.985
2025-08-07 07:32:10,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [21.150555, 53.917763, 19.849003, 130.37102, 13.090384, 55.113968, 80.65349, 183.8881, 11.619905, 5.6510305]
2025-08-07 07:32:10,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [83.0, 89.0, 43.0, 87.0, 25.0, 63.0, 97.0, 118.0, 133.0, 20.0]
2025-08-07 07:32:10,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (57.53) for latency ExtremeClogL1U23
2025-08-07 07:32:10,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 31 minutes, 48 seconds)
2025-08-07 07:33:50,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:33:51,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 98.85189 ± 72.555
2025-08-07 07:33:51,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [217.4413, 110.999115, 78.24101, 114.624146, 8.278712, 56.104725, 40.706013, 242.81726, 76.18301, 43.1236]
2025-08-07 07:33:51,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [133.0, 94.0, 82.0, 109.0, 20.0, 129.0, 49.0, 157.0, 79.0, 70.0]
2025-08-07 07:33:51,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (98.85) for latency ExtremeClogL1U23
2025-08-07 07:33:51,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 30 minutes, 42 seconds)
2025-08-07 07:35:32,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:35:33,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 52.22321 ± 59.347
2025-08-07 07:35:33,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [14.889893, 148.27188, 180.02524, 17.251705, 73.65993, 28.219408, 8.071755, 33.5548, 8.703077, 9.584421]
2025-08-07 07:35:33,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [29.0, 103.0, 232.0, 30.0, 102.0, 51.0, 34.0, 47.0, 23.0, 24.0]
2025-08-07 07:35:33,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 29 minutes, 20 seconds)
2025-08-07 07:37:10,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:37:12,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 119.99084 ± 104.712
2025-08-07 07:37:12,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [92.823364, 300.36526, 142.67116, 201.62624, 44.766914, 12.6473255, 11.371617, 10.0889, 97.35244, 286.19522]
2025-08-07 07:37:12,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [152.0, 185.0, 96.0, 143.0, 139.0, 35.0, 22.0, 34.0, 122.0, 392.0]
2025-08-07 07:37:12,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (119.99) for latency ExtremeClogL1U23
2025-08-07 07:37:12,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 27 minutes, 33 seconds)
2025-08-07 07:38:51,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:38:52,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 87.54607 ± 104.245
2025-08-07 07:38:52,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [14.666809, 259.88815, 14.426603, 5.0926404, 15.370315, 5.3337984, 15.997569, 288.74673, 109.603516, 146.33453]
2025-08-07 07:38:52,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [35.0, 188.0, 24.0, 16.0, 33.0, 17.0, 33.0, 239.0, 100.0, 174.0]
2025-08-07 07:38:52,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 25 minutes, 3 seconds)
2025-08-07 07:40:32,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:40:33,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 143.48619 ± 109.194
2025-08-07 07:40:33,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [181.52887, 284.27383, 14.014952, 20.683973, 7.9532323, 215.69638, 10.091274, 219.20598, 233.41331, 248.00015]
2025-08-07 07:40:33,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [124.0, 196.0, 24.0, 45.0, 21.0, 121.0, 23.0, 249.0, 134.0, 153.0]
2025-08-07 07:40:33,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (143.49) for latency ExtremeClogL1U23
2025-08-07 07:40:33,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 24 minutes, 11 seconds)
2025-08-07 07:42:12,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:42:13,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 63.46868 ± 70.167
2025-08-07 07:42:13,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [133.86003, 9.553605, 193.64523, 40.791943, 13.345391, 40.350266, 6.3279743, 9.830928, 13.112606, 173.86874]
2025-08-07 07:42:13,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [86.0, 21.0, 118.0, 50.0, 23.0, 53.0, 25.0, 21.0, 23.0, 109.0]
2025-08-07 07:42:13,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 22 minutes, 19 seconds)
2025-08-07 07:43:53,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:43:54,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 127.83527 ± 117.333
2025-08-07 07:43:54,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [215.42159, 49.148586, 238.11757, 246.21706, 332.47333, 8.567372, 154.7546, 12.092568, 12.375389, 9.18459]
2025-08-07 07:43:54,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [177.0, 137.0, 133.0, 139.0, 176.0, 21.0, 92.0, 24.0, 23.0, 19.0]
2025-08-07 07:43:55,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 20 minutes, 32 seconds)
2025-08-07 07:45:35,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:45:37,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 230.72427 ± 125.037
2025-08-07 07:45:37,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [195.43262, 331.096, 205.15134, 60.894733, 178.06317, 372.79514, 425.22925, 300.59283, 230.54506, 7.442346]
2025-08-07 07:45:37,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [213.0, 177.0, 158.0, 54.0, 119.0, 172.0, 257.0, 145.0, 116.0, 23.0]
2025-08-07 07:45:37,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (230.72) for latency ExtremeClogL1U23
2025-08-07 07:45:37,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 19 minutes, 42 seconds)
2025-08-07 07:47:15,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:47:16,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 181.83727 ± 177.535
2025-08-07 07:47:16,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [249.55623, 330.15292, 319.05124, 21.451696, 50.821747, 26.149042, 565.53107, 8.896876, 26.727032, 220.03479]
2025-08-07 07:47:16,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [144.0, 180.0, 176.0, 33.0, 65.0, 44.0, 311.0, 18.0, 39.0, 118.0]
2025-08-07 07:47:16,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 17 minutes, 45 seconds)
2025-08-07 07:48:57,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:48:58,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 119.02425 ± 124.426
2025-08-07 07:48:58,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [228.63637, 9.476258, 106.36659, 5.1631684, 357.559, 57.755867, 309.28052, 41.64828, 10.310916, 64.04571]
2025-08-07 07:48:58,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [127.0, 23.0, 136.0, 16.0, 230.0, 75.0, 156.0, 49.0, 22.0, 72.0]
2025-08-07 07:48:58,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 16 minutes, 24 seconds)
2025-08-07 07:50:37,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:50:38,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 104.99501 ± 127.970
2025-08-07 07:50:38,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [12.537469, 26.775131, 370.02954, 11.025301, 78.71546, 108.85525, 13.427718, 51.470543, 40.05907, 337.0546]
2025-08-07 07:50:38,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [25.0, 39.0, 172.0, 21.0, 65.0, 132.0, 23.0, 63.0, 46.0, 206.0]
2025-08-07 07:50:38,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 14 minutes, 46 seconds)
2025-08-07 07:52:18,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:52:21,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 250.45459 ± 164.060
2025-08-07 07:52:21,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [161.43379, 6.9271517, 293.04965, 8.661618, 410.45667, 289.1374, 202.2858, 337.1771, 572.8401, 222.57672]
2025-08-07 07:52:21,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [273.0, 19.0, 192.0, 20.0, 253.0, 149.0, 129.0, 164.0, 400.0, 123.0]
2025-08-07 07:52:21,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (250.45) for latency ExtremeClogL1U23
2025-08-07 07:52:21,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 13 minutes, 18 seconds)
2025-08-07 07:54:00,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:54:01,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 98.82973 ± 128.153
2025-08-07 07:54:01,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [18.874067, 10.2194, 9.799927, 20.677876, 22.187098, 273.52966, 15.511465, 278.76517, 327.93536, 10.797175]
2025-08-07 07:54:01,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [40.0, 23.0, 21.0, 38.0, 37.0, 137.0, 41.0, 157.0, 194.0, 23.0]
2025-08-07 07:54:01,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 11 minutes, 13 seconds)
2025-08-07 07:55:41,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:55:42,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 147.39047 ± 164.647
2025-08-07 07:55:42,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [582.9637, 44.099777, 4.9878674, 7.290253, 67.25773, 135.79085, 147.99245, 206.64453, 37.0522, 239.82541]
2025-08-07 07:55:42,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [331.0, 47.0, 17.0, 18.0, 112.0, 165.0, 130.0, 115.0, 47.0, 128.0]
2025-08-07 07:55:42,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 9 minutes, 54 seconds)
2025-08-07 07:57:22,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:57:23,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 112.45805 ± 101.935
2025-08-07 07:57:23,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [357.59885, 22.762959, 109.10895, 7.2947106, 83.41544, 130.51245, 170.5446, 39.498432, 16.456966, 187.38712]
2025-08-07 07:57:23,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [174.0, 42.0, 124.0, 21.0, 104.0, 80.0, 131.0, 46.0, 27.0, 138.0]
2025-08-07 07:57:23,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 7 minutes, 54 seconds)
2025-08-07 07:59:04,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:59:06,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 134.93335 ± 145.977
2025-08-07 07:59:06,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [421.80853, 326.30548, 291.2739, 20.774502, 64.953415, 8.730576, 7.782946, 6.0775843, 120.477776, 81.14877]
2025-08-07 07:59:06,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [233.0, 195.0, 192.0, 42.0, 69.0, 22.0, 18.0, 19.0, 121.0, 109.0]
2025-08-07 07:59:06,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 6 minutes, 46 seconds)
2025-08-07 08:00:45,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:00:47,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 212.02449 ± 170.559
2025-08-07 08:00:47,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [101.13743, 111.84927, 43.53242, 197.82397, 319.9044, 376.9247, 547.7425, 9.349822, 52.21396, 359.7665]
2025-08-07 08:00:47,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [117.0, 127.0, 47.0, 107.0, 160.0, 209.0, 454.0, 21.0, 67.0, 192.0]
2025-08-07 08:00:47,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 4 minutes, 57 seconds)
2025-08-07 08:02:27,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:02:29,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 205.90234 ± 124.731
2025-08-07 08:02:29,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [311.6099, 371.16232, 118.2385, 96.64825, 274.88937, 187.6262, 410.17026, 103.58884, 7.4423876, 177.64737]
2025-08-07 08:02:29,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [178.0, 199.0, 168.0, 100.0, 146.0, 102.0, 218.0, 115.0, 17.0, 114.0]
2025-08-07 08:02:29,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 3 minutes, 31 seconds)
2025-08-07 08:04:07,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:04:09,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 193.26143 ± 147.532
2025-08-07 08:04:09,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [260.85995, 114.821945, 8.359492, 49.493717, 190.71756, 296.03748, 38.353745, 265.46124, 180.38737, 528.12177]
2025-08-07 08:04:09,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [139.0, 149.0, 21.0, 50.0, 144.0, 201.0, 47.0, 151.0, 127.0, 272.0]
2025-08-07 08:04:09,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 1 minute, 37 seconds)
2025-08-07 08:05:50,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:05:52,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 213.42160 ± 251.201
2025-08-07 08:05:52,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [27.606806, 5.6116505, 49.720837, 98.38564, 35.466087, 265.46515, 205.43721, 806.6383, 91.64936, 548.235]
2025-08-07 08:05:52,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [48.0, 17.0, 60.0, 169.0, 48.0, 169.0, 124.0, 419.0, 109.0, 303.0]
2025-08-07 08:05:52,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 26 seconds)
2025-08-07 08:07:31,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:07:34,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 276.52057 ± 208.361
2025-08-07 08:07:34,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [537.25464, 93.97556, 262.0392, 447.04807, 150.53188, 651.00214, 8.741721, 11.065879, 257.10373, 346.44293]
2025-08-07 08:07:34,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [285.0, 125.0, 143.0, 429.0, 140.0, 344.0, 20.0, 20.0, 154.0, 274.0]
2025-08-07 08:07:34,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (276.52) for latency ExtremeClogL1U23
2025-08-07 08:07:34,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 58 minutes, 38 seconds)
2025-08-07 08:09:13,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:09:15,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 239.28922 ± 140.628
2025-08-07 08:09:15,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [570.62427, 270.98492, 38.28406, 260.65408, 83.35178, 261.26294, 142.73738, 187.62685, 241.5336, 335.832]
2025-08-07 08:09:15,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [330.0, 152.0, 57.0, 137.0, 113.0, 190.0, 91.0, 105.0, 132.0, 178.0]
2025-08-07 08:09:15,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 56 minutes, 49 seconds)
2025-08-07 08:10:55,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:10:56,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 161.52432 ± 133.868
2025-08-07 08:10:56,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [287.7351, 295.93427, 248.82513, 3.8411243, 354.44492, 73.101776, 272.80664, 12.358868, 38.932095, 27.263227]
2025-08-07 08:10:56,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [141.0, 154.0, 144.0, 15.0, 228.0, 68.0, 154.0, 25.0, 46.0, 40.0]
2025-08-07 08:10:56,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 54 minutes, 56 seconds)
2025-08-07 08:12:37,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:12:38,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 177.98120 ± 109.652
2025-08-07 08:12:38,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [72.401436, 271.9191, 297.77505, 273.0882, 273.3721, 49.077003, 288.71274, 7.5066476, 79.50652, 166.45331]
2025-08-07 08:12:38,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [110.0, 173.0, 155.0, 158.0, 145.0, 55.0, 144.0, 19.0, 103.0, 106.0]
2025-08-07 08:12:38,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 53 minutes, 41 seconds)
2025-08-07 08:14:18,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:14:19,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 174.89449 ± 164.137
2025-08-07 08:14:19,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [13.958089, 194.4075, 10.790097, 9.824287, 520.52155, 131.21796, 404.27, 241.98059, 144.90833, 77.06643]
2025-08-07 08:14:19,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [23.0, 144.0, 23.0, 27.0, 322.0, 170.0, 194.0, 164.0, 137.0, 71.0]
2025-08-07 08:14:19,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 51 minutes, 36 seconds)
2025-08-07 08:16:01,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:16:03,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 156.83568 ± 130.752
2025-08-07 08:16:03,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [287.8737, 335.2537, 121.110344, 13.792647, 187.41757, 330.4438, 248.28488, 8.850571, 26.177792, 9.151946]
2025-08-07 08:16:03,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [151.0, 179.0, 115.0, 25.0, 157.0, 211.0, 116.0, 20.0, 38.0, 23.0]
2025-08-07 08:16:03,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 50 minutes, 10 seconds)
2025-08-07 08:17:41,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:17:43,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 184.62154 ± 96.615
2025-08-07 08:17:43,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [7.569716, 131.57286, 339.8348, 136.04071, 296.64966, 234.05739, 138.8381, 218.7431, 87.4001, 255.50897]
2025-08-07 08:17:43,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [21.0, 171.0, 216.0, 150.0, 142.0, 161.0, 125.0, 124.0, 112.0, 131.0]
2025-08-07 08:17:43,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 48 minutes, 19 seconds)
2025-08-07 08:19:22,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:19:24,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 280.29581 ± 160.146
2025-08-07 08:19:24,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [165.02493, 9.499042, 516.72925, 501.3925, 160.27394, 361.78873, 292.95743, 408.20334, 114.66327, 272.42545]
2025-08-07 08:19:24,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [105.0, 21.0, 256.0, 251.0, 161.0, 165.0, 140.0, 178.0, 130.0, 181.0]
2025-08-07 08:19:24,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (280.30) for latency ExtremeClogL1U23
2025-08-07 08:19:24,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 46 minutes, 41 seconds)
2025-08-07 08:21:05,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:21:07,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 232.17583 ± 190.789
2025-08-07 08:21:07,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [176.04526, 418.70923, 345.55032, 167.32112, 6.6580067, 637.2651, 47.898483, 106.226364, 61.317204, 354.76697]
2025-08-07 08:21:07,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [98.0, 245.0, 190.0, 104.0, 20.0, 351.0, 48.0, 65.0, 67.0, 186.0]
2025-08-07 08:21:07,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 45 minutes, 12 seconds)
2025-08-07 08:22:47,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:22:49,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 171.50992 ± 114.103
2025-08-07 08:22:49,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [303.0271, 209.13094, 242.5764, 160.82315, 13.298897, 13.27954, 170.79626, 49.97233, 177.10063, 375.094]
2025-08-07 08:22:49,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [168.0, 165.0, 141.0, 107.0, 25.0, 23.0, 103.0, 54.0, 98.0, 236.0]
2025-08-07 08:22:49,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 43 minutes, 35 seconds)
2025-08-07 08:24:29,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:24:31,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 190.62083 ± 142.335
2025-08-07 08:24:31,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3.125356, 16.028913, 306.96442, 321.18274, 145.33586, 167.87134, 46.566113, 293.54492, 149.01889, 456.5698]
2025-08-07 08:24:31,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [19.0, 24.0, 160.0, 209.0, 134.0, 104.0, 60.0, 149.0, 164.0, 261.0]
2025-08-07 08:24:31,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 41 minutes, 36 seconds)
2025-08-07 08:26:11,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:26:13,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 235.47237 ± 273.877
2025-08-07 08:26:13,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [9.699467, 58.039474, 160.40033, 296.4318, 9.16942, 8.050135, 749.55286, 62.42926, 746.1067, 254.84412]
2025-08-07 08:26:13,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [21.0, 55.0, 131.0, 173.0, 23.0, 21.0, 411.0, 72.0, 475.0, 142.0]
2025-08-07 08:26:13,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 40 minutes, 13 seconds)
2025-08-07 08:27:51,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:27:54,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 236.88469 ± 177.979
2025-08-07 08:27:54,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [222.81802, 135.2991, 104.20728, 522.82056, 10.582688, 594.671, 78.078545, 244.60426, 261.193, 194.57245]
2025-08-07 08:27:54,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [130.0, 129.0, 115.0, 392.0, 23.0, 415.0, 93.0, 196.0, 259.0, 169.0]
2025-08-07 08:27:54,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 38 minutes, 29 seconds)
2025-08-07 08:29:35,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:29:37,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 228.57022 ± 182.865
2025-08-07 08:29:37,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [505.39667, 378.7771, 50.31628, 475.47156, 340.59665, 128.64185, 299.47922, 82.39838, 11.567796, 13.0566]
2025-08-07 08:29:37,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [283.0, 184.0, 48.0, 245.0, 158.0, 164.0, 202.0, 90.0, 22.0, 23.0]
2025-08-07 08:29:37,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 36 minutes, 53 seconds)
2025-08-07 08:31:17,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:31:19,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 257.30899 ± 207.226
2025-08-07 08:31:19,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [8.042027, 219.83548, 8.387226, 417.38715, 113.715065, 450.6294, 543.8854, 536.6507, 262.45096, 12.106366]
2025-08-07 08:31:19,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [20.0, 136.0, 22.0, 347.0, 124.0, 242.0, 247.0, 267.0, 193.0, 23.0]
2025-08-07 08:31:19,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 35 minutes, 17 seconds)
2025-08-07 08:32:59,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:33:01,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 202.25485 ± 122.301
2025-08-07 08:33:01,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [197.34744, 220.40775, 222.66724, 446.0212, 308.5754, 235.32288, 141.5909, 13.693436, 227.33582, 9.5866]
2025-08-07 08:33:01,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [210.0, 146.0, 180.0, 198.0, 181.0, 121.0, 114.0, 24.0, 125.0, 22.0]
2025-08-07 08:33:01,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 33 minutes, 35 seconds)
2025-08-07 08:34:41,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:34:43,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 248.02188 ± 181.747
2025-08-07 08:34:43,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [564.8207, 338.34875, 288.9862, 254.24323, 23.529324, 13.232994, 409.42932, 9.997232, 408.24152, 169.38953]
2025-08-07 08:34:43,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [332.0, 235.0, 169.0, 184.0, 39.0, 25.0, 276.0, 20.0, 216.0, 107.0]
2025-08-07 08:34:43,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 31 minutes, 54 seconds)
2025-08-07 08:36:22,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:36:25,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 320.55225 ± 386.780
2025-08-07 08:36:25,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1132.4536, 936.703, 9.24692, 348.85852, 13.428148, 80.10847, 10.65601, 242.85611, 11.629592, 419.58173]
2025-08-07 08:36:25,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [736.0, 442.0, 19.0, 215.0, 23.0, 125.0, 24.0, 173.0, 22.0, 202.0]
2025-08-07 08:36:25,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (320.55) for latency ExtremeClogL1U23
2025-08-07 08:36:25,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 30 minutes, 21 seconds)
2025-08-07 08:38:07,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:38:10,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 365.35144 ± 157.810
2025-08-07 08:38:10,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [11.90557, 383.70242, 573.9623, 234.76959, 381.44272, 507.13785, 392.5826, 365.54596, 260.64163, 541.8239]
2025-08-07 08:38:10,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [23.0, 236.0, 295.0, 125.0, 191.0, 403.0, 231.0, 200.0, 143.0, 372.0]
2025-08-07 08:38:10,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (365.35) for latency ExtremeClogL1U23
2025-08-07 08:38:10,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 28 minutes, 54 seconds)
2025-08-07 08:39:49,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:39:53,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 510.32104 ± 338.995
2025-08-07 08:39:53,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [299.30206, 876.863, 577.4648, 742.52124, 653.3807, 1072.4326, 10.049269, 608.6276, 249.93857, 12.63043]
2025-08-07 08:39:53,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [145.0, 511.0, 301.0, 454.0, 340.0, 611.0, 23.0, 330.0, 119.0, 23.0]
2025-08-07 08:39:53,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (510.32) for latency ExtremeClogL1U23
2025-08-07 08:39:53,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 27 minutes, 16 seconds)
2025-08-07 08:41:34,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:41:36,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 362.59995 ± 257.694
2025-08-07 08:41:36,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [487.27875, 524.46967, 780.5113, 584.11646, 10.243665, 507.03564, 309.73398, 403.28522, 9.574896, 9.750265]
2025-08-07 08:41:36,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [245.0, 267.0, 363.0, 293.0, 22.0, 228.0, 141.0, 269.0, 21.0, 23.0]
2025-08-07 08:41:36,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 25 minutes, 49 seconds)
2025-08-07 08:43:18,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:43:20,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 307.96503 ± 212.116
2025-08-07 08:43:20,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [369.6238, 525.53876, 600.0889, 6.768246, 463.86328, 72.78519, 199.56282, 519.1495, 13.397346, 308.87256]
2025-08-07 08:43:20,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [156.0, 234.0, 295.0, 17.0, 202.0, 136.0, 106.0, 221.0, 25.0, 131.0]
2025-08-07 08:43:20,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 24 minutes, 28 seconds)
2025-08-07 08:44:59,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:45:01,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 401.54330 ± 140.120
2025-08-07 08:45:01,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [435.77197, 384.91125, 484.5238, 499.70865, 13.241403, 406.84064, 431.85297, 515.926, 341.4052, 501.25104]
2025-08-07 08:45:01,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [172.0, 160.0, 204.0, 198.0, 24.0, 163.0, 176.0, 212.0, 150.0, 213.0]
2025-08-07 08:45:01,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 22 minutes, 33 seconds)
2025-08-07 08:46:42,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:46:44,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 405.60477 ± 139.654
2025-08-07 08:46:44,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [395.54245, 514.9554, 489.6718, 420.5149, 339.28424, 437.16724, 508.5659, 15.681537, 471.10876, 463.5555]
2025-08-07 08:46:44,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [153.0, 245.0, 215.0, 161.0, 199.0, 191.0, 217.0, 24.0, 223.0, 199.0]
2025-08-07 08:46:44,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 20 minutes, 33 seconds)
2025-08-07 08:48:27,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:48:30,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 415.91425 ± 196.135
2025-08-07 08:48:30,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [452.2857, 12.896212, 399.72662, 668.0358, 507.53525, 601.0325, 508.28812, 443.752, 96.10979, 469.48047]
2025-08-07 08:48:30,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [199.0, 23.0, 149.0, 323.0, 227.0, 277.0, 235.0, 207.0, 198.0, 196.0]
2025-08-07 08:48:30,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 19 minutes, 19 seconds)
2025-08-07 08:50:07,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:50:09,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 330.65741 ± 204.374
2025-08-07 08:50:09,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [387.6698, 7.3749666, 10.8143, 505.87717, 391.58743, 512.2751, 598.21625, 454.48166, 338.3748, 99.90283]
2025-08-07 08:50:09,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [197.0, 19.0, 24.0, 220.0, 164.0, 249.0, 329.0, 191.0, 142.0, 63.0]
2025-08-07 08:50:09,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 17 minutes)
2025-08-07 08:51:49,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:51:51,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 325.41339 ± 198.869
2025-08-07 08:51:51,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [385.16498, 468.07767, 10.729444, 9.944829, 109.294655, 440.39523, 376.85547, 507.14078, 603.0714, 343.45963]
2025-08-07 08:51:51,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [185.0, 205.0, 22.0, 23.0, 68.0, 208.0, 151.0, 239.0, 327.0, 135.0]
2025-08-07 08:51:51,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 14 minutes, 55 seconds)
2025-08-07 08:53:33,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:53:34,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 246.26169 ± 226.352
2025-08-07 08:53:34,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [488.32214, 3.812327, 513.96515, 8.580046, 465.0733, 11.531604, 97.666275, 345.69083, 520.1595, 7.815845]
2025-08-07 08:53:34,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [220.0, 17.0, 230.0, 20.0, 184.0, 23.0, 67.0, 147.0, 233.0, 19.0]
2025-08-07 08:53:34,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 13 minutes, 34 seconds)
2025-08-07 08:55:15,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:55:17,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 361.00613 ± 154.590
2025-08-07 08:55:17,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [399.48367, 538.72534, 151.3903, 494.74146, 409.16455, 424.50833, 469.60574, 395.45996, 10.624541, 316.35745]
2025-08-07 08:55:17,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [151.0, 254.0, 91.0, 246.0, 162.0, 167.0, 197.0, 158.0, 23.0, 135.0]
2025-08-07 08:55:17,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 11 minutes, 51 seconds)
2025-08-07 08:56:58,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:57:00,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 374.05954 ± 212.211
2025-08-07 08:57:00,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [7.790292, 11.968101, 724.0803, 431.12546, 478.60114, 439.8146, 446.2109, 248.68422, 442.43665, 509.8835]
2025-08-07 08:57:00,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [20.0, 22.0, 294.0, 172.0, 202.0, 180.0, 184.0, 122.0, 190.0, 230.0]
2025-08-07 08:57:00,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 9 minutes, 38 seconds)
2025-08-07 08:58:39,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:58:41,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 303.45950 ± 177.741
2025-08-07 08:58:41,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [455.85406, 488.79886, 183.74078, 92.687874, 104.67462, 481.74664, 477.2583, 9.473481, 407.23453, 333.12576]
2025-08-07 08:58:41,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [183.0, 208.0, 101.0, 69.0, 69.0, 209.0, 205.0, 19.0, 160.0, 136.0]
2025-08-07 08:58:41,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 8 minutes, 13 seconds)
2025-08-07 09:00:20,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:00:22,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 405.34781 ± 135.507
2025-08-07 09:00:22,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [446.15503, 423.43008, 446.04797, 424.6037, 509.26157, 461.02982, 5.5213604, 423.74582, 446.86047, 466.82214]
2025-08-07 09:00:22,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [188.0, 166.0, 176.0, 177.0, 205.0, 194.0, 16.0, 162.0, 180.0, 195.0]
2025-08-07 09:00:22,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 6 minutes, 26 seconds)
2025-08-07 09:02:02,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:02:04,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 407.65186 ± 200.067
2025-08-07 09:02:04,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [537.9302, 446.27774, 376.66818, 424.18802, 642.8023, 103.21247, 414.19812, 674.9841, 8.326527, 447.93106]
2025-08-07 09:02:04,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [226.0, 175.0, 146.0, 163.0, 271.0, 65.0, 162.0, 258.0, 21.0, 175.0]
2025-08-07 09:02:04,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 4 minutes, 35 seconds)
2025-08-07 09:03:45,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:03:47,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 317.04013 ± 204.740
2025-08-07 09:03:47,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [11.883226, 490.70975, 431.51465, 367.1642, 444.01447, 8.257711, 514.4624, 9.0712595, 420.2756, 473.048]
2025-08-07 09:03:47,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [24.0, 199.0, 167.0, 138.0, 172.0, 21.0, 208.0, 19.0, 165.0, 190.0]
2025-08-07 09:03:47,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 2 minutes, 48 seconds)
2025-08-07 09:05:27,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:05:28,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 315.61627 ± 159.138
2025-08-07 09:05:28,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [360.30762, 6.223726, 460.51645, 7.10149, 412.8342, 412.1456, 430.08502, 323.90573, 395.5759, 347.46725]
2025-08-07 09:05:28,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [141.0, 20.0, 182.0, 23.0, 162.0, 159.0, 161.0, 136.0, 153.0, 140.0]
2025-08-07 09:05:28,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 1 minute, 2 seconds)
2025-08-07 09:07:09,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:07:11,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 325.12799 ± 176.654
2025-08-07 09:07:11,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [6.9372177, 426.03607, 400.96698, 301.05643, 513.0669, 354.2776, 513.1364, 114.97454, 112.685196, 508.14255]
2025-08-07 09:07:11,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [18.0, 157.0, 154.0, 134.0, 207.0, 132.0, 209.0, 72.0, 74.0, 212.0]
2025-08-07 09:07:11,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 59 minutes, 29 seconds)
2025-08-07 09:08:51,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:08:52,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 171.43338 ± 230.893
2025-08-07 09:08:52,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [109.21891, 4.058851, 6.7787185, 7.4036493, 566.76666, 486.77524, 9.791826, 12.721048, 506.4711, 4.348002]
2025-08-07 09:08:52,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [70.0, 24.0, 21.0, 19.0, 253.0, 188.0, 25.0, 25.0, 204.0, 22.0]
2025-08-07 09:08:52,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 57 minutes, 47 seconds)
2025-08-07 09:10:32,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:10:34,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 226.81966 ± 202.333
2025-08-07 09:10:34,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [5.3991017, 401.9154, 417.00797, 9.744745, 119.60636, 8.53473, 486.24072, 351.68195, 5.7009935, 462.36438]
2025-08-07 09:10:34,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [21.0, 148.0, 160.0, 21.0, 77.0, 20.0, 198.0, 139.0, 17.0, 191.0]
2025-08-07 09:10:34,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 56 minutes, 1 second)
2025-08-07 09:12:14,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:12:16,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 444.18427 ± 94.650
2025-08-07 09:12:16,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [548.20416, 388.23804, 365.53247, 399.25552, 581.87335, 422.46768, 615.097, 393.59924, 323.13672, 404.43826]
2025-08-07 09:12:16,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [225.0, 147.0, 145.0, 147.0, 240.0, 161.0, 244.0, 150.0, 235.0, 158.0]
2025-08-07 09:12:17,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 54 minutes, 23 seconds)
2025-08-07 09:13:57,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:14:00,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 413.55557 ± 397.951
2025-08-07 09:14:00,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [7.292171, 514.8676, 630.6993, 3.594478, 310.05185, 1204.7772, 979.46484, 312.41293, 162.22157, 10.173318]
2025-08-07 09:14:00,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [19.0, 213.0, 241.0, 15.0, 131.0, 441.0, 395.0, 130.0, 81.0, 21.0]
2025-08-07 09:14:00,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 52 minutes, 48 seconds)
2025-08-07 09:15:39,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:15:41,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 327.72192 ± 162.619
2025-08-07 09:15:41,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [108.1219, 415.11557, 205.9786, 11.759107, 397.6345, 556.2503, 285.24484, 445.0167, 457.0577, 395.04007]
2025-08-07 09:15:41,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [67.0, 161.0, 157.0, 24.0, 149.0, 238.0, 128.0, 174.0, 173.0, 154.0]
2025-08-07 09:15:41,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 50 minutes, 57 seconds)
2025-08-07 09:17:22,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:17:25,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 435.11761 ± 195.714
2025-08-07 09:17:25,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [9.22097, 449.5459, 369.517, 642.36255, 820.4088, 406.10532, 393.6782, 420.0848, 455.52274, 384.7302]
2025-08-07 09:17:25,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [23.0, 171.0, 154.0, 306.0, 307.0, 153.0, 154.0, 167.0, 176.0, 150.0]
2025-08-07 09:17:25,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 49 minutes, 32 seconds)
2025-08-07 09:19:03,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:19:05,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 285.58621 ± 205.550
2025-08-07 09:19:05,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [436.29895, 432.69562, 8.59059, 9.887498, 513.064, 408.38324, 366.22598, 522.62555, 151.78148, 6.309031]
2025-08-07 09:19:05,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [166.0, 166.0, 19.0, 22.0, 204.0, 159.0, 142.0, 209.0, 84.0, 16.0]
2025-08-07 09:19:05,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 47 minutes, 42 seconds)
2025-08-07 09:20:46,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:20:48,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 361.86395 ± 189.873
2025-08-07 09:20:48,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [500.89633, 540.562, 11.57811, 348.6824, 400.28894, 590.05505, 455.87872, 404.47354, 12.415293, 353.80908]
2025-08-07 09:20:48,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [197.0, 223.0, 22.0, 134.0, 158.0, 221.0, 176.0, 152.0, 25.0, 145.0]
2025-08-07 09:20:48,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 46 minutes)
2025-08-07 09:22:29,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:22:30,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 292.01498 ± 186.922
2025-08-07 09:22:30,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [450.24756, 475.75726, 453.1995, 377.72046, 10.393587, 11.615071, 394.11823, 372.6279, 13.371873, 361.09833]
2025-08-07 09:22:30,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [176.0, 201.0, 181.0, 153.0, 24.0, 20.0, 167.0, 150.0, 25.0, 147.0]
2025-08-07 09:22:30,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 44 minutes, 15 seconds)
2025-08-07 09:24:09,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:24:12,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 490.94345 ± 408.725
2025-08-07 09:24:12,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1068.628, 7.7937946, 12.460712, 438.66833, 1340.207, 117.19318, 454.03568, 564.509, 429.11987, 476.81943]
2025-08-07 09:24:12,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [423.0, 21.0, 25.0, 173.0, 503.0, 75.0, 180.0, 231.0, 165.0, 187.0]
2025-08-07 09:24:12,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 42 minutes, 37 seconds)
2025-08-07 09:25:52,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:25:54,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 264.90643 ± 215.509
2025-08-07 09:25:54,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [354.34543, 5.535376, 360.01218, 465.82248, 401.47415, 511.1016, 9.381097, 520.8521, 7.6343966, 12.905667]
2025-08-07 09:25:54,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [142.0, 17.0, 144.0, 178.0, 160.0, 208.0, 21.0, 204.0, 18.0, 24.0]
2025-08-07 09:25:54,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 40 minutes, 43 seconds)
2025-08-07 09:27:33,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:27:36,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 415.14111 ± 239.438
2025-08-07 09:27:36,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [772.2968, 571.80225, 620.01044, 11.744254, 376.1496, 320.38245, 526.4299, 577.44244, 9.106796, 366.04626]
2025-08-07 09:27:36,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [300.0, 239.0, 244.0, 22.0, 157.0, 135.0, 216.0, 247.0, 22.0, 147.0]
2025-08-07 09:27:36,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 39 minutes, 10 seconds)
2025-08-07 09:29:16,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:29:18,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 427.42993 ± 158.419
2025-08-07 09:29:18,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [425.11533, 648.6804, 423.43338, 402.41327, 429.40625, 440.87527, 474.59943, 5.1475697, 568.10046, 456.52762]
2025-08-07 09:29:18,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [171.0, 259.0, 166.0, 164.0, 168.0, 169.0, 198.0, 15.0, 220.0, 178.0]
2025-08-07 09:29:18,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 37 minutes, 25 seconds)
2025-08-07 09:30:59,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:31:01,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 320.38788 ± 229.837
2025-08-07 09:31:01,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [313.01666, 622.1674, 412.1212, 11.227493, 453.24738, 9.381206, 9.880897, 380.92743, 327.0967, 664.8122]
2025-08-07 09:31:01,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [141.0, 270.0, 157.0, 25.0, 176.0, 21.0, 23.0, 170.0, 138.0, 276.0]
2025-08-07 09:31:01,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 35 minutes, 43 seconds)
2025-08-07 09:32:42,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:32:43,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 296.47501 ± 179.817
2025-08-07 09:32:43,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [456.28952, 6.833252, 243.95699, 386.63998, 529.2988, 132.82764, 478.90488, 371.23022, 346.3345, 12.434112]
2025-08-07 09:32:43,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [175.0, 20.0, 104.0, 148.0, 228.0, 86.0, 190.0, 144.0, 137.0, 24.0]
2025-08-07 09:32:43,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 34 minutes, 5 seconds)
2025-08-07 09:34:22,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:34:24,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 371.75250 ± 245.939
2025-08-07 09:34:24,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [455.5775, 639.4083, 9.037629, 461.10242, 639.3817, 9.458613, 537.39294, 481.5921, 477.59506, 6.97907]
2025-08-07 09:34:24,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [172.0, 260.0, 22.0, 180.0, 248.0, 21.0, 213.0, 189.0, 182.0, 23.0]
2025-08-07 09:34:24,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 32 minutes, 20 seconds)
2025-08-07 09:36:04,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:36:06,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 284.86047 ± 202.695
2025-08-07 09:36:06,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [498.6484, 345.21832, 14.552367, 7.4173164, 9.239433, 269.53888, 289.98807, 363.65753, 607.41656, 442.92798]
2025-08-07 09:36:06,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [197.0, 139.0, 24.0, 20.0, 22.0, 121.0, 132.0, 146.0, 248.0, 167.0]
2025-08-07 09:36:06,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 30 minutes, 36 seconds)
2025-08-07 09:37:47,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:37:48,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 238.74052 ± 252.584
2025-08-07 09:37:48,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [8.335788, 448.5054, 346.7788, 9.1446295, 723.42487, 289.14944, 14.645392, 528.2289, 6.9806767, 12.211238]
2025-08-07 09:37:48,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [21.0, 175.0, 148.0, 22.0, 389.0, 132.0, 25.0, 209.0, 17.0, 22.0]
2025-08-07 09:37:48,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 28 minutes, 54 seconds)
2025-08-07 09:39:29,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:39:31,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 271.20383 ± 196.709
2025-08-07 09:39:31,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [11.677829, 148.43167, 474.18015, 346.36905, 435.16534, 15.503374, 322.44363, 546.14746, 404.42014, 7.6996627]
2025-08-07 09:39:31,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [22.0, 263.0, 185.0, 139.0, 168.0, 25.0, 137.0, 213.0, 163.0, 19.0]
2025-08-07 09:39:31,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 27 minutes, 12 seconds)
2025-08-07 09:41:11,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:41:13,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 404.14471 ± 326.682
2025-08-07 09:41:13,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [4.7423215, 467.43967, 437.37128, 203.77637, 783.0412, 703.7958, 8.863251, 10.626391, 965.6305, 456.16064]
2025-08-07 09:41:13,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [19.0, 176.0, 180.0, 98.0, 338.0, 269.0, 18.0, 24.0, 390.0, 173.0]
2025-08-07 09:41:13,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 25 minutes, 29 seconds)
2025-08-07 09:42:53,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:42:55,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 327.42542 ± 217.750
2025-08-07 09:42:55,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [395.26855, 12.807621, 661.51105, 442.7762, 21.382856, 8.596725, 420.1856, 395.8991, 496.53134, 419.29514]
2025-08-07 09:42:55,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [152.0, 24.0, 248.0, 166.0, 41.0, 23.0, 158.0, 149.0, 196.0, 166.0]
2025-08-07 09:42:55,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 23 minutes, 48 seconds)
2025-08-07 09:44:35,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:44:38,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 472.21396 ± 398.478
2025-08-07 09:44:38,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [561.22406, 425.2356, 1537.6458, 471.26874, 463.58704, 410.246, 7.7749205, 7.2967362, 395.14795, 442.7127]
2025-08-07 09:44:38,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [221.0, 174.0, 594.0, 182.0, 177.0, 162.0, 21.0, 19.0, 157.0, 181.0]
2025-08-07 09:44:38,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 22 minutes, 11 seconds)
2025-08-07 09:46:19,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:46:20,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 351.81683 ± 287.057
2025-08-07 09:46:20,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [10.111167, 430.8176, 425.34616, 9.148785, 464.04132, 1008.3092, 11.336253, 286.6547, 446.0592, 426.3441]
2025-08-07 09:46:20,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [20.0, 163.0, 166.0, 20.0, 177.0, 366.0, 24.0, 134.0, 171.0, 166.0]
2025-08-07 09:46:20,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 20 minutes, 29 seconds)
2025-08-07 09:48:00,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:48:03,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 423.00626 ± 260.071
2025-08-07 09:48:03,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [506.77588, 125.570465, 764.10345, 629.1155, 7.261171, 517.2099, 642.7204, 571.84436, 453.52344, 11.937766]
2025-08-07 09:48:03,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [213.0, 77.0, 401.0, 257.0, 23.0, 203.0, 262.0, 232.0, 175.0, 23.0]
2025-08-07 09:48:03,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 18 minutes, 46 seconds)
2025-08-07 09:49:44,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:49:48,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 652.90234 ± 501.112
2025-08-07 09:49:48,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [928.87604, 1160.8002, 830.90173, 521.67645, 456.66312, 8.538388, 398.83804, 1733.7278, 12.040735, 476.96164]
2025-08-07 09:49:48,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [338.0, 426.0, 296.0, 203.0, 178.0, 20.0, 165.0, 660.0, 24.0, 180.0]
2025-08-07 09:49:48,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (652.90) for latency ExtremeClogL1U23
2025-08-07 09:49:48,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 17 minutes, 8 seconds)
2025-08-07 09:51:27,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:51:29,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 465.19916 ± 333.822
2025-08-07 09:51:29,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [837.22144, 5.319771, 399.02298, 432.25415, 502.96857, 9.143833, 484.53903, 1198.6643, 405.0872, 377.77005]
2025-08-07 09:51:29,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [322.0, 16.0, 155.0, 157.0, 200.0, 23.0, 186.0, 477.0, 156.0, 147.0]
2025-08-07 09:51:29,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 15 minutes, 26 seconds)
2025-08-07 09:53:12,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:53:14,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 607.35260 ± 180.609
2025-08-07 09:53:14,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [421.07654, 792.0629, 433.3746, 474.32025, 840.5319, 825.61743, 511.1823, 845.8492, 447.36032, 482.15042]
2025-08-07 09:53:14,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [166.0, 286.0, 171.0, 189.0, 294.0, 293.0, 192.0, 307.0, 169.0, 188.0]
2025-08-07 09:53:14,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 46 seconds)
2025-08-07 09:54:54,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:54:56,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 401.02887 ± 247.149
2025-08-07 09:54:56,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [325.5525, 456.81885, 728.3482, 694.65063, 252.89001, 10.1342325, 670.36914, 424.46072, 444.68823, 2.376177]
2025-08-07 09:54:56,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [144.0, 179.0, 268.0, 250.0, 124.0, 21.0, 259.0, 163.0, 172.0, 12.0]
2025-08-07 09:54:56,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 12 minutes, 1 second)
2025-08-07 09:56:38,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:56:40,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 344.74344 ± 166.767
2025-08-07 09:56:40,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [11.889048, 431.71414, 425.6161, 426.7251, 452.0733, 444.57462, 13.203565, 426.21567, 391.53284, 423.88983]
2025-08-07 09:56:40,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [21.0, 162.0, 165.0, 160.0, 171.0, 163.0, 22.0, 160.0, 144.0, 166.0]
2025-08-07 09:56:40,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 20 seconds)
2025-08-07 09:58:19,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:58:21,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 391.11591 ± 160.448
2025-08-07 09:58:21,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [449.1599, 454.79333, 363.17117, 5.390114, 396.20123, 448.7291, 635.9196, 459.46863, 469.53506, 228.79095]
2025-08-07 09:58:21,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [173.0, 176.0, 138.0, 21.0, 155.0, 170.0, 237.0, 175.0, 178.0, 112.0]
2025-08-07 09:58:21,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 33 seconds)
2025-08-07 10:00:01,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:00:02,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 257.92017 ± 206.198
2025-08-07 10:00:02,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [457.99234, 7.96876, 8.741845, 10.943136, 492.3097, 10.166654, 361.29672, 433.2006, 363.32755, 433.25446]
2025-08-07 10:00:02,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [175.0, 18.0, 18.0, 22.0, 197.0, 21.0, 142.0, 167.0, 142.0, 171.0]
2025-08-07 10:00:02,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 50 seconds)
2025-08-07 10:01:42,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:01:44,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 441.93521 ± 404.941
2025-08-07 10:01:44,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [9.834398, 14.965869, 376.9972, 9.925559, 772.42737, 394.25317, 649.06775, 355.00897, 436.85217, 1400.0194]
2025-08-07 10:01:44,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [25.0, 25.0, 154.0, 22.0, 256.0, 157.0, 235.0, 154.0, 174.0, 474.0]
2025-08-07 10:01:44,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 5 seconds)
2025-08-07 10:03:25,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:03:28,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 446.51425 ± 292.389
2025-08-07 10:03:28,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [620.7476, 388.81442, 419.9509, 465.43747, 10.197859, 12.721726, 693.9009, 1005.07275, 607.75806, 240.54112]
2025-08-07 10:03:28,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [222.0, 158.0, 166.0, 181.0, 23.0, 25.0, 409.0, 359.0, 218.0, 122.0]
2025-08-07 10:03:28,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 24 seconds)
2025-08-07 10:05:09,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:05:11,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 415.35645 ± 254.428
2025-08-07 10:05:11,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [507.09552, 233.31006, 413.66132, 10.43878, 761.7387, 5.6218286, 645.9483, 459.11078, 388.03857, 728.60034]
2025-08-07 10:05:11,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [186.0, 107.0, 161.0, 22.0, 262.0, 24.0, 253.0, 179.0, 154.0, 246.0]
2025-08-07 10:05:11,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 42 seconds)
2025-08-07 10:06:51,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:06:53,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 296.35855 ± 195.032
2025-08-07 10:06:53,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [12.437702, 422.8452, 449.34152, 273.89722, 447.45834, 387.13757, 431.54428, 10.198459, 12.793505, 515.93164]
2025-08-07 10:06:53,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [24.0, 159.0, 170.0, 193.0, 167.0, 153.0, 164.0, 21.0, 23.0, 209.0]
2025-08-07 10:06:53,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1251 [DEBUG]: Training session finished
