2025-05-11 20:26:59,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-bpql-mem2
2025-05-11 20:26:59,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-bpql-mem2
2025-05-11 20:26:59,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x76bd18dcee80>}
2025-05-11 20:26:59,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1111 [DEBUG]: using device: cpu
2025-05-11 20:26:59,845 baseline-bpql-noisy-walker2d:77 [WARNING]: args.assumed_delay != args.horizon: 2 != 24
2025-05-11 20:26:59,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1133 [INFO]: Creating new trainer
2025-05-11 20:26:59,854 baseline-bpql-noisy-walker2d:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=29, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-05-11 20:26:59,854 baseline-bpql-noisy-walker2d:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-11 20:27:00,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1194 [DEBUG]: Starting training session...
2025-05-11 20:27:00,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 1/100
2025-05-11 20:29:29,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 20:29:30,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 0.44264 ± 115.502
2025-05-11 20:29:30,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [24.580362, 122.12501, 67.80917, -301.9449, -78.038666, 104.98982, -25.7725, 49.04979, 43.977936, -2.349597]
2025-05-11 20:29:30,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [44.0, 68.0, 237.0, 239.0, 120.0, 59.0, 98.0, 179.0, 137.0, 39.0]
2025-05-11 20:29:30,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (0.44) for latency ExtremeClogL1U23
2025-05-11 20:29:30,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 20:29:30,650 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 20:29:30,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 8 minutes, 25 seconds)
2025-05-11 20:32:16,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 20:32:18,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 87.71452 ± 75.108
2025-05-11 20:32:18,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [54.28205, 54.640007, -5.352442, 84.89517, 104.13396, 213.72447, 7.629542, 60.541172, 67.2486, 235.40263]
2025-05-11 20:32:18,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [198.0, 66.0, 147.0, 111.0, 136.0, 228.0, 182.0, 161.0, 225.0, 178.0]
2025-05-11 20:32:18,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (87.71) for latency ExtremeClogL1U23
2025-05-11 20:32:18,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 20:32:18,504 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 20:32:18,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 20 minutes, 2 seconds)
2025-05-11 20:35:06,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 20:35:08,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 114.28600 ± 94.622
2025-05-11 20:35:08,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [173.21127, 79.52522, 1.9548305, 193.41359, -5.3500633, 212.31088, 12.418391, 46.6543, 154.90594, 273.81564]
2025-05-11 20:35:08,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [216.0, 145.0, 179.0, 163.0, 232.0, 260.0, 240.0, 188.0, 107.0, 248.0]
2025-05-11 20:35:08,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (114.29) for latency ExtremeClogL1U23
2025-05-11 20:35:08,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 20:35:08,808 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 20:35:08,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 23 minutes, 21 seconds)
2025-05-11 20:37:53,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 20:37:55,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 191.39719 ± 88.944
2025-05-11 20:37:55,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [155.85736, 118.450874, 221.83014, 360.9319, 205.56444, 124.88379, 301.12744, 155.4095, 231.20387, 38.712612]
2025-05-11 20:37:55,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [225.0, 107.0, 159.0, 192.0, 227.0, 89.0, 168.0, 99.0, 205.0, 233.0]
2025-05-11 20:37:55,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (191.40) for latency ExtremeClogL1U23
2025-05-11 20:37:55,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 20:37:55,550 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 20:37:55,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 22 minutes, 10 seconds)
2025-05-11 20:41:24,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 20:41:27,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 202.97847 ± 101.658
2025-05-11 20:41:27,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [86.16688, 233.71075, 341.97614, 188.40707, 103.434105, 97.24776, 364.84814, 217.06808, 94.316864, 302.60892]
2025-05-11 20:41:27,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [314.0, 185.0, 218.0, 165.0, 163.0, 240.0, 254.0, 147.0, 243.0, 167.0]
2025-05-11 20:41:27,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (202.98) for latency ExtremeClogL1U23
2025-05-11 20:41:27,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 20:41:27,121 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 20:41:27,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 34 minutes, 33 seconds)
2025-05-11 20:44:58,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 20:45:00,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 160.19931 ± 115.727
2025-05-11 20:45:00,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [39.948612, 45.213192, 74.410286, 138.74126, 177.84233, 182.53223, 7.2028947, 335.34277, 254.39197, 346.36758]
2025-05-11 20:45:00,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [63.0, 92.0, 102.0, 110.0, 132.0, 155.0, 18.0, 209.0, 178.0, 262.0]
2025-05-11 20:45:00,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 51 minutes, 21 seconds)
2025-05-11 20:48:32,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 20:48:34,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 222.58342 ± 91.824
2025-05-11 20:48:34,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [235.21117, 275.41873, 399.65353, 185.63463, 283.50714, 158.9968, 277.5319, 174.71112, 35.332844, 199.83643]
2025-05-11 20:48:34,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [124.0, 154.0, 209.0, 97.0, 157.0, 105.0, 155.0, 106.0, 43.0, 117.0]
2025-05-11 20:48:34,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (222.58) for latency ExtremeClogL1U23
2025-05-11 20:48:34,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 20:48:34,323 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 20:48:34,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 5 hours, 2 minutes, 30 seconds)
2025-05-11 20:52:05,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 20:52:07,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 232.87549 ± 87.054
2025-05-11 20:52:07,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [12.559245, 222.3681, 214.31659, 288.8575, 276.5614, 331.4493, 337.3894, 220.90466, 219.47882, 204.86995]
2025-05-11 20:52:07,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [23.0, 115.0, 130.0, 166.0, 158.0, 233.0, 179.0, 144.0, 124.0, 113.0]
2025-05-11 20:52:07,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (232.88) for latency ExtremeClogL1U23
2025-05-11 20:52:07,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 20:52:07,320 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 20:52:07,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 5 hours, 12 minutes, 20 seconds)
2025-05-11 20:55:03,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 20:55:05,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 246.50957 ± 134.419
2025-05-11 20:55:05,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [401.72327, 302.22525, 104.25722, 287.56265, 439.87262, 136.77637, 124.06414, 327.35855, 13.957809, 327.298]
2025-05-11 20:55:05,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [272.0, 196.0, 104.0, 150.0, 298.0, 116.0, 78.0, 256.0, 24.0, 182.0]
2025-05-11 20:55:05,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (246.51) for latency ExtremeClogL1U23
2025-05-11 20:55:05,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 20:55:05,561 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 20:55:05,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 5 hours, 12 minutes, 26 seconds)
2025-05-11 20:58:06,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 20:58:08,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 188.69693 ± 102.127
2025-05-11 20:58:08,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [195.6721, 200.64514, 189.37587, 286.94336, 55.464848, 100.85368, 248.09198, 380.11447, 209.58508, 20.22276]
2025-05-11 20:58:08,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [126.0, 119.0, 135.0, 158.0, 52.0, 101.0, 147.0, 180.0, 134.0, 35.0]
2025-05-11 20:58:08,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 5 hours, 20 seconds)
2025-05-11 21:01:23,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 21:01:25,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 173.73163 ± 106.698
2025-05-11 21:01:25,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [225.48047, 240.65306, 296.06006, 60.011597, 20.305944, 292.56674, 246.97928, 242.90091, 97.064285, 15.29392]
2025-05-11 21:01:25,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [136.0, 136.0, 168.0, 68.0, 41.0, 209.0, 155.0, 148.0, 112.0, 26.0]
2025-05-11 21:01:25,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 52 minutes, 6 seconds)
2025-05-11 21:05:00,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 21:05:02,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 229.74661 ± 90.800
2025-05-11 21:05:02,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [225.70699, 205.92514, 308.59982, 413.42957, 152.83907, 63.728584, 156.40002, 268.2342, 233.44734, 269.1552]
2025-05-11 21:05:02,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [137.0, 131.0, 179.0, 230.0, 134.0, 64.0, 124.0, 162.0, 152.0, 167.0]
2025-05-11 21:05:02,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 49 minutes, 54 seconds)
2025-05-11 21:08:22,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 21:08:24,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 280.12375 ± 123.327
2025-05-11 21:08:24,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [289.35202, 262.11575, 180.12062, 395.6392, 424.30402, 216.2619, 465.80206, 280.68442, 269.2842, 17.673676]
2025-05-11 21:08:24,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [144.0, 154.0, 118.0, 222.0, 284.0, 128.0, 249.0, 154.0, 142.0, 30.0]
2025-05-11 21:08:24,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (280.12) for latency ExtremeClogL1U23
2025-05-11 21:08:24,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 21:08:24,265 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 21:08:24,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 43 minutes, 18 seconds)
2025-05-11 21:11:20,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 21:11:22,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 263.82910 ± 104.005
2025-05-11 21:11:22,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [226.36697, 278.35895, 337.81985, 288.29175, 76.5656, 356.07675, 125.43646, 238.18549, 456.22452, 254.96458]
2025-05-11 21:11:22,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [140.0, 158.0, 218.0, 171.0, 71.0, 188.0, 93.0, 142.0, 233.0, 173.0]
2025-05-11 21:11:22,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 39 minutes, 59 seconds)
2025-05-11 21:14:16,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 21:14:18,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 273.14725 ± 119.559
2025-05-11 21:14:18,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [267.2146, 428.6454, 176.44875, 112.51297, 373.32562, 487.46274, 306.47864, 215.14249, 124.8429, 239.39812]
2025-05-11 21:14:18,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [161.0, 226.0, 139.0, 97.0, 209.0, 272.0, 181.0, 197.0, 148.0, 140.0]
2025-05-11 21:14:18,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 34 minutes, 49 seconds)
2025-05-11 21:17:13,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 21:17:14,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 241.24838 ± 69.016
2025-05-11 21:17:14,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [276.69278, 249.91597, 398.91342, 189.21454, 219.70451, 211.16727, 142.89034, 171.46869, 262.6423, 289.87408]
2025-05-11 21:17:14,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [166.0, 141.0, 196.0, 114.0, 138.0, 128.0, 93.0, 131.0, 148.0, 158.0]
2025-05-11 21:17:14,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 25 minutes, 52 seconds)
2025-05-11 21:20:10,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 21:20:12,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 365.23633 ± 79.389
2025-05-11 21:20:12,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [329.23508, 235.54709, 336.70413, 388.2851, 382.8558, 387.08255, 376.1411, 313.1382, 339.89047, 563.4835]
2025-05-11 21:20:12,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [189.0, 135.0, 195.0, 205.0, 202.0, 205.0, 202.0, 210.0, 173.0, 299.0]
2025-05-11 21:20:12,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (365.24) for latency ExtremeClogL1U23
2025-05-11 21:20:12,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 21:20:12,903 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 21:20:12,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 11 minutes, 50 seconds)
2025-05-11 21:23:07,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 21:23:09,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 324.19714 ± 182.559
2025-05-11 21:23:09,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [443.79266, 153.33167, 27.398912, 464.3413, 346.3676, 462.82678, 561.95074, 15.229087, 424.29086, 342.44196]
2025-05-11 21:23:09,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [245.0, 84.0, 37.0, 232.0, 181.0, 221.0, 297.0, 28.0, 200.0, 186.0]
2025-05-11 21:23:09,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 1 minute, 54 seconds)
2025-05-11 21:26:07,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 21:26:09,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 334.16397 ± 48.171
2025-05-11 21:26:09,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [315.76474, 315.1253, 306.55286, 298.61993, 287.99295, 267.47772, 374.11255, 414.1915, 353.24188, 408.56024]
2025-05-11 21:26:09,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [165.0, 170.0, 168.0, 141.0, 164.0, 180.0, 217.0, 212.0, 188.0, 227.0]
2025-05-11 21:26:09,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 3 hours, 59 minutes, 31 seconds)
2025-05-11 21:29:09,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 21:29:11,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 286.49677 ± 140.852
2025-05-11 21:29:11,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [232.21463, 302.36356, 307.49915, 536.68494, 106.40479, -2.250647, 350.04837, 361.38748, 304.16428, 366.45154]
2025-05-11 21:29:11,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [142.0, 157.0, 174.0, 328.0, 96.0, 20.0, 175.0, 197.0, 173.0, 187.0]
2025-05-11 21:29:11,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 3 hours, 58 minutes, 17 seconds)
2025-05-11 21:32:41,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 21:32:44,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 275.17053 ± 139.623
2025-05-11 21:32:44,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [398.49918, 390.55576, 198.90443, 282.02844, 38.395596, 31.053417, 330.5996, 297.25693, 308.69693, 475.71475]
2025-05-11 21:32:44,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [218.0, 243.0, 121.0, 209.0, 68.0, 46.0, 208.0, 159.0, 214.0, 213.0]
2025-05-11 21:32:44,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 4 hours, 4 minutes, 47 seconds)
2025-05-11 21:36:03,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 21:36:05,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 310.45114 ± 87.801
2025-05-11 21:36:05,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [298.5343, 268.61215, 306.36093, 300.79553, 402.14114, 345.42307, 123.52245, 481.4788, 301.6376, 276.00537]
2025-05-11 21:36:05,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [165.0, 155.0, 158.0, 169.0, 201.0, 195.0, 93.0, 274.0, 178.0, 164.0]
2025-05-11 21:36:05,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 4 hours, 7 minutes, 40 seconds)
2025-05-11 21:38:54,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 21:38:56,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 260.39774 ± 198.103
2025-05-11 21:38:56,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [218.91325, 539.89886, 117.91275, 336.75357, 2.743839, 225.40477, 464.97433, 70.03053, 573.5737, 53.77142]
2025-05-11 21:38:56,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [159.0, 288.0, 104.0, 201.0, 22.0, 142.0, 236.0, 78.0, 248.0, 75.0]
2025-05-11 21:38:56,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 4 hours, 3 minutes)
2025-05-11 21:41:47,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 21:41:49,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 284.94058 ± 147.555
2025-05-11 21:41:49,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [341.67056, 8.376217, 24.915035, 239.24133, 249.6289, 397.33954, 395.97702, 447.23627, 387.1645, 357.85654]
2025-05-11 21:41:49,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [185.0, 20.0, 45.0, 125.0, 212.0, 209.0, 214.0, 220.0, 214.0, 200.0]
2025-05-11 21:41:49,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 58 minutes, 7 seconds)
2025-05-11 21:44:40,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 21:44:42,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 321.25781 ± 74.731
2025-05-11 21:44:42,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [290.59625, 156.53705, 401.2044, 316.5343, 408.332, 251.07352, 364.7359, 286.13632, 343.97183, 393.45642]
2025-05-11 21:44:42,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [176.0, 77.0, 216.0, 182.0, 256.0, 146.0, 191.0, 159.0, 185.0, 205.0]
2025-05-11 21:44:42,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 52 minutes, 44 seconds)
2025-05-11 21:47:32,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 21:47:34,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 326.99411 ± 115.342
2025-05-11 21:47:34,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [383.5341, 308.34274, 508.41977, 363.52856, 232.35777, 358.72745, 310.98892, 451.95428, 285.21533, 66.87226]
2025-05-11 21:47:34,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [225.0, 197.0, 281.0, 201.0, 190.0, 187.0, 214.0, 241.0, 188.0, 61.0]
2025-05-11 21:47:34,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 39 minutes, 41 seconds)
2025-05-11 21:50:25,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 21:50:27,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 358.40775 ± 235.578
2025-05-11 21:50:27,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [509.5541, 606.1476, 160.3556, -2.4194806, 607.77637, 655.318, 502.31647, 325.076, 163.50896, 56.443962]
2025-05-11 21:50:27,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [235.0, 304.0, 134.0, 11.0, 288.0, 310.0, 254.0, 182.0, 91.0, 72.0]
2025-05-11 21:50:27,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 29 minutes, 42 seconds)
2025-05-11 21:53:16,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 21:53:19,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 392.05142 ± 164.191
2025-05-11 21:53:19,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [234.10571, 551.5621, 508.97858, 26.10705, 493.32648, 485.2313, 515.1637, 402.9931, 482.1168, 220.92943]
2025-05-11 21:53:19,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [140.0, 280.0, 254.0, 39.0, 252.0, 239.0, 268.0, 226.0, 255.0, 132.0]
2025-05-11 21:53:19,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (392.05) for latency ExtremeClogL1U23
2025-05-11 21:53:19,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 21:53:19,140 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 21:53:19,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 27 minutes, 7 seconds)
2025-05-11 21:56:09,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 21:56:12,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 445.30615 ± 250.631
2025-05-11 21:56:12,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [138.83224, 522.012, 465.0133, 967.21234, 476.23456, 344.9007, 452.12415, 429.65582, 657.9127, -0.8365725]
2025-05-11 21:56:12,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [72.0, 260.0, 244.0, 433.0, 241.0, 159.0, 272.0, 225.0, 298.0, 12.0]
2025-05-11 21:56:12,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (445.31) for latency ExtremeClogL1U23
2025-05-11 21:56:12,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 21:56:12,237 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 21:56:12,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 24 minutes, 12 seconds)
2025-05-11 21:59:02,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 21:59:04,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 463.64517 ± 257.364
2025-05-11 21:59:04,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [571.76715, -3.403117, 537.4052, 257.0547, 43.018547, 505.92725, 732.77246, 713.15906, 571.35364, 707.397]
2025-05-11 21:59:04,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [279.0, 8.0, 262.0, 134.0, 56.0, 244.0, 343.0, 327.0, 275.0, 312.0]
2025-05-11 21:59:04,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (463.65) for latency ExtremeClogL1U23
2025-05-11 21:59:04,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 21:59:04,704 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 21:59:04,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 21 minutes, 7 seconds)
2025-05-11 22:02:00,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 22:02:03,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 427.70694 ± 302.753
2025-05-11 22:02:03,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [481.1209, 24.721474, 28.532366, 688.22906, 205.482, 190.34535, 538.2992, 524.93134, 549.7822, 1045.6254]
2025-05-11 22:02:03,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [262.0, 36.0, 51.0, 273.0, 112.0, 105.0, 282.0, 287.0, 293.0, 629.0]
2025-05-11 22:02:03,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 19 minutes, 47 seconds)
2025-05-11 22:04:54,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 22:04:57,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 547.34485 ± 189.577
2025-05-11 22:04:57,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [659.9602, 621.48193, 574.4424, 689.302, 660.46893, 646.30334, 540.32947, 8.684302, 475.8712, 596.6044]
2025-05-11 22:04:57,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [316.0, 295.0, 273.0, 332.0, 313.0, 312.0, 277.0, 31.0, 211.0, 304.0]
2025-05-11 22:04:57,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (547.34) for latency ExtremeClogL1U23
2025-05-11 22:04:57,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 22:04:57,583 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 22:04:57,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 17 minutes, 16 seconds)
2025-05-11 22:07:42,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 22:07:45,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 538.09222 ± 152.977
2025-05-11 22:07:45,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [965.5034, 533.44794, 501.91162, 498.50424, 329.08267, 521.18805, 519.4899, 503.2356, 486.60812, 521.95135]
2025-05-11 22:07:45,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [433.0, 261.0, 253.0, 249.0, 179.0, 258.0, 236.0, 265.0, 248.0, 259.0]
2025-05-11 22:07:45,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 13 minutes, 31 seconds)
2025-05-11 22:10:35,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 22:10:38,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 431.44473 ± 157.100
2025-05-11 22:10:38,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [21.401382, 564.0382, 522.6379, 488.888, 510.04498, 355.88007, 308.27148, 495.7368, 548.5611, 498.98746]
2025-05-11 22:10:38,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [34.0, 285.0, 258.0, 246.0, 230.0, 213.0, 179.0, 263.0, 268.0, 252.0]
2025-05-11 22:10:38,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 10 minutes, 33 seconds)
2025-05-11 22:13:31,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 22:13:34,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 601.91467 ± 150.876
2025-05-11 22:13:34,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [560.35297, 670.9624, 602.8164, 734.9862, 728.8832, 745.1191, 215.19876, 491.1552, 672.371, 597.30115]
2025-05-11 22:13:34,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [278.0, 302.0, 301.0, 316.0, 356.0, 321.0, 126.0, 226.0, 269.0, 300.0]
2025-05-11 22:13:34,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (601.91) for latency ExtremeClogL1U23
2025-05-11 22:13:34,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 22:13:34,728 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 22:13:34,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 8 minutes, 30 seconds)
2025-05-11 22:16:22,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 22:16:26,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 825.62628 ± 238.669
2025-05-11 22:16:26,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [619.2431, 894.333, 884.8685, 591.7255, 603.52325, 1372.5354, 678.02625, 941.9166, 634.83295, 1035.2578]
2025-05-11 22:16:26,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [273.0, 411.0, 423.0, 298.0, 312.0, 523.0, 359.0, 439.0, 300.0, 444.0]
2025-05-11 22:16:26,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (825.63) for latency ExtremeClogL1U23
2025-05-11 22:16:26,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 22:16:26,917 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 22:16:26,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 4 minutes, 11 seconds)
2025-05-11 22:19:17,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 22:19:20,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 605.51184 ± 147.125
2025-05-11 22:19:20,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [735.16223, 517.1394, 567.69183, 472.51834, 781.26794, 417.65308, 508.28934, 912.3207, 589.52075, 553.5551]
2025-05-11 22:19:20,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [347.0, 283.0, 267.0, 255.0, 351.0, 198.0, 245.0, 420.0, 290.0, 253.0]
2025-05-11 22:19:20,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 1 minute, 18 seconds)
2025-05-11 22:22:16,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 22:22:20,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 694.35193 ± 317.773
2025-05-11 22:22:20,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [162.43912, 814.7116, 1068.5734, 792.50555, 723.1417, 869.77435, 752.6585, 743.23114, 30.72918, 985.75476]
2025-05-11 22:22:20,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [130.0, 349.0, 428.0, 342.0, 325.0, 419.0, 308.0, 330.0, 55.0, 387.0]
2025-05-11 22:22:20,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 45 seconds)
2025-05-11 22:25:09,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 22:25:11,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 440.22656 ± 269.558
2025-05-11 22:25:11,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [606.9188, 453.43225, 90.527084, 33.672146, 504.35898, 656.17487, 794.727, 673.01373, 31.91947, 557.52155]
2025-05-11 22:25:11,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [272.0, 197.0, 139.0, 45.0, 224.0, 288.0, 377.0, 299.0, 43.0, 266.0]
2025-05-11 22:25:11,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 57 minutes, 30 seconds)
2025-05-11 22:28:03,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 22:28:07,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 906.54181 ± 678.066
2025-05-11 22:28:07,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [1131.3303, 644.3228, 2686.1157, 1219.183, 399.876, 30.560661, 803.8623, 623.1995, 933.7818, 593.18506]
2025-05-11 22:28:07,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [456.0, 269.0, 1000.0, 460.0, 199.0, 50.0, 329.0, 271.0, 361.0, 267.0]
2025-05-11 22:28:07,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (906.54) for latency ExtremeClogL1U23
2025-05-11 22:28:07,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 22:28:07,372 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 22:28:07,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 54 minutes, 31 seconds)
2025-05-11 22:31:10,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 22:31:17,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1223.02258 ± 773.347
2025-05-11 22:31:17,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [2620.1, 1053.5154, 507.66907, 2378.27, 817.87274, 18.574326, 1761.4774, 837.55835, 1315.3018, 919.8869]
2025-05-11 22:31:17,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [991.0, 472.0, 232.0, 1000.0, 341.0, 35.0, 772.0, 293.0, 458.0, 387.0]
2025-05-11 22:31:17,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (1223.02) for latency ExtremeClogL1U23
2025-05-11 22:31:17,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 22:31:17,523 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 22:31:17,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 55 minutes, 9 seconds)
2025-05-11 22:33:58,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 22:34:05,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1199.65222 ± 693.164
2025-05-11 22:34:05,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [2355.5747, 614.33405, 1409.9703, 1468.3513, 659.76733, 963.75024, 1466.2781, 2226.5132, 808.8471, 23.136951]
2025-05-11 22:34:05,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [993.0, 316.0, 570.0, 626.0, 282.0, 368.0, 599.0, 1000.0, 347.0, 56.0]
2025-05-11 22:34:05,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 51 minutes, 1 second)
2025-05-11 22:37:05,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 22:37:11,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1149.01831 ± 578.254
2025-05-11 22:37:11,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [1027.6672, 1063.3811, 1438.2358, 1614.8098, 265.85846, 1053.4669, 852.1149, 1340.9095, 426.02747, 2407.7131]
2025-05-11 22:37:11,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [404.0, 480.0, 605.0, 601.0, 160.0, 416.0, 342.0, 573.0, 224.0, 1000.0]
2025-05-11 22:37:11,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 49 minutes, 24 seconds)
2025-05-11 22:40:05,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 22:40:08,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 802.39807 ± 395.329
2025-05-11 22:40:08,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [536.6573, 734.5497, 1688.0514, 597.4536, 32.541218, 845.1918, 985.65594, 977.7173, 780.0362, 846.1261]
2025-05-11 22:40:08,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [243.0, 298.0, 689.0, 305.0, 42.0, 317.0, 393.0, 425.0, 313.0, 351.0]
2025-05-11 22:40:08,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 47 minutes, 31 seconds)
2025-05-11 22:42:51,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 22:42:55,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 699.24500 ± 915.352
2025-05-11 22:42:55,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [2277.1257, 1445.5162, 28.89418, 24.516079, 2381.6638, 29.194681, 508.8193, 22.354229, 241.25562, 33.11052]
2025-05-11 22:42:55,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 556.0, 50.0, 37.0, 983.0, 53.0, 239.0, 35.0, 130.0, 47.0]
2025-05-11 22:42:55,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 42 minutes, 54 seconds)
2025-05-11 22:45:46,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 22:45:50,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 797.60413 ± 446.518
2025-05-11 22:45:50,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [173.63428, 980.028, 688.1241, 507.3226, 578.79565, 1334.551, 1770.7068, 869.2806, 377.7002, 695.898]
2025-05-11 22:45:50,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [98.0, 450.0, 294.0, 281.0, 273.0, 512.0, 674.0, 406.0, 200.0, 323.0]
2025-05-11 22:45:50,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 37 minutes, 12 seconds)
2025-05-11 22:48:40,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 22:48:46,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1129.68408 ± 703.990
2025-05-11 22:48:46,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [404.19162, 938.4493, 1790.5479, 2114.4038, 742.848, 157.72289, 158.01025, 1514.9535, 1911.0812, 1564.6321]
2025-05-11 22:48:46,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [210.0, 424.0, 704.0, 905.0, 315.0, 89.0, 146.0, 645.0, 724.0, 622.0]
2025-05-11 22:48:46,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 35 minutes, 36 seconds)
2025-05-11 22:51:43,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 22:51:46,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 737.64844 ± 340.105
2025-05-11 22:51:46,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [585.5279, 904.16486, 995.43536, 1010.09143, 541.68085, 214.44048, 459.46545, 813.3568, 426.74695, 1425.5741]
2025-05-11 22:51:46,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [264.0, 365.0, 327.0, 390.0, 264.0, 121.0, 226.0, 325.0, 220.0, 500.0]
2025-05-11 22:51:46,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 31 minutes, 36 seconds)
2025-05-11 22:54:30,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 22:54:35,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 995.39453 ± 641.846
2025-05-11 22:54:35,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [20.017962, 952.5144, 1611.274, 1604.5527, 386.70877, 1834.0398, 188.05505, 868.45746, 719.48236, 1768.843]
2025-05-11 22:54:35,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [32.0, 405.0, 585.0, 594.0, 188.0, 754.0, 108.0, 351.0, 287.0, 767.0]
2025-05-11 22:54:35,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 27 minutes, 24 seconds)
2025-05-11 22:57:30,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 22:57:36,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1084.66907 ± 588.363
2025-05-11 22:57:36,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [1606.9327, 380.8821, 1060.9037, 1460.6284, 550.8242, 303.62866, 1904.5629, 821.1759, 769.35754, 1987.7941]
2025-05-11 22:57:36,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [588.0, 192.0, 402.0, 622.0, 253.0, 179.0, 676.0, 293.0, 341.0, 626.0]
2025-05-11 22:57:36,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 26 minutes, 42 seconds)
2025-05-11 23:00:27,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 23:00:32,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1080.71362 ± 839.383
2025-05-11 23:00:32,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [1245.1788, 1651.521, 1947.557, 959.43445, 66.308784, 805.4563, 1040.1705, 253.73878, 21.100225, 2816.671]
2025-05-11 23:00:32,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [528.0, 556.0, 706.0, 382.0, 66.0, 287.0, 428.0, 143.0, 34.0, 957.0]
2025-05-11 23:00:32,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 24 minutes, 1 second)
2025-05-11 23:03:19,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 23:03:23,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 834.61346 ± 530.693
2025-05-11 23:03:23,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [753.33856, 768.46796, 924.5834, 187.59543, 135.09186, 1111.345, 2038.0829, 775.8073, 401.1875, 1250.6353]
2025-05-11 23:03:23,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [348.0, 324.0, 405.0, 104.0, 67.0, 438.0, 635.0, 325.0, 189.0, 466.0]
2025-05-11 23:03:23,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 20 minutes, 16 seconds)
2025-05-11 23:06:24,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 23:06:31,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1275.54688 ± 728.176
2025-05-11 23:06:31,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [929.5747, 997.18066, 1903.4391, 472.509, 493.26843, 1943.3784, 1018.0681, 448.1624, 1914.6005, 2635.2883]
2025-05-11 23:06:31,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [405.0, 386.0, 749.0, 224.0, 234.0, 758.0, 404.0, 212.0, 632.0, 1000.0]
2025-05-11 23:06:31,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (1275.55) for latency ExtremeClogL1U23
2025-05-11 23:06:31,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 23:06:31,519 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 23:06:31,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 18 minutes, 37 seconds)
2025-05-11 23:09:13,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 23:09:20,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1285.21997 ± 901.182
2025-05-11 23:09:20,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [1721.3047, 801.7103, 734.57007, 791.8729, 1213.1206, 51.553623, 1905.2289, 223.68712, 2500.613, 2908.5388]
2025-05-11 23:09:20,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [650.0, 336.0, 292.0, 306.0, 469.0, 57.0, 653.0, 121.0, 803.0, 1000.0]
2025-05-11 23:09:20,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (1285.22) for latency ExtremeClogL1U23
2025-05-11 23:09:20,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 23:09:20,226 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 23:09:20,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 15 minutes, 35 seconds)
2025-05-11 23:12:18,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 23:12:22,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 885.42725 ± 1059.921
2025-05-11 23:12:22,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [0.5659053, 2831.949, 2495.7637, 234.94011, 2014.4961, 16.817236, 199.6479, 256.1513, 25.763563, 778.17804]
2025-05-11 23:12:22,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [25.0, 916.0, 1000.0, 135.0, 664.0, 33.0, 125.0, 128.0, 50.0, 332.0]
2025-05-11 23:12:22,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 12 minutes, 59 seconds)
2025-05-11 23:15:04,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 23:15:09,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1018.55402 ± 747.790
2025-05-11 23:15:09,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [314.51022, 681.6778, 1378.002, 1899.4694, 21.4542, 2451.6255, 396.81427, 1209.8328, 375.02878, 1457.1257]
2025-05-11 23:15:09,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [171.0, 259.0, 556.0, 718.0, 31.0, 1000.0, 216.0, 461.0, 181.0, 598.0]
2025-05-11 23:15:09,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 8 minutes, 38 seconds)
2025-05-11 23:18:03,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 23:18:07,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 865.98029 ± 623.921
2025-05-11 23:18:07,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [1178.4369, 442.32553, 2072.8723, 31.722702, 186.42456, 1477.3507, 1272.2743, 1033.3129, 747.49194, 217.5908]
2025-05-11 23:18:07,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [404.0, 207.0, 670.0, 51.0, 106.0, 564.0, 427.0, 380.0, 284.0, 116.0]
2025-05-11 23:18:07,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 6 minutes, 47 seconds)
2025-05-11 23:20:56,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 23:21:03,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1456.80334 ± 985.369
2025-05-11 23:21:03,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [19.05343, 630.50586, 2211.7297, 1247.4453, 2381.8696, 2607.9534, 2578.8389, 2160.3467, 482.64468, 247.64537]
2025-05-11 23:21:03,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [29.0, 282.0, 755.0, 538.0, 1000.0, 1000.0, 834.0, 842.0, 219.0, 139.0]
2025-05-11 23:21:03,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (1456.80) for latency ExtremeClogL1U23
2025-05-11 23:21:03,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 23:21:03,921 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 23:21:03,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 2 minutes, 8 seconds)
2025-05-11 23:24:02,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 23:24:07,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1005.92493 ± 806.029
2025-05-11 23:24:07,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [1730.2382, 487.24255, 448.45496, 557.5534, 2943.836, 855.044, 481.58798, 203.33063, 1642.9738, 708.9884]
2025-05-11 23:24:07,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [595.0, 209.0, 233.0, 263.0, 1000.0, 313.0, 208.0, 98.0, 522.0, 294.0]
2025-05-11 23:24:07,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 1 minute, 15 seconds)
2025-05-11 23:27:01,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 23:27:05,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 760.44427 ± 692.961
2025-05-11 23:27:05,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [412.61093, 34.84208, 23.880234, 1941.9397, 19.057968, 1915.45, 570.22095, 790.96735, 1283.1726, 612.30115]
2025-05-11 23:27:05,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [189.0, 41.0, 35.0, 754.0, 29.0, 753.0, 264.0, 335.0, 553.0, 275.0]
2025-05-11 23:27:05,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 57 minutes, 45 seconds)
2025-05-11 23:29:53,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 23:29:59,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1234.66968 ± 766.630
2025-05-11 23:29:59,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [911.5923, 1037.9194, 1982.3116, 989.97144, 3048.6946, 432.89114, 801.46625, 726.02997, 602.13165, 1813.6885]
2025-05-11 23:29:59,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [336.0, 361.0, 798.0, 381.0, 911.0, 196.0, 349.0, 370.0, 255.0, 774.0]
2025-05-11 23:29:59,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 55 minutes, 37 seconds)
2025-05-11 23:32:49,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 23:32:56,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1516.95703 ± 979.038
2025-05-11 23:32:56,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [631.23236, 1883.2904, 1560.7621, 2332.4304, 331.47043, 3197.382, 442.8694, 1095.6732, 793.42395, 2901.037]
2025-05-11 23:32:56,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [261.0, 726.0, 637.0, 852.0, 153.0, 1000.0, 202.0, 389.0, 304.0, 1000.0]
2025-05-11 23:32:56,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (1516.96) for latency ExtremeClogL1U23
2025-05-11 23:32:56,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 23:32:56,958 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 23:32:56,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 52 minutes, 38 seconds)
2025-05-11 23:35:43,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 23:35:46,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 682.26135 ± 674.242
2025-05-11 23:35:46,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [821.05414, 142.28471, 16.73167, 307.60452, 2160.8618, 747.93164, 1641.6736, 7.7132044, 415.41864, 561.3398]
2025-05-11 23:35:46,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [282.0, 67.0, 28.0, 143.0, 715.0, 247.0, 571.0, 16.0, 195.0, 237.0]
2025-05-11 23:35:46,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 48 minutes, 49 seconds)
2025-05-11 23:38:42,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 23:38:50,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1546.50977 ± 911.109
2025-05-11 23:38:50,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [2416.6055, 126.21901, 1360.0269, 1186.9956, 740.2151, 1797.7747, 2783.8103, 1582.3499, 501.45908, 2969.643]
2025-05-11 23:38:50,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [966.0, 68.0, 467.0, 472.0, 316.0, 760.0, 1000.0, 614.0, 177.0, 1000.0]
2025-05-11 23:38:50,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (1546.51) for latency ExtremeClogL1U23
2025-05-11 23:38:50,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 23:38:50,561 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 23:38:50,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 45 minutes, 58 seconds)
2025-05-11 23:41:47,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 23:41:54,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1259.22900 ± 723.365
2025-05-11 23:41:54,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [1046.8723, 215.13115, 357.75137, 2377.404, 1083.2438, 2526.074, 843.22253, 1415.5984, 1662.8557, 1064.1368]
2025-05-11 23:41:54,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [431.0, 114.0, 160.0, 1000.0, 409.0, 1000.0, 310.0, 605.0, 656.0, 423.0]
2025-05-11 23:41:54,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 43 minutes, 36 seconds)
2025-05-11 23:44:46,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 23:44:54,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1544.23230 ± 1030.811
2025-05-11 23:44:54,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [357.3742, 1453.8643, 779.1797, 727.32336, 2862.764, 3.1538253, 2484.4575, 2532.4844, 2932.3857, 1309.336]
2025-05-11 23:44:54,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [178.0, 513.0, 446.0, 327.0, 1000.0, 24.0, 1000.0, 1000.0, 1000.0, 413.0]
2025-05-11 23:44:54,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 41 minutes, 26 seconds)
2025-05-11 23:47:45,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 23:47:54,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1757.14526 ± 922.473
2025-05-11 23:47:54,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [398.559, 1064.1163, 645.23157, 907.8977, 2924.549, 2118.1099, 2951.9065, 2705.2397, 2345.1416, 1510.7019]
2025-05-11 23:47:54,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [173.0, 437.0, 271.0, 356.0, 1000.0, 737.0, 1000.0, 992.0, 849.0, 584.0]
2025-05-11 23:47:54,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (1757.15) for latency ExtremeClogL1U23
2025-05-11 23:47:54,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 23:47:54,004 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 23:47:54,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 38 minutes, 40 seconds)
2025-05-11 23:50:50,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 23:50:54,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 993.34558 ± 968.726
2025-05-11 23:50:54,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [1776.4653, 2887.327, 192.66301, 199.58932, 29.795956, 1245.3446, 2296.4653, 520.9632, 20.060564, 764.7826]
2025-05-11 23:50:54,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [566.0, 952.0, 102.0, 106.0, 39.0, 403.0, 883.0, 241.0, 34.0, 311.0]
2025-05-11 23:50:54,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 36 minutes, 55 seconds)
2025-05-11 23:53:31,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 23:53:39,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1508.01196 ± 921.180
2025-05-11 23:53:39,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [1838.9725, 1461.3558, 1574.6033, 2562.6016, 752.3586, 17.051287, 418.5542, 2849.5, 995.27606, 2609.8464]
2025-05-11 23:53:39,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [686.0, 478.0, 557.0, 917.0, 334.0, 28.0, 187.0, 1000.0, 406.0, 1000.0]
2025-05-11 23:53:39,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 31 minutes, 50 seconds)
2025-05-11 23:56:45,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 23:56:53,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1628.35742 ± 800.551
2025-05-11 23:56:53,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [2307.6418, 1242.1345, 2351.7368, 186.41573, 450.01398, 2520.9248, 1784.9579, 2104.3555, 2282.6733, 1052.7211]
2025-05-11 23:56:53,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [823.0, 585.0, 876.0, 119.0, 215.0, 957.0, 653.0, 777.0, 798.0, 345.0]
2025-05-11 23:56:53,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 29 minutes, 57 seconds)
2025-05-11 23:59:30,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 23:59:37,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1220.33264 ± 1023.325
2025-05-11 23:59:37,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [337.18872, 219.19286, 2522.2283, 2378.4563, 59.114285, 225.86766, 2416.4849, 242.7389, 1862.3955, 1939.6586]
2025-05-11 23:59:37,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [157.0, 114.0, 913.0, 917.0, 94.0, 115.0, 889.0, 127.0, 700.0, 678.0]
2025-05-11 23:59:37,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 25 minutes, 20 seconds)
2025-05-12 00:02:30,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 00:02:38,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1362.40430 ± 1050.566
2025-05-12 00:02:38,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [2832.8372, 411.48074, 2343.0386, 593.16504, 2811.7576, 2491.1174, 437.92786, 599.62714, 932.9844, 170.10703]
2025-05-12 00:02:38,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 181.0, 1000.0, 295.0, 1000.0, 1000.0, 202.0, 285.0, 375.0, 86.0]
2025-05-12 00:02:38,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 22 minutes, 31 seconds)
2025-05-12 00:05:28,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 00:05:35,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1351.67578 ± 755.461
2025-05-12 00:05:35,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [804.4159, 1392.3776, 1148.4943, 873.1268, 2026.1864, 1400.7063, 241.06255, 555.76447, 2535.9824, 2538.641]
2025-05-12 00:05:35,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [345.0, 436.0, 419.0, 318.0, 756.0, 548.0, 124.0, 233.0, 1000.0, 923.0]
2025-05-12 00:05:35,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 19 minutes, 15 seconds)
2025-05-12 00:08:26,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 00:08:33,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1336.31470 ± 856.463
2025-05-12 00:08:33,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [608.2814, 869.4757, 1678.0748, 1718.7754, 2288.8428, 2148.6963, 84.67077, 1315.4182, 2588.3247, 62.585697]
2025-05-12 00:08:33,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [265.0, 369.0, 649.0, 581.0, 853.0, 838.0, 81.0, 457.0, 1000.0, 80.0]
2025-05-12 00:08:33,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 17 minutes, 27 seconds)
2025-05-12 00:11:25,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 00:11:32,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1156.66638 ± 874.488
2025-05-12 00:11:32,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [1048.2726, 1490.1144, 2011.4799, 210.55836, 34.753323, 911.6055, 595.9436, 347.65988, 2124.0483, 2792.228]
2025-05-12 00:11:32,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [389.0, 995.0, 750.0, 109.0, 50.0, 372.0, 252.0, 198.0, 803.0, 1000.0]
2025-05-12 00:11:32,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 13 minutes, 12 seconds)
2025-05-12 00:14:33,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 00:14:40,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1354.48633 ± 797.403
2025-05-12 00:14:40,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [793.8433, 866.9677, 2097.4424, 2083.9587, 284.302, 1433.5729, 435.09348, 2831.2405, 837.335, 1881.1063]
2025-05-12 00:14:40,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [289.0, 343.0, 825.0, 749.0, 136.0, 468.0, 202.0, 1000.0, 330.0, 679.0]
2025-05-12 00:14:40,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 12 minutes, 14 seconds)
2025-05-12 00:17:33,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 00:17:38,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1131.64160 ± 523.757
2025-05-12 00:17:38,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [1172.6501, 620.4587, 1282.0688, 1268.7682, 85.88713, 1086.1271, 2229.7363, 945.31415, 1176.642, 1448.7651]
2025-05-12 00:17:38,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [461.0, 277.0, 461.0, 501.0, 76.0, 439.0, 833.0, 394.0, 421.0, 486.0]
2025-05-12 00:17:38,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 9 minutes, 3 seconds)
2025-05-12 00:20:19,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 00:20:25,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1183.82092 ± 1106.733
2025-05-12 00:20:25,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [2684.8264, 290.4356, 3008.548, 2625.3372, 210.05687, 993.8943, 3.1059732, 155.28282, 1271.078, 595.64343]
2025-05-12 00:20:25,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [916.0, 139.0, 1000.0, 1000.0, 139.0, 461.0, 29.0, 119.0, 511.0, 250.0]
2025-05-12 00:20:25,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 5 minutes, 15 seconds)
2025-05-12 00:23:15,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 00:23:20,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1077.54773 ± 871.470
2025-05-12 00:23:20,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [2715.8584, 535.39233, 160.27617, 750.9642, 902.1578, 2763.9346, 1118.396, 310.27914, 778.0248, 740.1944]
2025-05-12 00:23:20,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 225.0, 83.0, 374.0, 378.0, 957.0, 403.0, 166.0, 334.0, 248.0]
2025-05-12 00:23:20,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 2 minutes, 7 seconds)
2025-05-12 00:26:24,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 00:26:32,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1654.54846 ± 893.715
2025-05-12 00:26:32,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [987.8314, 1660.0331, 393.13495, 2640.3635, 192.46751, 1650.7676, 2709.6235, 1694.3314, 1648.145, 2968.7852]
2025-05-12 00:26:32,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [310.0, 547.0, 157.0, 1000.0, 94.0, 707.0, 1000.0, 574.0, 631.0, 1000.0]
2025-05-12 00:26:32,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 2 seconds)
2025-05-12 00:29:26,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 00:29:33,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1135.04199 ± 580.098
2025-05-12 00:29:33,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [922.4804, 864.6761, 461.98203, 1211.8088, 2693.746, 1331.8789, 1242.4316, 952.69916, 617.5106, 1051.2059]
2025-05-12 00:29:33,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [413.0, 334.0, 200.0, 446.0, 1000.0, 511.0, 499.0, 400.0, 283.0, 418.0]
2025-05-12 00:29:33,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 56 minutes, 33 seconds)
2025-05-12 00:32:34,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 00:32:42,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1591.51428 ± 960.003
2025-05-12 00:32:42,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [662.98, 656.5669, 2367.175, 865.2209, 1492.2131, 3035.331, 2548.207, 880.6957, 517.92737, 2888.8257]
2025-05-12 00:32:42,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [275.0, 267.0, 902.0, 326.0, 542.0, 1000.0, 1000.0, 351.0, 196.0, 1000.0]
2025-05-12 00:32:42,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 54 minutes, 11 seconds)
2025-05-12 00:35:38,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 00:35:49,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 2009.05310 ± 967.617
2025-05-12 00:35:49,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [1778.404, 332.0177, 2688.3623, 1671.2365, 2917.3208, 426.5405, 3018.7117, 1583.1216, 2739.5845, 2935.232]
2025-05-12 00:35:49,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [649.0, 177.0, 1000.0, 632.0, 1000.0, 193.0, 1000.0, 575.0, 1000.0, 969.0]
2025-05-12 00:35:49,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (2009.05) for latency ExtremeClogL1U23
2025-05-12 00:35:49,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-12 00:35:49,007 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-12 00:35:49,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 52 minutes, 19 seconds)
2025-05-12 00:38:22,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 00:38:32,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1908.82544 ± 1015.068
2025-05-12 00:38:32,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [1013.06085, 2836.805, 2588.926, 1400.4404, 2901.709, 160.39926, 2804.2666, 3002.2415, 1840.6146, 539.7912]
2025-05-12 00:38:32,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [383.0, 1000.0, 887.0, 554.0, 1000.0, 80.0, 1000.0, 1000.0, 661.0, 237.0]
2025-05-12 00:38:32,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 48 minutes, 36 seconds)
2025-05-12 00:41:28,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 00:41:37,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1810.52344 ± 919.980
2025-05-12 00:41:37,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [2459.657, 2517.0786, 513.36115, 2703.7888, 952.76874, 3107.6096, 435.20465, 2438.9163, 1214.1678, 1762.6821]
2025-05-12 00:41:37,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [890.0, 1000.0, 229.0, 1000.0, 432.0, 1000.0, 210.0, 918.0, 468.0, 682.0]
2025-05-12 00:41:37,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 45 minutes, 15 seconds)
2025-05-12 00:44:22,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 00:44:28,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1126.44580 ± 950.103
2025-05-12 00:44:28,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [19.66976, 1210.9797, 769.7256, 2879.26, 2630.5366, 1742.3524, 421.44604, 148.1773, 428.69974, 1013.6094]
2025-05-12 00:44:28,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [32.0, 503.0, 334.0, 1000.0, 1000.0, 607.0, 187.0, 78.0, 212.0, 330.0]
2025-05-12 00:44:28,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 41 minutes, 45 seconds)
2025-05-12 00:47:09,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 00:47:14,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 952.03876 ± 1054.179
2025-05-12 00:47:14,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [23.702593, 17.420517, 789.8178, 1317.106, 162.13023, 836.4913, 368.67014, 188.16742, 2877.8052, 2939.0767]
2025-05-12 00:47:14,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [51.0, 174.0, 320.0, 380.0, 79.0, 381.0, 181.0, 104.0, 1000.0, 1000.0]
2025-05-12 00:47:14,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 37 minutes, 47 seconds)
2025-05-12 00:50:04,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 00:50:14,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 2043.31714 ± 923.001
2025-05-12 00:50:14,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [3127.625, 2840.2336, 2482.816, 1355.6356, 1080.1489, 23.7388, 2658.7075, 2264.9253, 2787.934, 1811.406]
2025-05-12 00:50:14,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 919.0, 841.0, 477.0, 457.0, 44.0, 1000.0, 1000.0, 1000.0, 601.0]
2025-05-12 00:50:14,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (2043.32) for latency ExtremeClogL1U23
2025-05-12 00:50:14,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-12 00:50:14,281 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-12 00:50:14,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 34 minutes, 36 seconds)
2025-05-12 00:53:00,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 00:53:08,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1880.70764 ± 1119.815
2025-05-12 00:53:08,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [1567.0854, 1215.0684, 81.33938, 1310.4814, 2883.008, 2750.9219, 3200.2463, 3154.241, 163.24004, 2481.4436]
2025-05-12 00:53:08,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [582.0, 459.0, 111.0, 549.0, 1000.0, 1000.0, 1000.0, 1000.0, 96.0, 856.0]
2025-05-12 00:53:08,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 32 minutes, 8 seconds)
2025-05-12 00:55:56,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 00:56:02,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1260.37134 ± 898.864
2025-05-12 00:56:02,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [677.8568, 2484.4094, 2840.4915, 2028.4702, 1522.9529, 1350.976, 259.48724, 627.615, 77.800095, 733.65497]
2025-05-12 00:56:02,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [291.0, 935.0, 1000.0, 837.0, 528.0, 560.0, 145.0, 208.0, 74.0, 325.0]
2025-05-12 00:56:02,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 28 minutes, 49 seconds)
2025-05-12 00:58:57,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 00:59:01,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 815.77991 ± 752.078
2025-05-12 00:59:01,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [832.3822, 613.8441, 1428.0597, 233.386, 250.17181, 2483.338, 785.8339, 3.0189307, 1516.7319, 11.032951]
2025-05-12 00:59:01,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [338.0, 265.0, 623.0, 118.0, 132.0, 935.0, 353.0, 24.0, 611.0, 22.0]
2025-05-12 00:59:01,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 26 minutes, 12 seconds)
2025-05-12 01:01:45,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 01:01:50,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 878.30371 ± 813.302
2025-05-12 01:01:50,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [669.2323, 948.4362, 2487.8284, 14.680545, 641.9405, 27.953663, 44.299854, 2132.43, 607.76227, 1208.4728]
2025-05-12 01:01:50,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [285.0, 394.0, 894.0, 29.0, 267.0, 39.0, 66.0, 755.0, 286.0, 435.0]
2025-05-12 01:01:50,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 23 minutes, 21 seconds)
2025-05-12 01:04:40,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 01:04:50,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 2040.00464 ± 1045.144
2025-05-12 01:04:50,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [3570.021, 2759.9417, 2732.1865, 2740.558, 1357.9426, 1368.4445, 162.07869, 709.4052, 2978.3298, 2021.1384]
2025-05-12 01:04:50,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 885.0, 1000.0, 534.0, 463.0, 93.0, 306.0, 1000.0, 660.0]
2025-05-12 01:04:50,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 20 minutes, 26 seconds)
2025-05-12 01:07:32,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 01:07:40,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1644.07788 ± 1335.929
2025-05-12 01:07:40,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [3077.0234, 2943.952, 136.01341, 143.7173, 2568.0596, 3074.5366, 170.51901, 922.03723, 3112.9954, 291.9259]
2025-05-12 01:07:40,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 97.0, 98.0, 907.0, 1000.0, 115.0, 401.0, 1000.0, 161.0]
2025-05-12 01:07:40,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 17 minutes, 26 seconds)
2025-05-12 01:10:28,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 01:10:36,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1693.99780 ± 1108.655
2025-05-12 01:10:36,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [1528.5011, 30.003387, 3003.8362, 1587.0515, 2812.0386, 2613.4268, 1086.4958, 8.735587, 1116.5201, 3153.3687]
2025-05-12 01:10:36,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [570.0, 32.0, 1000.0, 582.0, 961.0, 932.0, 418.0, 22.0, 467.0, 1000.0]
2025-05-12 01:10:36,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 14 minutes, 33 seconds)
2025-05-12 01:13:31,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 01:13:39,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1866.88794 ± 1022.978
2025-05-12 01:13:39,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [3079.97, 931.54333, 2185.5703, 1665.2921, 3279.577, 37.36581, 3111.9766, 1454.8103, 2001.9584, 920.81384]
2025-05-12 01:13:39,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [969.0, 354.0, 716.0, 630.0, 1000.0, 55.0, 1000.0, 529.0, 719.0, 340.0]
2025-05-12 01:13:39,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 11 minutes, 42 seconds)
2025-05-12 01:16:19,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 01:16:25,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1408.79028 ± 1157.551
2025-05-12 01:16:25,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [3044.6797, 522.44275, 1586.3202, 17.13082, 142.46173, 2942.0725, 1523.5132, 1417.8734, 2859.696, 31.711596]
2025-05-12 01:16:25,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [946.0, 222.0, 528.0, 22.0, 88.0, 931.0, 533.0, 488.0, 1000.0, 54.0]
2025-05-12 01:16:25,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 8 minutes, 45 seconds)
2025-05-12 01:19:47,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 01:19:57,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1733.96875 ± 864.002
2025-05-12 01:19:57,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [2121.0344, 1868.0219, 1665.4746, 1596.1189, 3070.1255, 1918.0739, 908.4841, 240.89992, 3077.2542, 874.2013]
2025-05-12 01:19:57,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [784.0, 630.0, 594.0, 523.0, 1000.0, 631.0, 357.0, 144.0, 1000.0, 316.0]
2025-05-12 01:19:57,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 6 minutes, 2 seconds)
2025-05-12 01:23:26,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 01:23:31,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1127.47571 ± 1056.782
2025-05-12 01:23:31,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [1594.9298, 725.1962, 234.54901, 137.97319, 2708.3677, 1836.3411, 3009.0193, 986.59753, 32.913998, 8.86799]
2025-05-12 01:23:31,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [615.0, 274.0, 119.0, 84.0, 1000.0, 707.0, 1000.0, 404.0, 41.0, 17.0]
2025-05-12 01:23:31,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 10 seconds)
2025-05-12 01:26:17,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 01:26:25,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1643.40271 ± 743.506
2025-05-12 01:26:25,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [740.2266, 786.52985, 1191.584, 2713.3323, 1111.9126, 2350.0144, 1011.84753, 1766.2633, 1965.907, 2796.4092]
2025-05-12 01:26:25,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [326.0, 326.0, 463.0, 1000.0, 442.0, 1000.0, 407.0, 742.0, 1000.0, 1000.0]
2025-05-12 01:26:25,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1251 [DEBUG]: Training session finished
