2025-05-08 09:23:32,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1006 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-bpql-mem4
2025-05-08 09:23:32,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1007 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-bpql-mem4
2025-05-08 09:23:32,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1008 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x79b23d3cca90>}
2025-05-08 09:23:32,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1009 [DEBUG]: using device: cpu
2025-05-08 09:23:32,306 baseline-bpql-noisy-walker2d:77 [WARNING]: args.assumed_delay != args.horizon: 4 != 32
2025-05-08 09:23:32,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1031 [INFO]: Creating new trainer
2025-05-08 09:23:32,313 baseline-bpql-noisy-walker2d:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=41, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-05-08 09:23:32,313 baseline-bpql-noisy-walker2d:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-08 09:23:32,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1092 [DEBUG]: Starting training session...
2025-05-08 09:23:32,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 1/100
2025-05-08 09:26:01,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:26:02,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: -1.43886 ± 10.695
2025-05-08 09:26:02,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [-8.263147, -7.131551, -4.971614, -5.317722, -6.7626367, 27.807688, 8.077543, -4.711912, -6.552626, -6.562597]
2025-05-08 09:26:02,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [33.0, 27.0, 26.0, 24.0, 25.0, 48.0, 33.0, 42.0, 23.0, 24.0]
2025-05-08 09:26:02,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1124 [INFO]: New best (-1.44) for latency ExtremeSparseL4U32
2025-05-08 09:26:02,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1127 [INFO]: saving network
2025-05-08 09:26:02,175 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 09:26:02,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 6 minutes, 46 seconds)
2025-05-08 09:28:53,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:28:54,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 75.86250 ± 82.559
2025-05-08 09:28:54,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [68.457375, 29.819801, 27.78953, -65.637634, 137.8173, 251.06175, 61.88546, 30.158533, 60.233387, 157.03943]
2025-05-08 09:28:54,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [130.0, 43.0, 84.0, 154.0, 127.0, 241.0, 106.0, 84.0, 102.0, 153.0]
2025-05-08 09:28:54,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1124 [INFO]: New best (75.86) for latency ExtremeSparseL4U32
2025-05-08 09:28:54,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1127 [INFO]: saving network
2025-05-08 09:28:54,657 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 09:28:54,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 23 minutes)
2025-05-08 09:31:34,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:31:36,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 126.26310 ± 94.887
2025-05-08 09:31:36,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [270.42606, 280.71176, 1.12643, 128.15796, 158.1457, -15.320668, 122.8347, 171.27748, 93.6414, 51.63018]
2025-05-08 09:31:36,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [298.0, 321.0, 18.0, 148.0, 202.0, 170.0, 191.0, 172.0, 146.0, 70.0]
2025-05-08 09:31:36,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1124 [INFO]: New best (126.26) for latency ExtremeSparseL4U32
2025-05-08 09:31:36,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1127 [INFO]: saving network
2025-05-08 09:31:36,288 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 09:31:36,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 20 minutes, 38 seconds)
2025-05-08 09:34:16,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:34:17,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 138.29918 ± 139.263
2025-05-08 09:34:17,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [29.207378, 155.72214, 30.47653, 34.19127, 33.950817, 162.14203, 455.77686, 322.30377, 135.63013, 23.590998]
2025-05-08 09:34:17,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [40.0, 139.0, 54.0, 45.0, 42.0, 120.0, 321.0, 223.0, 148.0, 36.0]
2025-05-08 09:34:17,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1124 [INFO]: New best (138.30) for latency ExtremeSparseL4U32
2025-05-08 09:34:17,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1127 [INFO]: saving network
2025-05-08 09:34:17,630 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 09:34:17,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 18 minutes)
2025-05-08 09:36:59,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:37:01,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 212.74051 ± 165.004
2025-05-08 09:37:01,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [63.314342, 369.38675, 194.99855, 280.56696, 35.44764, 452.54163, 478.92548, 168.18973, 56.395016, 27.63895]
2025-05-08 09:37:01,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [114.0, 246.0, 155.0, 152.0, 98.0, 255.0, 340.0, 121.0, 119.0, 83.0]
2025-05-08 09:37:01,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1124 [INFO]: New best (212.74) for latency ExtremeSparseL4U32
2025-05-08 09:37:01,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1127 [INFO]: saving network
2025-05-08 09:37:01,021 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 09:37:01,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 15 minutes, 59 seconds)
2025-05-08 09:39:47,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:39:49,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 239.14023 ± 94.818
2025-05-08 09:39:49,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [281.97232, 62.684788, 252.17162, 318.80896, 142.69933, 153.19261, 248.60779, 420.81396, 247.58853, 262.86224]
2025-05-08 09:39:49,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [169.0, 85.0, 176.0, 232.0, 96.0, 287.0, 160.0, 275.0, 137.0, 145.0]
2025-05-08 09:39:49,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1124 [INFO]: New best (239.14) for latency ExtremeSparseL4U32
2025-05-08 09:39:49,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1127 [INFO]: saving network
2025-05-08 09:39:49,103 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 09:39:49,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 19 minutes, 6 seconds)
2025-05-08 09:42:33,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:42:34,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 211.70956 ± 66.328
2025-05-08 09:42:34,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [87.50615, 217.90732, 248.52051, 215.09969, 291.1212, 262.69742, 180.2955, 225.31648, 285.09055, 103.54087]
2025-05-08 09:42:34,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [69.0, 133.0, 140.0, 128.0, 183.0, 141.0, 132.0, 138.0, 170.0, 86.0]
2025-05-08 09:42:34,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 14 minutes, 14 seconds)
2025-05-08 09:45:20,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:45:21,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 199.43379 ± 161.052
2025-05-08 09:45:21,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [536.0105, 181.91435, 77.31231, 16.596575, 267.13776, 54.616978, 16.498917, 390.69965, 239.92136, 213.62952]
2025-05-08 09:45:21,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [268.0, 108.0, 71.0, 32.0, 193.0, 77.0, 28.0, 248.0, 172.0, 110.0]
2025-05-08 09:45:21,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 13 minutes, 10 seconds)
2025-05-08 09:48:05,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:48:07,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 185.27406 ± 92.611
2025-05-08 09:48:07,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [207.55319, 25.844528, 238.49442, 300.3298, 131.19574, 344.41623, 225.23631, 148.9858, 153.00713, 77.677536]
2025-05-08 09:48:07,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [141.0, 60.0, 143.0, 379.0, 129.0, 196.0, 189.0, 159.0, 143.0, 79.0]
2025-05-08 09:48:07,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 11 minutes, 36 seconds)
2025-05-08 09:50:50,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:50:51,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 105.36308 ± 111.086
2025-05-08 09:50:51,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [21.980509, 14.55892, 2.2085817, 19.750896, 314.97025, 252.40503, 44.15998, 64.573784, 241.37192, 77.650894]
2025-05-08 09:50:51,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [32.0, 28.0, 110.0, 26.0, 144.0, 160.0, 63.0, 80.0, 134.0, 115.0]
2025-05-08 09:50:51,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 9 minutes, 13 seconds)
2025-05-08 09:53:37,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:53:38,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 186.98209 ± 43.564
2025-05-08 09:53:38,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [177.34431, 234.54227, 207.39622, 147.576, 276.08008, 143.42447, 167.02637, 131.21199, 217.57526, 167.64377]
2025-05-08 09:53:38,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [106.0, 167.0, 126.0, 113.0, 158.0, 102.0, 117.0, 133.0, 137.0, 114.0]
2025-05-08 09:53:38,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 6 minutes, 3 seconds)
2025-05-08 09:56:27,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:56:29,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 192.91013 ± 129.082
2025-05-08 09:56:29,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [184.69904, -4.428455, 26.125614, 299.18323, 263.33353, 191.10147, 189.78642, 151.68701, 475.9728, 151.64072]
2025-05-08 09:56:29,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [132.0, 22.0, 38.0, 168.0, 154.0, 134.0, 110.0, 96.0, 247.0, 108.0]
2025-05-08 09:56:29,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 4 minutes, 45 seconds)
2025-05-08 09:59:18,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:59:20,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 273.20441 ± 97.236
2025-05-08 09:59:20,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [47.581593, 450.23636, 233.17372, 221.91594, 278.73587, 255.52644, 279.50687, 309.2299, 341.63345, 314.50388]
2025-05-08 09:59:20,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [53.0, 232.0, 120.0, 113.0, 217.0, 127.0, 134.0, 161.0, 163.0, 152.0]
2025-05-08 09:59:20,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1124 [INFO]: New best (273.20) for latency ExtremeSparseL4U32
2025-05-08 09:59:20,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1127 [INFO]: saving network
2025-05-08 09:59:20,134 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 09:59:20,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 3 minutes, 5 seconds)
2025-05-08 10:02:40,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 10:02:41,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 177.80795 ± 100.850
2025-05-08 10:02:41,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [229.51509, 91.335205, 326.51633, 197.18152, 204.88988, 227.23642, 197.13545, 280.3172, 10.035203, 13.917418]
2025-05-08 10:02:41,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [112.0, 74.0, 186.0, 112.0, 101.0, 106.0, 99.0, 144.0, 28.0, 29.0]
2025-05-08 10:02:41,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 10 minutes, 40 seconds)
2025-05-08 10:05:39,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 10:05:41,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 284.56125 ± 120.833
2025-05-08 10:05:41,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [203.79143, 312.61, 373.54648, 284.71436, 55.583305, 484.7809, 342.6202, 404.05176, 233.54178, 150.37233]
2025-05-08 10:05:41,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [112.0, 155.0, 167.0, 146.0, 55.0, 245.0, 162.0, 193.0, 190.0, 104.0]
2025-05-08 10:05:41,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1124 [INFO]: New best (284.56) for latency ExtremeSparseL4U32
2025-05-08 10:05:41,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1127 [INFO]: saving network
2025-05-08 10:05:41,189 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 10:05:41,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 12 minutes)
2025-05-08 10:08:27,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 10:08:28,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 238.57011 ± 145.101
2025-05-08 10:08:28,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [206.94934, 497.17285, 148.07426, 25.687548, 28.26557, 217.72661, 256.10526, 411.0891, 358.65024, 235.98045]
2025-05-08 10:08:28,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [100.0, 214.0, 113.0, 39.0, 53.0, 115.0, 118.0, 213.0, 149.0, 102.0]
2025-05-08 10:08:28,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 9 minutes, 12 seconds)
2025-05-08 10:11:11,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 10:11:14,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 357.04340 ± 62.281
2025-05-08 10:11:14,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [428.36057, 297.0742, 276.3792, 284.90347, 407.5131, 290.02554, 383.31445, 395.65845, 452.8017, 354.4033]
2025-05-08 10:11:14,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [223.0, 140.0, 148.0, 280.0, 182.0, 134.0, 274.0, 181.0, 283.0, 309.0]
2025-05-08 10:11:14,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1124 [INFO]: New best (357.04) for latency ExtremeSparseL4U32
2025-05-08 10:11:14,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1127 [INFO]: saving network
2025-05-08 10:11:14,311 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 10:11:14,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 4 minutes, 53 seconds)
2025-05-08 10:14:07,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 10:14:10,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 299.34744 ± 185.939
2025-05-08 10:14:10,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [357.74924, 646.01227, 40.9812, 173.31668, 351.657, 440.2905, 19.021927, 270.3048, 218.76161, 475.37915]
2025-05-08 10:14:10,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [157.0, 340.0, 97.0, 195.0, 258.0, 252.0, 27.0, 178.0, 158.0, 246.0]
2025-05-08 10:14:10,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 3 minutes, 19 seconds)
2025-05-08 10:17:30,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 10:17:32,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 330.39093 ± 162.433
2025-05-08 10:17:32,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [255.09149, 268.844, 256.55594, 20.70455, 449.03006, 437.74933, 226.59181, 667.7106, 383.3848, 338.2465]
2025-05-08 10:17:32,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [117.0, 156.0, 117.0, 26.0, 199.0, 223.0, 102.0, 390.0, 153.0, 147.0]
2025-05-08 10:17:32,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 35 seconds)
2025-05-08 10:20:13,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 10:20:15,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 331.44659 ± 158.119
2025-05-08 10:20:15,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [442.9626, 542.5295, 464.79352, 134.85889, 342.74045, 413.7927, 463.0654, 19.416233, 252.47446, 237.83258]
2025-05-08 10:20:15,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [199.0, 279.0, 207.0, 122.0, 141.0, 170.0, 280.0, 33.0, 122.0, 125.0]
2025-05-08 10:20:15,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 21/100 (estimated time remaining: 3 hours, 53 minutes, 8 seconds)
2025-05-08 10:23:33,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 10:23:35,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 252.19736 ± 231.574
2025-05-08 10:23:35,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [245.56654, 214.91544, 204.53827, 223.37653, 191.86084, 906.2327, 202.26913, 269.90976, 42.617523, 20.686943]
2025-05-08 10:23:35,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [116.0, 122.0, 110.0, 120.0, 115.0, 404.0, 102.0, 143.0, 67.0, 32.0]
2025-05-08 10:23:35,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 22/100 (estimated time remaining: 3 hours, 58 minutes, 44 seconds)
2025-05-08 10:27:04,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 10:27:06,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 388.59509 ± 106.432
2025-05-08 10:27:06,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [423.96136, 361.23248, 254.87392, 286.11334, 626.2142, 272.85294, 420.6129, 341.48846, 484.0945, 414.5068]
2025-05-08 10:27:06,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [187.0, 139.0, 127.0, 117.0, 244.0, 122.0, 229.0, 145.0, 187.0, 199.0]
2025-05-08 10:27:06,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1124 [INFO]: New best (388.60) for latency ExtremeSparseL4U32
2025-05-08 10:27:06,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1127 [INFO]: saving network
2025-05-08 10:27:06,658 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 10:27:06,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 23/100 (estimated time remaining: 4 hours, 7 minutes, 36 seconds)
2025-05-08 10:30:26,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 10:30:28,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 345.51639 ± 167.509
2025-05-08 10:30:28,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [240.28923, 408.463, 260.92667, 326.0115, 661.7132, 361.8081, 281.78058, 501.44308, -6.560978, 419.28964]
2025-05-08 10:30:28,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [114.0, 153.0, 125.0, 168.0, 317.0, 170.0, 126.0, 211.0, 19.0, 182.0]
2025-05-08 10:30:28,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 24/100 (estimated time remaining: 4 hours, 11 minutes, 10 seconds)
2025-05-08 10:33:56,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 10:34:00,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 467.70956 ± 244.952
2025-05-08 10:34:00,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [410.2779, 33.069687, 286.63693, 852.7243, 320.73648, 342.61514, 862.63556, 395.40558, 578.90015, 594.0942]
2025-05-08 10:34:00,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [236.0, 49.0, 148.0, 483.0, 287.0, 225.0, 332.0, 219.0, 311.0, 258.0]
2025-05-08 10:34:00,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1124 [INFO]: New best (467.71) for latency ExtremeSparseL4U32
2025-05-08 10:34:00,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1127 [INFO]: saving network
2025-05-08 10:34:00,595 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 10:34:00,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 25/100 (estimated time remaining: 4 hours, 10 minutes, 17 seconds)
2025-05-08 10:37:27,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 10:37:31,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 601.67780 ± 261.575
2025-05-08 10:37:31,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [733.36816, 608.4298, 414.48923, 1110.6102, 217.15793, 577.1743, 272.12146, 508.89767, 649.0315, 925.4977]
2025-05-08 10:37:31,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [270.0, 257.0, 174.0, 386.0, 254.0, 267.0, 123.0, 193.0, 267.0, 561.0]
2025-05-08 10:37:31,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1124 [INFO]: New best (601.68) for latency ExtremeSparseL4U32
2025-05-08 10:37:31,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1127 [INFO]: saving network
2025-05-08 10:37:31,369 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 10:37:31,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 18 minutes, 58 seconds)
2025-05-08 10:40:37,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 10:40:41,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 478.77051 ± 282.277
2025-05-08 10:40:41,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [692.6269, 799.545, 852.51825, 297.13742, 327.2766, 604.30396, 764.0889, 266.2105, 166.20975, 17.787828]
2025-05-08 10:40:41,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [287.0, 411.0, 327.0, 138.0, 197.0, 362.0, 373.0, 142.0, 143.0, 29.0]
2025-05-08 10:40:41,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 27/100 (estimated time remaining: 4 hours, 13 minutes, 8 seconds)
2025-05-08 10:44:03,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 10:44:09,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 773.35730 ± 383.307
2025-05-08 10:44:09,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [934.7972, 1001.6509, 74.10663, 963.139, 1100.3037, 409.49524, 231.3018, 1293.0355, 709.75714, 1015.9857]
2025-05-08 10:44:09,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [378.0, 319.0, 97.0, 492.0, 524.0, 187.0, 110.0, 734.0, 303.0, 443.0]
2025-05-08 10:44:09,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1124 [INFO]: New best (773.36) for latency ExtremeSparseL4U32
2025-05-08 10:44:09,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1127 [INFO]: saving network
2025-05-08 10:44:09,530 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 10:44:09,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 28/100 (estimated time remaining: 4 hours, 8 minutes, 53 seconds)
2025-05-08 10:47:39,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 10:47:42,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 527.70587 ± 220.062
2025-05-08 10:47:42,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [300.20374, 575.9126, 640.7184, 473.55157, 665.5059, 29.58833, 499.47275, 504.66235, 841.2664, 746.1761]
2025-05-08 10:47:42,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [129.0, 225.0, 269.0, 188.0, 250.0, 35.0, 247.0, 191.0, 362.0, 301.0]
2025-05-08 10:47:42,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 29/100 (estimated time remaining: 4 hours, 8 minutes, 2 seconds)
2025-05-08 10:51:09,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 10:51:11,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 357.50565 ± 142.727
2025-05-08 10:51:11,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [384.1481, 244.30475, 250.90324, 547.8728, 248.41312, 227.88078, 233.25174, 317.21738, 645.31775, 475.74677]
2025-05-08 10:51:11,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [146.0, 110.0, 107.0, 188.0, 108.0, 106.0, 104.0, 126.0, 302.0, 164.0]
2025-05-08 10:51:11,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 3 minutes, 56 seconds)
2025-05-08 10:54:34,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 10:54:37,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 616.44055 ± 334.627
2025-05-08 10:54:37,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [800.7404, 576.9371, 1308.7382, 942.6085, 370.97098, 290.1106, 249.11977, 787.1677, 630.8819, 207.13104]
2025-05-08 10:54:37,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [294.0, 217.0, 464.0, 298.0, 175.0, 134.0, 116.0, 267.0, 255.0, 99.0]
2025-05-08 10:54:37,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 59 minutes, 30 seconds)
2025-05-08 10:57:45,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 10:57:47,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 264.82526 ± 146.321
2025-05-08 10:57:47,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [320.8221, 194.86375, 204.5899, 221.12883, 125.40244, 674.6507, 187.98477, 187.04701, 290.1723, 241.59065]
2025-05-08 10:57:47,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [210.0, 103.0, 109.0, 112.0, 97.0, 256.0, 99.0, 111.0, 137.0, 117.0]
2025-05-08 10:57:47,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 55 minutes, 59 seconds)
2025-05-08 11:00:32,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:00:34,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 290.92389 ± 158.027
2025-05-08 11:00:34,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [285.94495, 248.95284, 275.7884, 39.919247, 711.7234, 278.148, 260.6401, 244.47214, 327.8194, 235.83049]
2025-05-08 11:00:34,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [134.0, 116.0, 137.0, 67.0, 288.0, 127.0, 124.0, 122.0, 152.0, 120.0]
2025-05-08 11:00:34,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 43 minutes, 9 seconds)
2025-05-08 11:03:21,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:03:24,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 444.53119 ± 484.451
2025-05-08 11:03:24,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [790.1554, 29.189728, 523.9525, 183.85428, 40.09251, 27.204803, 1625.0571, 192.50336, 812.8029, 220.49931]
2025-05-08 11:03:24,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [337.0, 38.0, 243.0, 131.0, 63.0, 36.0, 552.0, 110.0, 319.0, 157.0]
2025-05-08 11:03:24,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 30 minutes, 20 seconds)
2025-05-08 11:06:08,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:06:10,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 435.28375 ± 298.566
2025-05-08 11:06:10,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [251.90758, 375.32043, 646.3553, 878.26, 288.09396, 56.160023, 1040.8479, 275.97916, 244.84401, 295.06927]
2025-05-08 11:06:10,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [125.0, 193.0, 249.0, 321.0, 142.0, 55.0, 426.0, 132.0, 121.0, 148.0]
2025-05-08 11:06:10,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 17 minutes, 53 seconds)
2025-05-08 11:08:57,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:09:00,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 640.28284 ± 421.495
2025-05-08 11:09:00,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [723.49365, 636.99774, 1787.491, 231.25899, 624.129, 260.95673, 650.7612, 492.27448, 697.04395, 298.42117]
2025-05-08 11:09:00,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [336.0, 236.0, 596.0, 132.0, 252.0, 122.0, 225.0, 189.0, 263.0, 135.0]
2025-05-08 11:09:00,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 6 minutes, 51 seconds)
2025-05-08 11:11:49,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:11:51,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 354.65491 ± 376.019
2025-05-08 11:11:51,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [428.6286, 888.76166, 62.571976, 34.089695, 29.818605, 16.428648, 1046.553, 711.3456, 299.5526, 28.798668]
2025-05-08 11:11:51,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [195.0, 365.0, 87.0, 47.0, 44.0, 26.0, 427.0, 247.0, 141.0, 41.0]
2025-05-08 11:11:51,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 3 seconds)
2025-05-08 11:14:26,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:14:30,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 955.23615 ± 573.154
2025-05-08 11:14:30,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [969.3062, 377.4895, 424.98508, 1159.5934, 228.60657, 2025.0093, 679.295, 1075.682, 1862.4031, 749.9909]
2025-05-08 11:14:30,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [320.0, 162.0, 179.0, 424.0, 109.0, 712.0, 244.0, 426.0, 670.0, 266.0]
2025-05-08 11:14:30,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1124 [INFO]: New best (955.24) for latency ExtremeSparseL4U32
2025-05-08 11:14:30,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1127 [INFO]: saving network
2025-05-08 11:14:30,321 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 11:14:30,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 55 minutes, 37 seconds)
2025-05-08 11:16:50,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:16:54,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1050.46521 ± 606.418
2025-05-08 11:16:54,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [1079.1421, 1507.235, 802.13983, 286.65686, 836.49426, 2293.5964, 258.87717, 1382.3682, 1526.0868, 532.0556]
2025-05-08 11:16:54,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [373.0, 498.0, 311.0, 143.0, 337.0, 887.0, 126.0, 443.0, 534.0, 225.0]
2025-05-08 11:16:54,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1124 [INFO]: New best (1050.47) for latency ExtremeSparseL4U32
2025-05-08 11:16:54,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1127 [INFO]: saving network
2025-05-08 11:16:54,654 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 11:16:54,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 47 minutes, 29 seconds)
2025-05-08 11:19:15,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:19:18,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 760.21057 ± 429.413
2025-05-08 11:19:18,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [203.60248, 1408.7572, 1096.581, 1380.9993, 765.8501, 660.9605, 655.7285, 264.11157, 960.32324, 205.1913]
2025-05-08 11:19:18,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [140.0, 472.0, 386.0, 508.0, 284.0, 247.0, 238.0, 127.0, 335.0, 106.0]
2025-05-08 11:19:18,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 40 minutes, 10 seconds)
2025-05-08 11:21:35,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:21:37,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 592.81531 ± 390.708
2025-05-08 11:21:37,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [31.505554, 579.0485, 1012.9665, 899.18274, 30.689978, 460.33124, 954.70776, 1209.6373, 398.0852, 351.99844]
2025-05-08 11:21:37,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [43.0, 244.0, 367.0, 332.0, 42.0, 195.0, 353.0, 399.0, 182.0, 164.0]
2025-05-08 11:21:37,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 31 minutes, 29 seconds)
2025-05-08 11:23:55,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:23:58,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 755.08167 ± 697.793
2025-05-08 11:23:58,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [41.441093, 20.609484, 895.9174, 1313.796, 25.699379, 58.593468, 436.6414, 1997.4678, 1231.0886, 1529.5621]
2025-05-08 11:23:58,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [56.0, 36.0, 317.0, 427.0, 43.0, 101.0, 180.0, 680.0, 420.0, 518.0]
2025-05-08 11:23:58,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 23 minutes, 3 seconds)
2025-05-08 11:26:17,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:26:19,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 599.31311 ± 580.325
2025-05-08 11:26:19,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [1176.3425, 31.360693, 34.706818, 39.817276, 1135.7175, 1369.4739, 263.62338, 1495.7468, 211.0013, 235.3405]
2025-05-08 11:26:19,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [406.0, 52.0, 44.0, 53.0, 404.0, 466.0, 127.0, 524.0, 99.0, 155.0]
2025-05-08 11:26:19,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 17 minutes, 8 seconds)
2025-05-08 11:28:42,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:28:46,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 987.21875 ± 970.446
2025-05-08 11:28:46,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [2342.2466, 626.00757, 2398.1155, 1151.9855, 156.22832, 24.7921, 24.164436, 2372.6252, 742.6764, 33.344265]
2025-05-08 11:28:46,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [821.0, 275.0, 778.0, 395.0, 90.0, 34.0, 36.0, 840.0, 298.0, 43.0]
2025-05-08 11:28:46,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 15 minutes, 20 seconds)
2025-05-08 11:31:04,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:31:09,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1273.35461 ± 823.219
2025-05-08 11:31:09,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [1180.5536, 929.04254, 733.90753, 2871.16, 6.3490167, 2513.6204, 879.83307, 1400.4481, 1576.9456, 641.68665]
2025-05-08 11:31:09,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [399.0, 361.0, 274.0, 1000.0, 27.0, 899.0, 305.0, 462.0, 529.0, 240.0]
2025-05-08 11:31:09,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1124 [INFO]: New best (1273.35) for latency ExtremeSparseL4U32
2025-05-08 11:31:09,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1127 [INFO]: saving network
2025-05-08 11:31:09,892 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 11:31:09,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 12 minutes, 47 seconds)
2025-05-08 11:33:33,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:33:39,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1394.19604 ± 1070.393
2025-05-08 11:33:39,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [2870.11, 2833.4644, 30.461885, 1169.3821, 323.30035, 29.469797, 1217.3033, 2873.2498, 1138.7977, 1456.4222]
2025-05-08 11:33:39,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 45.0, 376.0, 122.0, 52.0, 451.0, 1000.0, 404.0, 514.0]
2025-05-08 11:33:39,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1124 [INFO]: New best (1394.20) for latency ExtremeSparseL4U32
2025-05-08 11:33:39,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1127 [INFO]: saving network
2025-05-08 11:33:39,060 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 11:33:39,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 12 minutes, 14 seconds)
2025-05-08 11:35:51,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:35:54,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 712.12390 ± 433.099
2025-05-08 11:35:54,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [602.34546, 910.45294, 1106.0917, 1191.2, 268.9951, 705.3642, 1442.4054, 14.128373, 633.462, 246.79428]
2025-05-08 11:35:54,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [233.0, 345.0, 422.0, 395.0, 122.0, 265.0, 569.0, 30.0, 271.0, 111.0]
2025-05-08 11:35:54,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 8 minutes, 53 seconds)
2025-05-08 11:38:26,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:38:32,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1271.90601 ± 1106.108
2025-05-08 11:38:32,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [1829.0541, 2149.5369, 2927.2085, 29.155771, 27.834517, 2686.6738, 450.8231, 189.65631, 2071.3948, 357.72214]
2025-05-08 11:38:32,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [627.0, 784.0, 1000.0, 50.0, 44.0, 1000.0, 176.0, 143.0, 705.0, 157.0]
2025-05-08 11:38:32,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 9 minutes, 27 seconds)
2025-05-08 11:40:50,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:40:55,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1081.70142 ± 978.729
2025-05-08 11:40:55,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [21.539898, 853.5404, 1610.3568, 2717.2373, 2043.8683, 427.5783, 375.70166, 2448.0364, 39.01556, 280.13843]
2025-05-08 11:40:55,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [33.0, 380.0, 618.0, 1000.0, 678.0, 201.0, 171.0, 888.0, 58.0, 318.0]
2025-05-08 11:40:55,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 6 minutes, 14 seconds)
2025-05-08 11:43:17,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:43:24,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1652.09937 ± 917.808
2025-05-08 11:43:24,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [970.4583, 1078.6631, 1276.9849, 2871.4749, 1929.0656, 2922.5847, 1373.0736, 7.9355416, 1263.9187, 2826.8352]
2025-05-08 11:43:24,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [372.0, 393.0, 431.0, 1000.0, 663.0, 1000.0, 481.0, 29.0, 441.0, 1000.0]
2025-05-08 11:43:24,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1124 [INFO]: New best (1652.10) for latency ExtremeSparseL4U32
2025-05-08 11:43:24,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1127 [INFO]: saving network
2025-05-08 11:43:24,457 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 11:43:24,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 4 minutes, 52 seconds)
2025-05-08 11:45:42,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:45:46,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1283.06067 ± 507.171
2025-05-08 11:45:46,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [570.96484, 1301.6445, 1019.9414, 2224.8467, 1451.7106, 1107.9578, 1351.4791, 1698.6417, 1670.2029, 433.21667]
2025-05-08 11:45:46,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [220.0, 478.0, 365.0, 777.0, 459.0, 383.0, 475.0, 627.0, 544.0, 180.0]
2025-05-08 11:45:46,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 1 minute, 19 seconds)
2025-05-08 11:47:59,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:48:07,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 2061.09961 ± 866.449
2025-05-08 11:48:07,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [2395.8838, 2615.4976, 27.742062, 2353.7864, 1979.3307, 1068.5918, 2914.0786, 2639.4275, 2896.4011, 1720.2585]
2025-05-08 11:48:07,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [809.0, 895.0, 35.0, 777.0, 657.0, 420.0, 1000.0, 868.0, 1000.0, 604.0]
2025-05-08 11:48:07,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1124 [INFO]: New best (2061.10) for latency ExtremeSparseL4U32
2025-05-08 11:48:07,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1127 [INFO]: saving network
2025-05-08 11:48:07,760 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 11:48:07,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 59 minutes, 42 seconds)
2025-05-08 11:50:32,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:50:38,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1505.38770 ± 1134.868
2025-05-08 11:50:38,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [3118.5488, 3120.346, 1945.209, 2945.0144, 240.90108, 666.1197, 138.4541, 1394.8352, 577.8566, 906.5925]
2025-05-08 11:50:38,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 648.0, 1000.0, 117.0, 261.0, 105.0, 453.0, 216.0, 332.0]
2025-05-08 11:50:38,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 56 minutes, 7 seconds)
2025-05-08 11:52:59,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:53:06,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1614.46606 ± 862.783
2025-05-08 11:53:06,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [2867.927, 857.655, 2685.3835, 2131.9666, 34.79112, 593.3743, 2201.876, 1584.4923, 1609.7163, 1577.4789]
2025-05-08 11:53:06,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 340.0, 839.0, 718.0, 49.0, 238.0, 719.0, 555.0, 542.0, 512.0]
2025-05-08 11:53:06,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 54 minutes, 30 seconds)
2025-05-08 11:55:20,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:55:25,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1216.07495 ± 927.544
2025-05-08 11:55:25,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [451.06378, 46.17659, 1269.741, 1607.6708, 2395.3047, 2218.4473, 1049.5013, 259.36765, 194.43019, 2669.046]
2025-05-08 11:55:25,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [173.0, 52.0, 414.0, 549.0, 760.0, 766.0, 394.0, 139.0, 101.0, 885.0]
2025-05-08 11:55:25,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 50 minutes, 33 seconds)
2025-05-08 11:57:44,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:57:49,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1125.08691 ± 911.239
2025-05-08 11:57:49,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [605.74146, 69.19601, 225.6082, 1780.0461, 1683.9398, 1128.2845, 1781.1904, 910.3783, 3043.0154, 23.468506]
2025-05-08 11:57:49,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [224.0, 87.0, 103.0, 591.0, 579.0, 410.0, 691.0, 340.0, 1000.0, 35.0]
2025-05-08 11:57:49,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 48 minutes, 22 seconds)
2025-05-08 12:00:19,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:00:24,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1052.70776 ± 1122.828
2025-05-08 12:00:24,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [42.177036, 22.362997, 32.64849, 2750.522, 549.39105, 34.762493, 536.8506, 2479.796, 1310.3605, 2768.2068]
2025-05-08 12:00:24,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [54.0, 31.0, 56.0, 997.0, 208.0, 50.0, 216.0, 887.0, 444.0, 1000.0]
2025-05-08 12:00:24,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 48 minutes, 2 seconds)
2025-05-08 12:02:46,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:02:54,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1762.85681 ± 881.941
2025-05-08 12:02:54,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [1514.491, 2375.235, 2696.99, 2932.813, 435.1075, 1478.8887, 1336.2751, 2906.5298, 1490.2588, 461.97913]
2025-05-08 12:02:54,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [512.0, 823.0, 1000.0, 1000.0, 197.0, 518.0, 483.0, 1000.0, 493.0, 198.0]
2025-05-08 12:02:54,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 45 minutes, 30 seconds)
2025-05-08 12:05:07,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:05:14,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1533.60278 ± 1012.919
2025-05-08 12:05:14,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [1400.9769, 2936.7754, 2955.3333, 2980.2178, 384.1135, 1401.6306, 1191.6566, 483.61948, 287.25543, 1314.4482]
2025-05-08 12:05:14,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [485.0, 1000.0, 996.0, 1000.0, 155.0, 538.0, 440.0, 205.0, 130.0, 471.0]
2025-05-08 12:05:14,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 41 minutes, 53 seconds)
2025-05-08 12:07:41,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:07:46,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1092.36169 ± 726.680
2025-05-08 12:07:46,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [1101.7958, 1006.07306, 744.5742, 24.245195, 2656.4084, 207.55588, 1879.4475, 1307.4304, 1220.0823, 776.0046]
2025-05-08 12:07:46,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [361.0, 390.0, 280.0, 42.0, 826.0, 103.0, 607.0, 459.0, 414.0, 290.0]
2025-05-08 12:07:46,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 41 minutes, 12 seconds)
2025-05-08 12:10:18,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:10:27,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1549.50195 ± 914.462
2025-05-08 12:10:27,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [1015.2946, 2813.1213, 2820.194, 446.76093, 939.02136, 991.53436, 1711.961, 439.23907, 2853.0886, 1464.8037]
2025-05-08 12:10:27,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [364.0, 1000.0, 1000.0, 202.0, 326.0, 400.0, 637.0, 176.0, 1000.0, 472.0]
2025-05-08 12:10:27,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 41 minutes, 1 second)
2025-05-08 12:13:39,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:13:45,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1262.85205 ± 812.899
2025-05-08 12:13:45,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [526.4564, 1573.5205, 848.28436, 746.4891, 217.63115, 2625.2717, 772.37585, 2609.4668, 848.87396, 1860.1508]
2025-05-08 12:13:45,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [187.0, 541.0, 303.0, 274.0, 110.0, 826.0, 285.0, 858.0, 296.0, 593.0]
2025-05-08 12:13:45,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 44 minutes, 6 seconds)
2025-05-08 12:16:25,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:16:31,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1345.84204 ± 1079.833
2025-05-08 12:16:31,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [2273.3474, 1936.7998, 15.709144, 30.392149, 2839.8855, 580.68976, 2368.991, 734.38727, 174.45969, 2503.759]
2025-05-08 12:16:31,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [792.0, 682.0, 31.0, 51.0, 1000.0, 225.0, 812.0, 276.0, 125.0, 787.0]
2025-05-08 12:16:31,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 43 minutes, 29 seconds)
2025-05-08 12:18:52,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:18:57,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1263.18103 ± 1044.546
2025-05-08 12:18:57,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [2953.7861, 2929.4307, 648.5163, 1820.2461, 1441.6122, 463.4021, 523.61633, 16.155598, 1805.1077, 29.937555]
2025-05-08 12:18:57,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 256.0, 600.0, 475.0, 185.0, 203.0, 26.0, 657.0, 53.0]
2025-05-08 12:18:57,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 41 minutes, 34 seconds)
2025-05-08 12:21:18,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:21:27,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 2107.39429 ± 1219.549
2025-05-08 12:21:27,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [3039.9783, 3160.985, 540.0672, 25.873983, 2940.4893, 3092.901, 1059.1487, 3048.564, 3190.424, 975.51245]
2025-05-08 12:21:27,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 223.0, 37.0, 1000.0, 1000.0, 385.0, 1000.0, 1000.0, 334.0]
2025-05-08 12:21:27,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1124 [INFO]: New best (2107.39) for latency ExtremeSparseL4U32
2025-05-08 12:21:27,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1127 [INFO]: saving network
2025-05-08 12:21:27,030 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 12:21:27,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 38 minutes, 31 seconds)
2025-05-08 12:24:34,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:24:40,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1178.92700 ± 887.721
2025-05-08 12:24:40,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [978.8364, 2459.4204, 548.1805, 1475.9781, 1079.2993, 275.42288, 26.060253, 786.84247, 3023.8486, 1135.3811]
2025-05-08 12:24:40,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [331.0, 827.0, 218.0, 533.0, 388.0, 162.0, 49.0, 291.0, 1000.0, 430.0]
2025-05-08 12:24:40,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 39 minutes, 33 seconds)
2025-05-08 12:27:30,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:27:38,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1741.55005 ± 694.614
2025-05-08 12:27:38,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [1257.3903, 2854.6382, 1063.0518, 644.5107, 1438.2958, 2503.1216, 2512.409, 2132.1694, 1785.0812, 1224.831]
2025-05-08 12:27:38,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [442.0, 1000.0, 429.0, 271.0, 508.0, 921.0, 871.0, 719.0, 629.0, 420.0]
2025-05-08 12:27:38,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 34 minutes, 23 seconds)
2025-05-08 12:29:58,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:30:05,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1894.91858 ± 827.673
2025-05-08 12:30:05,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [624.3176, 3140.8328, 1807.3567, 2128.7986, 3139.5227, 1187.2867, 2153.9404, 1001.637, 2476.9734, 1288.52]
2025-05-08 12:30:05,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [238.0, 1000.0, 608.0, 775.0, 1000.0, 406.0, 770.0, 341.0, 801.0, 453.0]
2025-05-08 12:30:06,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 29 minutes, 36 seconds)
2025-05-08 12:32:23,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:32:27,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1006.66730 ± 907.716
2025-05-08 12:32:27,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [803.0026, 624.96246, 416.60666, 717.2171, 1703.442, 472.68076, 3170.1277, 269.496, 45.11618, 1844.0215]
2025-05-08 12:32:27,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [266.0, 247.0, 183.0, 271.0, 544.0, 209.0, 1000.0, 150.0, 75.0, 538.0]
2025-05-08 12:32:27,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 26 minutes, 23 seconds)
2025-05-08 12:34:48,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:34:53,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1189.59570 ± 954.052
2025-05-08 12:34:53,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [589.25, 193.23332, 1727.4756, 2695.5405, 218.27094, 741.93115, 2949.2312, 1529.4456, 309.27, 942.30835]
2025-05-08 12:34:53,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [254.0, 94.0, 710.0, 1000.0, 113.0, 272.0, 1000.0, 543.0, 137.0, 370.0]
2025-05-08 12:34:53,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 23 minutes, 19 seconds)
2025-05-08 12:37:19,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:37:26,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1627.41040 ± 809.106
2025-05-08 12:37:26,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [2001.7905, 2029.5593, 880.5858, 2986.8186, 1435.0496, 475.4941, 1251.8231, 2963.367, 1273.0217, 976.59595]
2025-05-08 12:37:26,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [696.0, 713.0, 308.0, 1000.0, 498.0, 224.0, 588.0, 961.0, 445.0, 336.0]
2025-05-08 12:37:26,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 16 minutes, 34 seconds)
2025-05-08 12:39:46,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:39:53,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1667.80310 ± 1200.393
2025-05-08 12:39:53,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [3019.9995, 2879.7007, 230.5105, 1313.9541, 2156.4966, 2972.4275, 1213.1346, 15.926592, 34.493042, 2841.3887]
2025-05-08 12:39:53,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 104.0, 476.0, 727.0, 1000.0, 433.0, 27.0, 46.0, 1000.0]
2025-05-08 12:39:53,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 11 minutes, 7 seconds)
2025-05-08 12:42:09,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:42:15,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1224.54785 ± 1164.575
2025-05-08 12:42:15,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [33.04895, 2944.6575, 2838.193, 1320.7192, 628.8822, 589.3003, 2914.0906, 921.72577, 25.06834, 29.79301]
2025-05-08 12:42:15,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [53.0, 981.0, 1000.0, 461.0, 259.0, 223.0, 1000.0, 341.0, 46.0, 38.0]
2025-05-08 12:42:15,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 8 minutes, 3 seconds)
2025-05-08 12:44:40,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:44:44,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1028.98474 ± 856.509
2025-05-08 12:44:44,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [755.81573, 3075.6206, 8.577643, 1559.7046, 930.69617, 768.7584, 191.41658, 398.83066, 838.3003, 1762.1266]
2025-05-08 12:44:44,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [275.0, 1000.0, 40.0, 550.0, 323.0, 287.0, 90.0, 165.0, 335.0, 615.0]
2025-05-08 12:44:44,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 6 minutes, 17 seconds)
2025-05-08 12:46:57,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:47:02,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1078.84497 ± 985.354
2025-05-08 12:47:02,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [906.6861, 514.2332, 2918.1287, 2531.1917, 2041.6747, 824.8067, 529.4441, 36.883545, 29.231289, 456.1695]
2025-05-08 12:47:02,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [336.0, 277.0, 972.0, 838.0, 669.0, 326.0, 232.0, 58.0, 38.0, 183.0]
2025-05-08 12:47:02,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 3 minutes, 9 seconds)
2025-05-08 12:49:24,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:49:32,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1867.71326 ± 979.455
2025-05-08 12:49:32,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [845.9505, 1549.7776, 149.79439, 1134.3635, 1118.0159, 2812.9016, 2822.1228, 2326.585, 2933.1223, 2984.4995]
2025-05-08 12:49:32,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [289.0, 548.0, 69.0, 426.0, 443.0, 951.0, 1000.0, 763.0, 1000.0, 1000.0]
2025-05-08 12:49:32,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 30 seconds)
2025-05-08 12:51:55,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:51:59,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1005.52051 ± 856.895
2025-05-08 12:51:59,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [424.45035, 661.2953, 1774.303, 18.408401, 15.247696, 1359.6589, 1201.4872, 1015.1536, 580.6605, 3004.54]
2025-05-08 12:51:59,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [177.0, 233.0, 587.0, 28.0, 28.0, 470.0, 450.0, 352.0, 248.0, 1000.0]
2025-05-08 12:51:59,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 77/100 (estimated time remaining: 58 minutes)
2025-05-08 12:54:53,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:54:59,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1067.00122 ± 752.939
2025-05-08 12:54:59,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [2037.8208, 1428.0581, 519.3165, 16.831074, 372.12427, 826.04425, 1078.4146, 1726.6849, 313.42776, 2351.2896]
2025-05-08 12:54:59,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [689.0, 501.0, 205.0, 29.0, 151.0, 276.0, 375.0, 624.0, 151.0, 807.0]
2025-05-08 12:54:59,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 78/100 (estimated time remaining: 58 minutes, 34 seconds)
2025-05-08 12:57:51,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:57:57,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1196.62476 ± 967.918
2025-05-08 12:57:57,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [602.7144, 540.78656, 319.89514, 3103.6152, 1243.4866, 3035.32, 558.3268, 704.41547, 881.7721, 975.91504]
2025-05-08 12:57:57,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [268.0, 204.0, 137.0, 1000.0, 427.0, 1000.0, 213.0, 255.0, 318.0, 354.0]
2025-05-08 12:57:57,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 79/100 (estimated time remaining: 58 minutes, 11 seconds)
2025-05-08 13:00:39,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 13:00:44,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1017.29541 ± 989.544
2025-05-08 13:00:44,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [197.16927, 561.9763, 1818.0575, 2347.0837, 25.852285, 974.43994, 32.05903, 843.9853, 340.39694, 3031.9343]
2025-05-08 13:00:44,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [94.0, 225.0, 634.0, 814.0, 37.0, 411.0, 52.0, 326.0, 144.0, 995.0]
2025-05-08 13:00:44,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 80/100 (estimated time remaining: 57 minutes, 32 seconds)
2025-05-08 13:03:09,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 13:03:13,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1074.53271 ± 1037.745
2025-05-08 13:03:13,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [1014.4785, 208.34238, 1070.5939, 1108.7354, 77.80944, 1409.8118, 32.081387, 15.363638, 2955.8188, 2852.2917]
2025-05-08 13:03:13,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [330.0, 111.0, 384.0, 440.0, 93.0, 484.0, 42.0, 29.0, 1000.0, 953.0]
2025-05-08 13:03:13,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 81/100 (estimated time remaining: 54 minutes, 45 seconds)
2025-05-08 13:05:22,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 13:05:28,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1592.84192 ± 982.750
2025-05-08 13:05:28,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [2858.1592, 2550.1843, 1577.6736, 1358.8068, 2262.8396, 1006.8568, 26.80789, 440.79474, 851.8648, 2994.4307]
2025-05-08 13:05:28,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [961.0, 830.0, 543.0, 439.0, 772.0, 364.0, 41.0, 190.0, 313.0, 1000.0]
2025-05-08 13:05:28,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 82/100 (estimated time remaining: 51 minutes, 15 seconds)
2025-05-08 13:07:47,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 13:07:51,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1115.31006 ± 1170.173
2025-05-08 13:07:51,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [18.67079, 365.42865, 471.26306, 3056.0425, 1820.3397, 3315.2256, 46.02467, 34.11692, 816.926, 1209.0625]
2025-05-08 13:07:51,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [38.0, 152.0, 180.0, 1000.0, 578.0, 1000.0, 68.0, 58.0, 291.0, 446.0]
2025-05-08 13:07:51,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 83/100 (estimated time remaining: 46 minutes, 21 seconds)
2025-05-08 13:10:12,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 13:10:15,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 795.34277 ± 1082.240
2025-05-08 13:10:15,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [171.68672, 2801.326, 42.160923, 2972.9033, 195.835, 679.0714, 132.01456, 895.3325, 11.89599, 51.20102]
2025-05-08 13:10:15,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [92.0, 941.0, 63.0, 986.0, 93.0, 275.0, 158.0, 332.0, 22.0, 91.0]
2025-05-08 13:10:15,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 84/100 (estimated time remaining: 41 minutes, 49 seconds)
2025-05-08 13:12:31,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 13:12:35,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1105.50952 ± 689.715
2025-05-08 13:12:35,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [873.9688, 1377.5342, 2363.4565, 1140.2197, 486.88892, 676.9872, 26.019085, 810.05725, 2213.0107, 1086.9532]
2025-05-08 13:12:35,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [316.0, 453.0, 704.0, 397.0, 237.0, 275.0, 38.0, 294.0, 765.0, 402.0]
2025-05-08 13:12:35,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 85/100 (estimated time remaining: 37 minutes, 56 seconds)
2025-05-08 13:15:05,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 13:15:11,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1281.48889 ± 1024.643
2025-05-08 13:15:11,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [2991.105, 1603.7043, 1864.6692, 25.128452, 58.07612, 377.24823, 1748.3367, 2793.8743, 706.4721, 646.2748]
2025-05-08 13:15:11,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 572.0, 670.0, 38.0, 69.0, 161.0, 1000.0, 1000.0, 259.0, 236.0]
2025-05-08 13:15:11,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 86/100 (estimated time remaining: 35 minutes, 52 seconds)
2025-05-08 13:17:24,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 13:17:28,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1029.39380 ± 832.445
2025-05-08 13:17:28,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [17.089853, 726.8338, 717.4922, 1611.4673, 3021.97, 537.03046, 946.55023, 45.019527, 1362.924, 1307.5613]
2025-05-08 13:17:28,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [28.0, 286.0, 260.0, 521.0, 1000.0, 208.0, 365.0, 67.0, 504.0, 448.0]
2025-05-08 13:17:28,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 87/100 (estimated time remaining: 33 minutes, 37 seconds)
2025-05-08 13:19:50,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 13:19:58,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1860.71460 ± 955.791
2025-05-08 13:19:58,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [773.95624, 419.77405, 1113.7412, 667.6689, 2919.028, 2879.5305, 2931.4797, 2338.123, 2350.3882, 2213.4568]
2025-05-08 13:19:58,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [310.0, 203.0, 399.0, 291.0, 1000.0, 1000.0, 1000.0, 800.0, 862.0, 761.0]
2025-05-08 13:19:58,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 88/100 (estimated time remaining: 31 minutes, 30 seconds)
2025-05-08 13:22:09,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 13:22:11,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 597.19916 ± 504.827
2025-05-08 13:22:11,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [383.15442, 26.369287, 23.80332, 684.9933, 1443.8867, 549.9077, 423.24625, 1577.6898, 620.9674, 237.9738]
2025-05-08 13:22:11,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [179.0, 43.0, 49.0, 255.0, 489.0, 217.0, 174.0, 528.0, 228.0, 115.0]
2025-05-08 13:22:11,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 89/100 (estimated time remaining: 28 minutes, 38 seconds)
2025-05-08 13:24:33,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 13:24:38,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1325.64685 ± 987.190
2025-05-08 13:24:38,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [2560.5354, 577.093, 3027.7297, 409.30707, 25.036263, 1465.5216, 1051.149, 1319.6594, 380.0344, 2440.4036]
2025-05-08 13:24:38,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [888.0, 237.0, 1000.0, 165.0, 38.0, 614.0, 416.0, 457.0, 199.0, 827.0]
2025-05-08 13:24:38,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 90/100 (estimated time remaining: 26 minutes, 31 seconds)
2025-05-08 13:27:03,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 13:27:07,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 951.18274 ± 461.066
2025-05-08 13:27:07,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [1185.7874, 547.45575, 1158.3958, 1606.2849, 810.87775, 517.3046, 659.24994, 1703.4929, 1113.5477, 209.43022]
2025-05-08 13:27:07,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [378.0, 216.0, 400.0, 539.0, 289.0, 199.0, 277.0, 578.0, 378.0, 94.0]
2025-05-08 13:27:07,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 91/100 (estimated time remaining: 23 minutes, 52 seconds)
2025-05-08 13:29:17,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 13:29:22,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1078.12402 ± 931.737
2025-05-08 13:29:22,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [23.174902, 732.42114, 2395.2188, 1470.9268, 532.7586, 435.65997, 1462.6405, 746.5324, 33.03973, 2948.8672]
2025-05-08 13:29:22,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [40.0, 286.0, 709.0, 528.0, 229.0, 164.0, 508.0, 285.0, 50.0, 1000.0]
2025-05-08 13:29:22,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 92/100 (estimated time remaining: 21 minutes, 23 seconds)
2025-05-08 13:31:37,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 13:31:42,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1098.44946 ± 838.229
2025-05-08 13:31:42,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [216.9591, 526.88556, 906.6944, 26.9227, 2634.1956, 2023.8883, 1337.3683, 2090.5613, 559.8717, 661.146]
2025-05-08 13:31:42,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [110.0, 194.0, 328.0, 42.0, 892.0, 707.0, 458.0, 705.0, 242.0, 252.0]
2025-05-08 13:31:42,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 93/100 (estimated time remaining: 18 minutes, 45 seconds)
2025-05-08 13:34:11,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 13:34:18,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1648.88599 ± 1163.035
2025-05-08 13:34:18,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [2543.763, 1207.7119, 2978.6316, 255.1103, 16.11238, 2900.295, 3029.3157, 1780.8577, 1757.1432, 19.92012]
2025-05-08 13:34:18,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [846.0, 410.0, 1000.0, 118.0, 27.0, 1000.0, 1000.0, 599.0, 598.0, 34.0]
2025-05-08 13:34:18,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 94/100 (estimated time remaining: 16 minutes, 57 seconds)
2025-05-08 13:36:38,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 13:36:43,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1297.87439 ± 1026.844
2025-05-08 13:36:43,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [2982.4817, 1983.0306, 1265.4747, 254.16676, 220.69405, 772.99927, 1445.8329, 3098.3315, 703.91754, 251.81407]
2025-05-08 13:36:43,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [933.0, 727.0, 464.0, 121.0, 113.0, 288.0, 497.0, 1000.0, 253.0, 204.0]
2025-05-08 13:36:43,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 95/100 (estimated time remaining: 14 minutes, 29 seconds)
2025-05-08 13:38:50,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 13:38:55,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1203.00464 ± 825.456
2025-05-08 13:38:55,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [1463.6659, 1472.8047, 2997.9653, 41.80132, 2049.5586, 1125.2878, 1149.1843, 313.79327, 488.4389, 927.54553]
2025-05-08 13:38:55,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [509.0, 498.0, 1000.0, 60.0, 683.0, 401.0, 454.0, 134.0, 265.0, 366.0]
2025-05-08 13:38:55,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 96/100 (estimated time remaining: 11 minutes, 47 seconds)
2025-05-08 13:41:02,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 13:41:05,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1091.29700 ± 921.072
2025-05-08 13:41:05,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [1170.0492, 2422.246, 3079.4045, 669.9006, 360.16797, 1118.9941, 465.09216, 26.28166, 385.0638, 1215.7697]
2025-05-08 13:41:05,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [458.0, 845.0, 1000.0, 248.0, 150.0, 378.0, 188.0, 49.0, 167.0, 447.0]
2025-05-08 13:41:05,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 97/100 (estimated time remaining: 9 minutes, 22 seconds)
2025-05-08 13:42:47,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 13:42:53,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1830.94006 ± 955.381
2025-05-08 13:42:53,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [3142.129, 1695.2573, 1312.1229, 2888.2048, 3011.4597, 2707.4685, 587.7875, 1119.69, 658.3793, 1186.9027]
2025-05-08 13:42:53,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 568.0, 454.0, 906.0, 1000.0, 890.0, 236.0, 397.0, 243.0, 392.0]
2025-05-08 13:42:53,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 42 seconds)
2025-05-08 13:44:42,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 13:44:47,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1571.53210 ± 1028.249
2025-05-08 13:44:47,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [698.13696, 2863.8584, 2978.1887, 2506.6384, 1739.203, 640.7657, 1052.0942, 658.5951, 2539.8171, 38.023327]
2025-05-08 13:44:47,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [257.0, 1000.0, 1000.0, 858.0, 1000.0, 244.0, 354.0, 279.0, 857.0, 59.0]
2025-05-08 13:44:47,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 11 seconds)
2025-05-08 13:46:23,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 13:46:28,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1503.59888 ± 891.881
2025-05-08 13:46:28,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [907.046, 426.41373, 2968.3596, 532.4125, 2881.696, 1983.2366, 1499.0288, 1748.5825, 436.74493, 1652.4675]
2025-05-08 13:46:28,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [344.0, 189.0, 1000.0, 213.0, 1000.0, 1000.0, 485.0, 573.0, 200.0, 568.0]
2025-05-08 13:46:28,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 56 seconds)
2025-05-08 13:48:12,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 13:48:15,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 1207.03394 ± 1187.475
2025-05-08 13:48:15,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [22.832993, 31.852938, 1181.2224, 2785.343, 1083.7358, 1021.85, 31.465143, 2956.5208, 2924.6719, 30.843987]
2025-05-08 13:48:15,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [41.0, 55.0, 435.0, 1000.0, 373.0, 362.0, 44.0, 1000.0, 1000.0, 35.0]
2025-05-08 13:48:15,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1149 [DEBUG]: Training session finished
