2025-05-06 17:41:35,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1006 [DEBUG]: logdir: _logs/benchmark-v3-tc3/noisy-walker2d/SparseU15-bpql-mem32
2025-05-06 17:41:35,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1007 [DEBUG]: trainer_prefix: benchmark-v3-tc3/noisy-walker2d/SparseU15-bpql-mem32
2025-05-06 17:41:35,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1008 [DEBUG]: args.trainer_eval_latencies: {'SparseU15': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x7a7a489cda00>}
2025-05-06 17:41:35,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1009 [DEBUG]: using device: cpu
2025-05-06 17:41:35,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1031 [INFO]: Creating new trainer
2025-05-06 17:41:35,749 baseline-bpql-noisy-walker2d:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=209, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-05-06 17:41:35,749 baseline-bpql-noisy-walker2d:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-06 17:41:36,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1092 [DEBUG]: Starting training session...
2025-05-06 17:41:36,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 1/100
2025-05-06 17:44:12,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 17:44:14,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 17.23907 ± 6.451
2025-05-06 17:44:14,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [27.100695, 18.78773, 20.775337, 13.4592285, 18.096605, 12.049071, 7.5375595, 18.76436, 26.982828, 8.83732]
2025-05-06 17:44:14,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [78.0, 53.0, 31.0, 74.0, 52.0, 57.0, 69.0, 72.0, 35.0, 73.0]
2025-05-06 17:44:14,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1124 [INFO]: New best (17.24) for latency SparseU15
2025-05-06 17:44:14,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1127 [INFO]: saving network
2025-05-06 17:44:14,116 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-walker2d/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 17:44:14,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 20 minutes, 53 seconds)
2025-05-06 17:47:02,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 17:47:05,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 51.55309 ± 45.002
2025-05-06 17:47:05,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [61.160156, 14.758624, 45.241493, 4.968208, 55.091373, 36.423393, 82.87576, 168.50415, 16.405111, 30.102612]
2025-05-06 17:47:05,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [123.0, 114.0, 114.0, 217.0, 75.0, 47.0, 158.0, 261.0, 188.0, 72.0]
2025-05-06 17:47:05,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1124 [INFO]: New best (51.55) for latency SparseU15
2025-05-06 17:47:05,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1127 [INFO]: saving network
2025-05-06 17:47:05,892 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-walker2d/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 17:47:05,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 29 minutes, 24 seconds)
2025-05-06 17:49:53,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 17:49:55,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 46.66589 ± 15.654
2025-05-06 17:49:55,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [70.98538, 51.661293, 35.59617, 52.19638, 17.38881, 52.99203, 56.39873, 30.973726, 34.578568, 63.88786]
2025-05-06 17:49:55,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [98.0, 77.0, 102.0, 94.0, 29.0, 82.0, 91.0, 54.0, 78.0, 84.0]
2025-05-06 17:49:55,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 29 minutes, 1 second)
2025-05-06 17:52:45,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 17:52:48,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 54.28685 ± 23.065
2025-05-06 17:52:48,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [60.816425, 17.187874, 61.1184, 98.6247, 75.919365, 56.386395, 18.952488, 60.047276, 43.352474, 50.46314]
2025-05-06 17:52:48,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [113.0, 30.0, 84.0, 142.0, 218.0, 77.0, 33.0, 150.0, 77.0, 122.0]
2025-05-06 17:52:48,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1124 [INFO]: New best (54.29) for latency SparseU15
2025-05-06 17:52:48,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1127 [INFO]: saving network
2025-05-06 17:52:48,509 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-walker2d/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 17:52:48,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 29 minutes)
2025-05-06 17:55:33,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 17:55:36,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 58.12770 ± 68.990
2025-05-06 17:55:36,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [28.731506, 71.374855, 46.294445, 58.308594, 18.17571, 13.947969, 258.29172, 17.781256, 30.394464, 37.976505]
2025-05-06 17:55:36,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [40.0, 104.0, 87.0, 111.0, 109.0, 26.0, 215.0, 32.0, 91.0, 114.0]
2025-05-06 17:55:36,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1124 [INFO]: New best (58.13) for latency SparseU15
2025-05-06 17:55:36,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1127 [INFO]: saving network
2025-05-06 17:55:36,109 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-walker2d/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 17:55:36,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 26 minutes, 2 seconds)
2025-05-06 17:58:27,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 17:58:30,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 46.77612 ± 55.174
2025-05-06 17:58:30,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [33.792233, 27.586678, 15.118797, 53.448696, 16.72528, 32.21281, 208.72798, 40.800816, 21.888523, 17.45935]
2025-05-06 17:58:30,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [113.0, 155.0, 131.0, 104.0, 131.0, 108.0, 187.0, 105.0, 35.0, 130.0]
2025-05-06 17:58:30,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 28 minutes, 12 seconds)
2025-05-06 18:01:16,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 18:01:19,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 42.03025 ± 10.976
2025-05-06 18:01:19,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [34.18954, 45.884026, 60.998898, 46.327484, 31.047337, 33.114094, 40.411118, 28.186678, 60.711224, 39.43214]
2025-05-06 18:01:19,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [49.0, 126.0, 145.0, 105.0, 48.0, 227.0, 68.0, 94.0, 85.0, 89.0]
2025-05-06 18:01:19,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 24 minutes, 32 seconds)
2025-05-06 18:04:09,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 18:04:12,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 49.31092 ± 28.972
2025-05-06 18:04:12,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [24.149202, 58.452568, 65.02414, 31.019218, 19.225468, 19.645554, 87.03842, 84.21108, 15.751821, 88.59177]
2025-05-06 18:04:12,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [35.0, 93.0, 227.0, 94.0, 139.0, 30.0, 122.0, 158.0, 34.0, 144.0]
2025-05-06 18:04:12,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 22 minutes, 51 seconds)
2025-05-06 18:07:00,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 18:07:03,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 80.88270 ± 33.670
2025-05-06 18:07:03,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [62.482494, 127.332306, 84.372, 111.589584, 23.625631, 82.622246, 133.64459, 65.09746, 41.483288, 76.577385]
2025-05-06 18:07:03,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [78.0, 175.0, 141.0, 164.0, 192.0, 90.0, 144.0, 93.0, 49.0, 107.0]
2025-05-06 18:07:03,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1124 [INFO]: New best (80.88) for latency SparseU15
2025-05-06 18:07:03,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1127 [INFO]: saving network
2025-05-06 18:07:03,403 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-walker2d/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 18:07:03,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 19 minutes, 19 seconds)
2025-05-06 18:09:52,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 18:09:53,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 37.09577 ± 13.908
2025-05-06 18:09:53,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [56.26343, 59.859417, 24.40475, 36.061253, 19.19652, 31.20743, 20.940596, 53.24256, 34.91555, 34.866203]
2025-05-06 18:09:53,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [83.0, 74.0, 45.0, 49.0, 30.0, 45.0, 33.0, 84.0, 54.0, 45.0]
2025-05-06 18:09:53,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 17 minutes, 10 seconds)
2025-05-06 18:12:43,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 18:12:45,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 89.66086 ± 57.364
2025-05-06 18:12:45,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [39.484653, 89.97583, 132.82085, 21.372696, 20.139204, 127.67134, 76.13728, 221.31706, 84.19806, 83.49156]
2025-05-06 18:12:45,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [59.0, 75.0, 122.0, 235.0, 29.0, 132.0, 82.0, 150.0, 146.0, 77.0]
2025-05-06 18:12:45,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1124 [INFO]: New best (89.66) for latency SparseU15
2025-05-06 18:12:45,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1127 [INFO]: saving network
2025-05-06 18:12:45,881 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-walker2d/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 18:12:45,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 13 minutes, 52 seconds)
2025-05-06 18:15:33,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 18:15:35,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 60.86164 ± 29.138
2025-05-06 18:15:35,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [73.2546, 21.006037, 51.336735, 82.22438, 128.3213, 58.744286, 58.85098, 62.746693, 50.35342, 21.778078]
2025-05-06 18:15:35,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [92.0, 32.0, 80.0, 93.0, 133.0, 98.0, 74.0, 71.0, 100.0, 32.0]
2025-05-06 18:15:35,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 11 minutes, 13 seconds)
2025-05-06 18:18:27,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 18:18:29,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 126.37004 ± 79.565
2025-05-06 18:18:29,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [189.72168, 38.059708, 163.78271, 203.81316, 168.14464, 20.529099, 259.23117, 59.44385, 31.45778, 129.51657]
2025-05-06 18:18:29,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [114.0, 50.0, 165.0, 128.0, 198.0, 31.0, 195.0, 61.0, 47.0, 97.0]
2025-05-06 18:18:29,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1124 [INFO]: New best (126.37) for latency SparseU15
2025-05-06 18:18:29,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1127 [INFO]: saving network
2025-05-06 18:18:29,590 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-walker2d/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 18:18:29,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 8 minutes, 36 seconds)
2025-05-06 18:21:18,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 18:21:21,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 159.15329 ± 68.357
2025-05-06 18:21:21,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [84.50941, 16.065395, 263.29132, 123.56203, 191.34761, 164.90646, 225.23643, 178.81613, 137.10466, 206.69347]
2025-05-06 18:21:21,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [121.0, 29.0, 191.0, 143.0, 129.0, 124.0, 147.0, 115.0, 226.0, 142.0]
2025-05-06 18:21:21,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1124 [INFO]: New best (159.15) for latency SparseU15
2025-05-06 18:21:21,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1127 [INFO]: saving network
2025-05-06 18:21:21,326 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-walker2d/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 18:21:21,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 5 minutes, 56 seconds)
2025-05-06 18:24:13,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 18:24:16,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 179.35442 ± 99.679
2025-05-06 18:24:16,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [167.91524, 296.55444, 287.68628, 212.38745, 93.607925, 245.99057, 301.43964, 118.48672, 49.494995, 19.980814]
2025-05-06 18:24:16,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [191.0, 242.0, 176.0, 182.0, 112.0, 150.0, 186.0, 100.0, 160.0, 31.0]
2025-05-06 18:24:16,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1124 [INFO]: New best (179.35) for latency SparseU15
2025-05-06 18:24:16,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1127 [INFO]: saving network
2025-05-06 18:24:16,823 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-walker2d/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 18:24:16,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 4 minutes, 39 seconds)
2025-05-06 18:27:07,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 18:27:10,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 127.72777 ± 88.207
2025-05-06 18:27:10,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [19.410606, 19.578817, 77.509056, 143.19226, 128.6263, 96.08631, 85.406235, 212.82956, 166.77505, 327.8635]
2025-05-06 18:27:10,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [30.0, 28.0, 129.0, 103.0, 121.0, 116.0, 144.0, 154.0, 154.0, 190.0]
2025-05-06 18:27:10,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 2 minutes, 1 second)
2025-05-06 18:30:03,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 18:30:07,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 139.74272 ± 116.757
2025-05-06 18:30:07,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [15.846559, 158.5231, 25.77124, 188.75467, 133.61464, 412.50446, 248.43529, 100.79766, 13.981557, 99.198166]
2025-05-06 18:30:07,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [27.0, 298.0, 39.0, 137.0, 229.0, 269.0, 276.0, 134.0, 25.0, 138.0]
2025-05-06 18:30:07,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 1 minute, 14 seconds)
2025-05-06 18:33:02,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 18:33:07,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 368.55453 ± 52.935
2025-05-06 18:33:07,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [284.35587, 363.21664, 366.3484, 422.34857, 375.0258, 416.34473, 306.40244, 342.30334, 339.30002, 469.89966]
2025-05-06 18:33:07,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [240.0, 248.0, 173.0, 303.0, 202.0, 241.0, 303.0, 181.0, 178.0, 252.0]
2025-05-06 18:33:07,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1124 [INFO]: New best (368.55) for latency SparseU15
2025-05-06 18:33:07,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1127 [INFO]: saving network
2025-05-06 18:33:07,670 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-walker2d/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 18:33:07,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours)
2025-05-06 18:36:00,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 18:36:04,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 260.26910 ± 110.786
2025-05-06 18:36:04,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [263.87772, 124.6835, 320.9347, 194.28958, 407.54324, 333.94183, 266.83194, 365.1003, 302.62427, 22.864037]
2025-05-06 18:36:04,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [137.0, 196.0, 170.0, 120.0, 233.0, 171.0, 145.0, 216.0, 166.0, 32.0]
2025-05-06 18:36:04,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 20/100 (estimated time remaining: 3 hours, 58 minutes, 21 seconds)
2025-05-06 18:38:57,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 18:39:03,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 330.40445 ± 86.533
2025-05-06 18:39:03,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [516.6408, 305.44595, 314.93524, 246.77335, 234.74759, 356.83942, 222.90417, 321.7955, 351.50546, 432.45715]
2025-05-06 18:39:03,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [341.0, 223.0, 264.0, 172.0, 209.0, 203.0, 184.0, 210.0, 210.0, 267.0]
2025-05-06 18:39:03,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 21/100 (estimated time remaining: 3 hours, 56 minutes, 22 seconds)
2025-05-06 18:41:58,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 18:42:02,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 294.25870 ± 113.501
2025-05-06 18:42:02,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [354.35574, 296.74966, 316.84766, 113.21024, 58.56646, 403.9046, 307.2764, 404.97754, 283.77954, 402.91922]
2025-05-06 18:42:02,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [178.0, 198.0, 157.0, 117.0, 115.0, 199.0, 165.0, 201.0, 150.0, 204.0]
2025-05-06 18:42:02,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 22/100 (estimated time remaining: 3 hours, 54 minutes, 57 seconds)
2025-05-06 18:44:59,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 18:45:04,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 332.68277 ± 192.340
2025-05-06 18:45:04,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [302.83948, 570.6238, 296.6627, 574.5659, 249.02782, 60.880547, 19.359451, 276.62027, 593.2941, 382.9533]
2025-05-06 18:45:04,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [203.0, 330.0, 175.0, 310.0, 132.0, 126.0, 32.0, 156.0, 364.0, 248.0]
2025-05-06 18:45:04,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 53 minutes, 13 seconds)
2025-05-06 18:47:59,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 18:48:02,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 202.60954 ± 151.042
2025-05-06 18:48:02,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [187.95273, 19.10633, 486.35208, 177.71686, 399.73297, 42.25515, 217.3703, 16.331387, 165.85535, 313.42218]
2025-05-06 18:48:02,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [116.0, 33.0, 375.0, 104.0, 183.0, 50.0, 121.0, 27.0, 140.0, 132.0]
2025-05-06 18:48:02,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 24/100 (estimated time remaining: 3 hours, 49 minutes, 43 seconds)
2025-05-06 18:50:58,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 18:51:03,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 347.53693 ± 217.948
2025-05-06 18:51:03,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [656.26404, 567.2198, 142.54471, 371.5954, 550.38196, 498.78366, 21.918411, 201.54462, 44.095413, 421.0212]
2025-05-06 18:51:03,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [418.0, 353.0, 120.0, 250.0, 398.0, 265.0, 31.0, 189.0, 49.0, 224.0]
2025-05-06 18:51:03,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 47 minutes, 53 seconds)
2025-05-06 18:54:02,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 18:54:05,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 224.33508 ± 132.165
2025-05-06 18:54:05,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [271.82153, 260.50342, 284.2394, 126.2027, 45.693977, 517.2202, 266.0692, 28.45329, 215.57239, 227.5748]
2025-05-06 18:54:05,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [144.0, 126.0, 156.0, 152.0, 52.0, 276.0, 139.0, 48.0, 127.0, 166.0]
2025-05-06 18:54:05,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 45 minutes, 33 seconds)
2025-05-06 18:57:04,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 18:57:08,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 336.97003 ± 94.329
2025-05-06 18:57:08,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [236.57068, 514.24524, 246.21219, 320.94455, 318.92413, 268.3029, 462.62524, 443.63004, 285.41873, 272.82642]
2025-05-06 18:57:08,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [137.0, 272.0, 135.0, 216.0, 210.0, 163.0, 221.0, 232.0, 157.0, 140.0]
2025-05-06 18:57:08,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 43 minutes, 34 seconds)
2025-05-06 19:00:07,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 19:00:11,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 181.88144 ± 149.530
2025-05-06 19:00:11,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [124.14997, 126.49085, 220.32448, 20.272734, 15.546418, 182.90633, 392.55115, 212.9965, 489.6124, 33.963528]
2025-05-06 19:00:11,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [129.0, 147.0, 140.0, 31.0, 29.0, 174.0, 365.0, 351.0, 256.0, 48.0]
2025-05-06 19:00:11,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 40 minutes, 33 seconds)
2025-05-06 19:03:17,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 19:03:21,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 244.94966 ± 125.095
2025-05-06 19:03:21,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [311.3994, 45.265873, 149.935, 15.039713, 241.65933, 385.5547, 316.47418, 375.7783, 278.90826, 329.482]
2025-05-06 19:03:21,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [152.0, 49.0, 129.0, 26.0, 172.0, 191.0, 145.0, 198.0, 133.0, 160.0]
2025-05-06 19:03:21,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 40 minutes, 24 seconds)
2025-05-06 19:06:22,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 19:06:27,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 299.98767 ± 158.870
2025-05-06 19:06:27,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [327.6259, 326.91446, 20.33391, 561.18414, 20.616945, 293.19318, 385.8536, 388.87125, 273.323, 401.96042]
2025-05-06 19:06:27,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [179.0, 182.0, 32.0, 345.0, 30.0, 240.0, 197.0, 213.0, 218.0, 190.0]
2025-05-06 19:06:27,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 38 minutes, 31 seconds)
2025-05-06 19:09:27,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 19:09:31,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 255.13786 ± 137.467
2025-05-06 19:09:31,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [406.86084, 395.93082, 366.73087, 32.44864, 18.759836, 262.60422, 153.9339, 227.89247, 314.01593, 372.2012]
2025-05-06 19:09:31,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [204.0, 196.0, 194.0, 46.0, 29.0, 141.0, 126.0, 131.0, 172.0, 235.0]
2025-05-06 19:09:31,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 36 minutes, 2 seconds)
2025-05-06 19:12:31,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 19:12:34,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 177.87236 ± 169.851
2025-05-06 19:12:34,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [41.65547, 315.5069, 20.718243, 157.82974, 35.86908, 186.53522, 15.791015, 324.09308, 111.0151, 569.7098]
2025-05-06 19:12:34,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [48.0, 150.0, 31.0, 139.0, 47.0, 139.0, 28.0, 176.0, 134.0, 297.0]
2025-05-06 19:12:34,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 32 minutes, 49 seconds)
2025-05-06 19:15:33,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 19:15:39,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 392.64120 ± 227.526
2025-05-06 19:15:39,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [264.6442, 480.47485, 292.6743, 299.31052, 254.48477, 721.0462, 432.50912, 18.444704, 841.27655, 321.54678]
2025-05-06 19:15:39,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [123.0, 272.0, 134.0, 158.0, 179.0, 428.0, 235.0, 32.0, 461.0, 273.0]
2025-05-06 19:15:39,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1124 [INFO]: New best (392.64) for latency SparseU15
2025-05-06 19:15:39,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1127 [INFO]: saving network
2025-05-06 19:15:39,100 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-walker2d/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 19:15:39,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 30 minutes, 21 seconds)
2025-05-06 19:18:43,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 19:18:48,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 341.35022 ± 209.243
2025-05-06 19:18:48,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [354.49677, 360.01318, 288.9167, 14.829759, 405.96967, 242.73523, 128.12642, 325.58014, 443.68994, 849.1446]
2025-05-06 19:18:48,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [180.0, 187.0, 126.0, 25.0, 285.0, 143.0, 114.0, 168.0, 218.0, 563.0]
2025-05-06 19:18:48,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 27 minutes, 8 seconds)
2025-05-06 19:21:46,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 19:21:50,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 183.73685 ± 116.448
2025-05-06 19:21:50,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [302.5334, 336.59018, 266.59116, 344.87936, 138.21584, 117.87012, 15.940066, 15.145124, 149.0839, 150.51945]
2025-05-06 19:21:50,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [214.0, 157.0, 154.0, 253.0, 127.0, 130.0, 29.0, 27.0, 181.0, 154.0]
2025-05-06 19:21:50,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 23 minutes, 5 seconds)
2025-05-06 19:24:51,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 19:24:55,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 303.62274 ± 102.234
2025-05-06 19:24:55,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [453.39838, 240.88348, 144.89359, 263.90576, 264.02954, 244.31212, 334.0305, 242.37311, 345.02475, 503.37634]
2025-05-06 19:24:55,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [201.0, 149.0, 122.0, 151.0, 146.0, 138.0, 161.0, 170.0, 169.0, 244.0]
2025-05-06 19:24:55,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 20 minutes, 8 seconds)
2025-05-06 19:27:53,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 19:27:58,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 333.39703 ± 183.339
2025-05-06 19:27:58,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [271.16455, 64.272255, 385.03943, 752.5982, 198.90569, 283.60794, 306.76913, 532.174, 190.544, 348.89508]
2025-05-06 19:27:58,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [141.0, 127.0, 293.0, 466.0, 117.0, 137.0, 170.0, 250.0, 167.0, 257.0]
2025-05-06 19:27:58,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 17 minutes, 16 seconds)
2025-05-06 19:31:00,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 19:31:05,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 331.80487 ± 134.922
2025-05-06 19:31:05,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [172.87027, 244.83374, 408.98822, 655.4678, 263.54703, 216.21112, 368.52283, 417.19022, 346.43314, 223.98428]
2025-05-06 19:31:05,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [135.0, 131.0, 241.0, 441.0, 136.0, 183.0, 200.0, 221.0, 159.0, 130.0]
2025-05-06 19:31:05,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 14 minutes, 33 seconds)
2025-05-06 19:34:05,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 19:34:08,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 167.67653 ± 83.950
2025-05-06 19:34:08,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [120.743904, 218.78165, 201.04732, 251.22945, 239.57089, 238.714, 13.63668, 159.05586, 21.047258, 212.9382]
2025-05-06 19:34:08,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [140.0, 124.0, 151.0, 138.0, 135.0, 278.0, 27.0, 98.0, 32.0, 163.0]
2025-05-06 19:34:08,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 10 minutes, 8 seconds)
2025-05-06 19:37:09,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 19:37:12,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 211.84280 ± 87.869
2025-05-06 19:37:12,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [231.21852, 178.91672, 223.15746, 19.35754, 168.30202, 352.02716, 166.72006, 279.75583, 188.08157, 310.89105]
2025-05-06 19:37:12,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [132.0, 116.0, 120.0, 29.0, 115.0, 186.0, 102.0, 168.0, 112.0, 160.0]
2025-05-06 19:37:12,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 7 minutes, 30 seconds)
2025-05-06 19:40:13,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 19:40:16,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 199.41606 ± 41.965
2025-05-06 19:40:16,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [206.17706, 172.38228, 201.16866, 177.77817, 95.287224, 201.62994, 234.44743, 231.22112, 222.10529, 251.96356]
2025-05-06 19:40:16,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [118.0, 96.0, 116.0, 98.0, 116.0, 116.0, 135.0, 128.0, 115.0, 139.0]
2025-05-06 19:40:16,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 4 minutes, 12 seconds)
2025-05-06 19:43:15,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 19:43:18,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 188.67671 ± 75.017
2025-05-06 19:43:18,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [279.8274, 249.76404, 16.66901, 205.61305, 107.674126, 216.42421, 191.49535, 192.95773, 159.29823, 267.0441]
2025-05-06 19:43:18,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [143.0, 134.0, 29.0, 193.0, 122.0, 121.0, 117.0, 117.0, 104.0, 140.0]
2025-05-06 19:43:18,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 45 seconds)
2025-05-06 19:46:20,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 19:46:23,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 195.53799 ± 76.337
2025-05-06 19:46:23,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [249.2179, 15.192334, 195.706, 170.5435, 228.56903, 332.92584, 163.19308, 173.2143, 206.62376, 220.19426]
2025-05-06 19:46:23,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [265.0, 27.0, 117.0, 100.0, 134.0, 154.0, 102.0, 100.0, 122.0, 120.0]
2025-05-06 19:46:23,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 57 minutes, 30 seconds)
2025-05-06 19:49:21,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 19:49:24,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 198.53296 ± 107.787
2025-05-06 19:49:24,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [171.65208, 302.36743, 203.39552, 20.152327, 250.38342, 325.1095, 180.8029, 173.71658, 16.98175, 340.76804]
2025-05-06 19:49:24,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [103.0, 199.0, 117.0, 32.0, 127.0, 158.0, 104.0, 102.0, 30.0, 177.0]
2025-05-06 19:49:24,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 53 minutes, 59 seconds)
2025-05-06 19:52:26,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 19:52:30,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 206.52307 ± 215.519
2025-05-06 19:52:30,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [296.53058, 121.18722, 112.15791, 176.5864, 807.3812, 13.554885, 14.67837, 164.58641, 201.47281, 157.09479]
2025-05-06 19:52:30,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [153.0, 108.0, 109.0, 101.0, 514.0, 27.0, 26.0, 105.0, 112.0, 100.0]
2025-05-06 19:52:30,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 51 minutes, 18 seconds)
2025-05-06 19:55:30,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 19:55:33,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 233.03067 ± 152.833
2025-05-06 19:55:33,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [398.2932, 200.61618, 273.6362, 166.52263, 163.35307, 145.83992, 160.69055, 206.97855, 598.5454, 15.831031]
2025-05-06 19:55:33,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [228.0, 121.0, 153.0, 103.0, 94.0, 94.0, 88.0, 108.0, 350.0, 28.0]
2025-05-06 19:55:33,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 48 minutes, 9 seconds)
2025-05-06 19:58:32,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 19:58:36,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 191.75362 ± 134.963
2025-05-06 19:58:36,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [15.541624, 17.342667, 147.45515, 151.93864, 186.55319, 159.59976, 185.40321, 512.1856, 249.88632, 291.63007]
2025-05-06 19:58:36,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [27.0, 29.0, 95.0, 94.0, 194.0, 97.0, 106.0, 404.0, 146.0, 259.0]
2025-05-06 19:58:36,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 45 minutes, 14 seconds)
2025-05-06 20:01:39,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 20:01:44,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 335.75067 ± 222.368
2025-05-06 20:01:44,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [373.16357, 320.3419, 63.502357, 529.91034, 362.62787, 537.214, 426.74756, 21.068054, 21.550007, 701.3812]
2025-05-06 20:01:44,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [221.0, 185.0, 105.0, 268.0, 194.0, 260.0, 225.0, 32.0, 31.0, 397.0]
2025-05-06 20:01:44,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 42 minutes, 38 seconds)
2025-05-06 20:04:47,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 20:04:50,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 250.88928 ± 136.239
2025-05-06 20:04:50,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [241.68687, 178.99196, 14.377335, 287.974, 257.10944, 398.8661, 16.880713, 371.46124, 409.91833, 331.6268]
2025-05-06 20:04:50,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [130.0, 102.0, 26.0, 144.0, 139.0, 204.0, 32.0, 211.0, 189.0, 183.0]
2025-05-06 20:04:50,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 40 minutes, 31 seconds)
2025-05-06 20:07:52,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 20:07:57,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 352.78677 ± 149.813
2025-05-06 20:07:57,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [712.08875, 400.81506, 310.13556, 260.42023, 290.48584, 255.12537, 384.5304, 321.02286, 122.06466, 471.1789]
2025-05-06 20:07:57,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [358.0, 222.0, 151.0, 139.0, 186.0, 133.0, 169.0, 176.0, 131.0, 227.0]
2025-05-06 20:07:57,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 37 minutes, 40 seconds)
2025-05-06 20:10:57,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 20:11:01,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 272.45163 ± 242.754
2025-05-06 20:11:01,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [142.90231, 74.22554, 16.166681, 155.99933, 71.70936, 873.4122, 286.1191, 496.963, 304.3207, 302.6981]
2025-05-06 20:11:01,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [88.0, 97.0, 27.0, 92.0, 93.0, 432.0, 151.0, 230.0, 222.0, 180.0]
2025-05-06 20:11:01,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 34 minutes, 38 seconds)
2025-05-06 20:13:59,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 20:14:03,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 233.32291 ± 152.356
2025-05-06 20:14:03,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [176.74294, 598.69464, 332.61847, 82.52641, 173.40009, 308.9791, 156.83644, 19.862549, 273.03824, 210.52995]
2025-05-06 20:14:03,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [112.0, 266.0, 182.0, 136.0, 154.0, 345.0, 96.0, 44.0, 173.0, 112.0]
2025-05-06 20:14:03,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 31 minutes, 30 seconds)
2025-05-06 20:17:04,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 20:17:10,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 431.62152 ± 398.818
2025-05-06 20:17:10,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [482.70407, 98.98332, 895.4688, 201.7843, 1300.9691, 15.444946, 697.9279, 359.9676, 21.633387, 241.33188]
2025-05-06 20:17:10,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [246.0, 75.0, 474.0, 122.0, 758.0, 29.0, 395.0, 184.0, 32.0, 125.0]
2025-05-06 20:17:10,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1124 [INFO]: New best (431.62) for latency SparseU15
2025-05-06 20:17:10,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1127 [INFO]: saving network
2025-05-06 20:17:10,553 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-walker2d/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 20:17:10,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 28 minutes, 12 seconds)
2025-05-06 20:20:11,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 20:20:14,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 208.43291 ± 146.994
2025-05-06 20:20:14,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [403.6347, 16.184284, 17.502491, 230.24792, 270.18756, 160.68672, 289.5643, 237.24144, 17.78415, 441.2955]
2025-05-06 20:20:14,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [208.0, 28.0, 28.0, 127.0, 280.0, 101.0, 165.0, 133.0, 31.0, 197.0]
2025-05-06 20:20:14,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 24 minutes, 48 seconds)
2025-05-06 20:23:13,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 20:23:20,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 490.67358 ± 295.305
2025-05-06 20:23:20,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [433.36823, 998.40295, 584.5223, 831.2317, 14.795461, 348.49573, 701.7546, 149.181, 236.68918, 608.29474]
2025-05-06 20:23:20,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [217.0, 542.0, 275.0, 481.0, 31.0, 160.0, 339.0, 254.0, 126.0, 327.0]
2025-05-06 20:23:20,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1124 [INFO]: New best (490.67) for latency SparseU15
2025-05-06 20:23:20,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1127 [INFO]: saving network
2025-05-06 20:23:20,565 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-walker2d/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 20:23:20,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 21 minutes, 31 seconds)
2025-05-06 20:26:22,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 20:26:26,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 264.28458 ± 103.411
2025-05-06 20:26:26,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [207.47246, 366.66608, 288.34824, 527.089, 209.98077, 220.68275, 145.53947, 234.93741, 225.3515, 216.77771]
2025-05-06 20:26:26,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [274.0, 156.0, 149.0, 240.0, 112.0, 126.0, 142.0, 123.0, 129.0, 114.0]
2025-05-06 20:26:26,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 18 minutes, 45 seconds)
2025-05-06 20:29:26,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 20:29:30,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 300.51352 ± 167.370
2025-05-06 20:29:30,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [19.010834, 574.1277, 303.01392, 255.56119, 241.17227, 432.28964, 295.9927, 98.10118, 542.9862, 242.87936]
2025-05-06 20:29:30,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [29.0, 309.0, 143.0, 135.0, 120.0, 201.0, 153.0, 151.0, 264.0, 140.0]
2025-05-06 20:29:30,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 15 minutes, 59 seconds)
2025-05-06 20:32:32,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 20:32:38,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 408.18085 ± 223.512
2025-05-06 20:32:38,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [333.45398, 232.455, 380.8649, 329.98184, 571.7859, 531.3238, 297.79138, 184.23573, 242.15628, 977.7598]
2025-05-06 20:32:38,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [157.0, 138.0, 241.0, 186.0, 282.0, 272.0, 144.0, 122.0, 124.0, 468.0]
2025-05-06 20:32:38,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 12 minutes, 56 seconds)
2025-05-06 20:35:40,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 20:35:43,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 240.69693 ± 194.687
2025-05-06 20:35:43,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [413.3453, 286.9141, 168.51523, 235.41026, 180.08632, 251.7554, 712.0787, 135.02248, 14.096399, 9.745157]
2025-05-06 20:35:43,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [254.0, 135.0, 115.0, 127.0, 112.0, 175.0, 363.0, 116.0, 26.0, 23.0]
2025-05-06 20:35:43,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 10 minutes, 4 seconds)
2025-05-06 20:38:44,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 20:38:49,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 328.85263 ± 231.703
2025-05-06 20:38:49,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [566.83655, 270.2264, 525.5785, 143.60587, 13.644514, 311.32486, 314.12524, 336.6258, 785.87103, 20.687393]
2025-05-06 20:38:49,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [436.0, 241.0, 251.0, 142.0, 24.0, 183.0, 179.0, 186.0, 429.0, 31.0]
2025-05-06 20:38:49,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 6 minutes, 57 seconds)
2025-05-06 20:41:51,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 20:41:54,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 208.49185 ± 103.569
2025-05-06 20:41:54,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [318.8905, 212.08113, 268.76953, 190.20514, 310.33517, 282.51437, 22.638721, 20.54519, 182.04037, 276.8983]
2025-05-06 20:41:54,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [167.0, 111.0, 159.0, 109.0, 142.0, 147.0, 32.0, 33.0, 111.0, 159.0]
2025-05-06 20:41:54,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 3 minutes, 46 seconds)
2025-05-06 20:44:59,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 20:45:04,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 296.32990 ± 96.361
2025-05-06 20:45:04,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [259.10794, 138.46463, 356.28647, 300.37228, 485.99915, 394.4436, 274.63687, 335.55945, 217.80719, 200.6215]
2025-05-06 20:45:04,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [160.0, 187.0, 297.0, 166.0, 328.0, 233.0, 145.0, 203.0, 119.0, 111.0]
2025-05-06 20:45:04,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 1 minute, 23 seconds)
2025-05-06 20:48:03,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 20:48:07,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 293.97449 ± 155.821
2025-05-06 20:48:07,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [174.47813, 188.3591, 184.86116, 18.390451, 304.58917, 337.2842, 377.70233, 626.35834, 375.49432, 352.22763]
2025-05-06 20:48:07,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [109.0, 116.0, 113.0, 29.0, 151.0, 205.0, 206.0, 317.0, 174.0, 176.0]
2025-05-06 20:48:07,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 57 minutes, 43 seconds)
2025-05-06 20:51:07,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 20:51:11,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 297.71149 ± 345.771
2025-05-06 20:51:11,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [275.91318, 18.112476, 1291.366, 350.88675, 188.22002, 210.89787, 240.81851, 236.01663, 143.67377, 21.209373]
2025-05-06 20:51:11,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [146.0, 28.0, 653.0, 283.0, 101.0, 207.0, 136.0, 126.0, 102.0, 32.0]
2025-05-06 20:51:11,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 54 minutes, 24 seconds)
2025-05-06 20:54:12,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 20:54:17,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 330.46017 ± 233.004
2025-05-06 20:54:17,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [389.83237, 351.4287, 13.121405, 198.08401, 13.288189, 826.33575, 226.31639, 346.6828, 580.0077, 359.50412]
2025-05-06 20:54:17,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [217.0, 192.0, 25.0, 183.0, 25.0, 467.0, 127.0, 164.0, 269.0, 164.0]
2025-05-06 20:54:17,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 51 minutes, 18 seconds)
2025-05-06 20:57:18,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 20:57:23,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 289.86426 ± 140.230
2025-05-06 20:57:23,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [327.39328, 197.88953, 124.95032, 206.66139, 156.5934, 527.83234, 368.20712, 322.5743, 149.84384, 516.69714]
2025-05-06 20:57:23,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [151.0, 115.0, 213.0, 107.0, 253.0, 262.0, 176.0, 166.0, 224.0, 281.0]
2025-05-06 20:57:23,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 48 minutes, 19 seconds)
2025-05-06 21:00:27,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 21:00:31,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 279.04022 ± 160.854
2025-05-06 21:00:31,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [513.07654, 330.78305, 218.98184, 269.01794, 425.6232, 115.34395, 16.011656, 192.42958, 179.63045, 529.50397]
2025-05-06 21:00:31,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [191.0, 166.0, 117.0, 146.0, 203.0, 181.0, 27.0, 102.0, 108.0, 280.0]
2025-05-06 21:00:31,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 45 minutes)
2025-05-06 21:03:30,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 21:03:35,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 285.19647 ± 199.578
2025-05-06 21:03:35,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [377.54193, 804.14624, 18.365181, 248.95049, 174.80193, 299.46457, 303.37915, 127.918274, 186.08362, 311.31335]
2025-05-06 21:03:35,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [276.0, 344.0, 33.0, 133.0, 104.0, 222.0, 234.0, 187.0, 111.0, 142.0]
2025-05-06 21:03:35,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 42 minutes, 1 second)
2025-05-06 21:06:38,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 21:06:42,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 340.81549 ± 181.092
2025-05-06 21:06:42,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [364.99695, 252.80452, 22.644484, 415.94104, 439.8367, 411.78458, 459.1622, 617.0921, 404.7453, 19.147112]
2025-05-06 21:06:42,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [187.0, 137.0, 32.0, 207.0, 185.0, 205.0, 211.0, 285.0, 168.0, 32.0]
2025-05-06 21:06:42,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 39 minutes, 17 seconds)
2025-05-06 21:09:45,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 21:09:52,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 475.24750 ± 311.465
2025-05-06 21:09:52,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [158.36423, 19.492472, 488.8255, 671.30176, 764.954, 910.3858, 846.3753, 20.440714, 471.07864, 401.25687]
2025-05-06 21:09:52,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [246.0, 31.0, 212.0, 256.0, 511.0, 429.0, 516.0, 33.0, 210.0, 169.0]
2025-05-06 21:09:52,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 36 minutes, 36 seconds)
2025-05-06 21:12:49,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 21:12:52,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 248.92142 ± 225.629
2025-05-06 21:12:52,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [344.1926, 15.507569, 674.10187, 333.89682, 22.191875, 431.33047, 486.9518, 139.95952, 20.574411, 20.507431]
2025-05-06 21:12:52,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [180.0, 26.0, 303.0, 189.0, 31.0, 197.0, 197.0, 149.0, 33.0, 32.0]
2025-05-06 21:12:52,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 32 minutes, 58 seconds)
2025-05-06 21:15:56,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 21:15:59,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 299.20767 ± 207.295
2025-05-06 21:15:59,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [17.200714, 729.2336, 421.03223, 260.06357, 391.68506, 12.904607, 367.9835, 427.85867, 112.64651, 251.46814]
2025-05-06 21:15:59,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [28.0, 320.0, 175.0, 189.0, 155.0, 23.0, 147.0, 192.0, 74.0, 197.0]
2025-05-06 21:15:59,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 29 minutes, 46 seconds)
2025-05-06 21:19:04,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 21:19:08,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 360.95346 ± 177.607
2025-05-06 21:19:08,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [22.097643, 496.9524, 17.059084, 522.18134, 490.77832, 405.2864, 486.82825, 404.50983, 396.49326, 367.34802]
2025-05-06 21:19:08,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [32.0, 193.0, 31.0, 219.0, 200.0, 161.0, 202.0, 161.0, 166.0, 142.0]
2025-05-06 21:19:08,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 27 minutes, 5 seconds)
2025-05-06 21:22:09,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 21:22:13,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 405.48904 ± 258.933
2025-05-06 21:22:13,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [239.3069, 424.6776, 14.499776, 370.33243, 382.90503, 811.169, 629.28033, 15.551224, 753.45844, 413.70975]
2025-05-06 21:22:13,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [178.0, 211.0, 25.0, 325.0, 161.0, 339.0, 241.0, 25.0, 283.0, 161.0]
2025-05-06 21:22:13,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 23 minutes, 49 seconds)
2025-05-06 21:25:11,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 21:25:15,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 427.80713 ± 172.233
2025-05-06 21:25:15,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [428.38098, 17.829676, 747.7668, 389.47705, 496.26178, 343.04135, 405.5591, 484.34454, 443.16217, 522.2479]
2025-05-06 21:25:15,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [166.0, 28.0, 260.0, 161.0, 215.0, 167.0, 163.0, 191.0, 201.0, 223.0]
2025-05-06 21:25:15,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 20 minutes, 1 second)
2025-05-06 21:28:16,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 21:28:20,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 406.26779 ± 177.929
2025-05-06 21:28:20,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [438.6762, 17.613066, 418.5393, 187.56735, 359.19955, 505.02533, 441.11893, 534.1111, 465.69067, 695.1364]
2025-05-06 21:28:20,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [173.0, 29.0, 170.0, 157.0, 156.0, 208.0, 180.0, 235.0, 195.0, 278.0]
2025-05-06 21:28:20,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 17 minutes, 18 seconds)
2025-05-06 21:31:22,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 21:31:28,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 452.81696 ± 304.026
2025-05-06 21:31:28,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [1225.2457, 494.24445, 400.78925, 671.42096, 395.1219, 468.51135, 301.9167, 267.42386, 281.0947, 22.400621]
2025-05-06 21:31:28,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [637.0, 199.0, 162.0, 418.0, 170.0, 186.0, 227.0, 189.0, 233.0, 32.0]
2025-05-06 21:31:28,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 14 minutes, 17 seconds)
2025-05-06 21:34:30,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 21:34:35,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 471.61774 ± 337.047
2025-05-06 21:34:35,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [12.383962, 19.048542, 417.71295, 1293.8949, 597.0972, 428.00467, 371.5037, 579.69934, 513.5707, 483.26144]
2025-05-06 21:34:35,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [24.0, 32.0, 167.0, 513.0, 261.0, 168.0, 148.0, 255.0, 191.0, 210.0]
2025-05-06 21:34:35,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 11 minutes, 4 seconds)
2025-05-06 21:37:34,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 21:37:38,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 445.03824 ± 274.788
2025-05-06 21:37:38,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [506.23685, 534.4641, 453.26703, 476.54648, 21.101633, 1056.7421, 490.47906, 389.5698, 501.28885, 20.686596]
2025-05-06 21:37:38,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [202.0, 217.0, 171.0, 189.0, 31.0, 406.0, 213.0, 152.0, 201.0, 32.0]
2025-05-06 21:37:38,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 7 minutes, 49 seconds)
2025-05-06 21:40:42,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 21:40:47,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 414.36859 ± 330.173
2025-05-06 21:40:47,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [17.896322, 457.21674, 531.36676, 515.75183, 15.687833, 889.5808, 368.805, 316.80875, 1010.4687, 20.103685]
2025-05-06 21:40:47,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [31.0, 195.0, 240.0, 224.0, 27.0, 418.0, 277.0, 229.0, 422.0, 32.0]
2025-05-06 21:40:47,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 5 minutes, 16 seconds)
2025-05-06 21:43:47,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 21:43:52,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 554.83429 ± 324.297
2025-05-06 21:43:52,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [1424.4512, 502.86282, 127.10599, 694.3586, 610.7972, 481.70343, 507.83505, 371.089, 358.15866, 469.98093]
2025-05-06 21:43:52,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [543.0, 216.0, 81.0, 276.0, 248.0, 187.0, 191.0, 150.0, 254.0, 168.0]
2025-05-06 21:43:52,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1124 [INFO]: New best (554.83) for latency SparseU15
2025-05-06 21:43:52,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1127 [INFO]: saving network
2025-05-06 21:43:52,981 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-walker2d/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 21:43:52,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 2 minutes, 9 seconds)
2025-05-06 21:46:56,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 21:47:00,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 351.67044 ± 230.922
2025-05-06 21:47:00,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [246.6131, 370.55676, 114.292625, 512.6509, 531.47345, 458.05978, 728.6242, 521.77246, 18.744808, 13.916199]
2025-05-06 21:47:00,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [190.0, 150.0, 74.0, 207.0, 218.0, 178.0, 452.0, 224.0, 28.0, 25.0]
2025-05-06 21:47:00,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 82/100 (estimated time remaining: 59 minutes, 1 second)
2025-05-06 21:50:03,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 21:50:07,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 416.05194 ± 307.798
2025-05-06 21:50:07,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [109.769005, 476.6934, 304.0233, 15.002835, 111.253944, 469.83337, 493.76578, 1162.6361, 501.65472, 515.887]
2025-05-06 21:50:07,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [72.0, 179.0, 132.0, 28.0, 74.0, 182.0, 194.0, 488.0, 207.0, 209.0]
2025-05-06 21:50:07,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 83/100 (estimated time remaining: 55 minutes, 56 seconds)
2025-05-06 21:53:04,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 21:53:08,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 371.69260 ± 299.031
2025-05-06 21:53:08,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [494.11868, 455.81247, 709.6297, 19.035307, 490.4902, 486.20752, 18.946865, 910.5696, 114.49483, 17.620747]
2025-05-06 21:53:08,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [197.0, 183.0, 253.0, 30.0, 185.0, 191.0, 28.0, 335.0, 75.0, 31.0]
2025-05-06 21:53:08,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 84/100 (estimated time remaining: 52 minutes, 39 seconds)
2025-05-06 21:56:11,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 21:56:16,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 559.90656 ± 229.003
2025-05-06 21:56:16,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [278.54608, 555.2859, 488.8176, 407.4774, 1069.3247, 821.4496, 722.3907, 455.35526, 418.92184, 381.4959]
2025-05-06 21:56:16,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [204.0, 220.0, 194.0, 194.0, 387.0, 285.0, 306.0, 178.0, 171.0, 167.0]
2025-05-06 21:56:16,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1124 [INFO]: New best (559.91) for latency SparseU15
2025-05-06 21:56:16,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1127 [INFO]: saving network
2025-05-06 21:56:16,744 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-walker2d/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 21:56:16,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 85/100 (estimated time remaining: 49 minutes, 32 seconds)
2025-05-06 21:59:18,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 21:59:25,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 664.18433 ± 485.436
2025-05-06 21:59:25,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [1927.2355, 753.77167, 707.1826, 346.45282, 401.7007, 19.885443, 894.76666, 402.71512, 726.2967, 461.83582]
2025-05-06 21:59:25,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [734.0, 274.0, 254.0, 138.0, 165.0, 31.0, 341.0, 161.0, 259.0, 221.0]
2025-05-06 21:59:25,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1124 [INFO]: New best (664.18) for latency SparseU15
2025-05-06 21:59:25,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1127 [INFO]: saving network
2025-05-06 21:59:25,364 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-walker2d/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 21:59:25,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 86/100 (estimated time remaining: 46 minutes, 37 seconds)
2025-05-06 22:02:26,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 22:02:32,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 604.06818 ± 310.078
2025-05-06 22:02:32,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [832.7874, 309.67026, 446.56613, 1228.2751, 147.03906, 846.57654, 421.29517, 513.03644, 865.51337, 429.92245]
2025-05-06 22:02:32,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [301.0, 133.0, 168.0, 477.0, 239.0, 306.0, 171.0, 202.0, 315.0, 166.0]
2025-05-06 22:02:32,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 87/100 (estimated time remaining: 43 minutes, 29 seconds)
2025-05-06 22:05:37,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 22:05:44,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 582.20862 ± 402.099
2025-05-06 22:05:44,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [890.708, 19.187037, 1136.1438, 913.4805, 273.06808, 17.534725, 338.2789, 859.3588, 359.05176, 1015.27496]
2025-05-06 22:05:44,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [411.0, 30.0, 462.0, 419.0, 232.0, 27.0, 301.0, 345.0, 152.0, 431.0]
2025-05-06 22:05:44,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 88/100 (estimated time remaining: 40 minutes, 35 seconds)
2025-05-06 22:08:45,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 22:08:50,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 477.12970 ± 217.270
2025-05-06 22:08:50,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [462.68256, 430.61115, 403.95645, 21.173399, 814.6403, 451.9382, 729.75256, 710.2617, 340.7745, 405.5059]
2025-05-06 22:08:50,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [176.0, 168.0, 160.0, 31.0, 275.0, 170.0, 260.0, 244.0, 154.0, 186.0]
2025-05-06 22:08:50,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 89/100 (estimated time remaining: 37 minutes, 40 seconds)
2025-05-06 22:11:44,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 22:11:49,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 549.57800 ± 446.706
2025-05-06 22:11:49,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [644.51514, 384.56604, 659.4552, 358.63623, 110.61329, 107.171585, 611.6662, 1769.1066, 416.57196, 433.47763]
2025-05-06 22:11:49,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [289.0, 156.0, 270.0, 147.0, 73.0, 70.0, 243.0, 675.0, 162.0, 170.0]
2025-05-06 22:11:49,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 90/100 (estimated time remaining: 34 minutes, 12 seconds)
2025-05-06 22:14:52,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 22:14:57,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 554.00537 ± 407.806
2025-05-06 22:14:57,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [491.29477, 421.47794, 1322.1196, 103.713455, 979.5477, 21.253616, 708.2251, 792.50336, 681.5838, 18.33411]
2025-05-06 22:14:57,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [183.0, 162.0, 497.0, 74.0, 333.0, 31.0, 254.0, 272.0, 384.0, 30.0]
2025-05-06 22:14:57,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 91/100 (estimated time remaining: 31 minutes, 4 seconds)
2025-05-06 22:17:59,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 22:18:04,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 465.16739 ± 359.973
2025-05-06 22:18:04,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [17.99994, 1039.5624, 15.8073225, 16.037697, 381.3114, 749.9998, 531.23486, 443.35825, 472.33115, 984.0315]
2025-05-06 22:18:04,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [30.0, 415.0, 31.0, 30.0, 149.0, 287.0, 226.0, 177.0, 274.0, 356.0]
2025-05-06 22:18:04,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 92/100 (estimated time remaining: 27 minutes, 57 seconds)
2025-05-06 22:21:09,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 22:21:14,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 492.67133 ± 344.479
2025-05-06 22:21:14,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [793.81946, 117.37013, 478.82208, 398.0976, 11.394299, 744.344, 1078.5282, 500.69754, 783.4025, 20.237564]
2025-05-06 22:21:14,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [306.0, 75.0, 172.0, 207.0, 22.0, 266.0, 372.0, 181.0, 258.0, 32.0]
2025-05-06 22:21:14,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 93/100 (estimated time remaining: 24 minutes, 48 seconds)
2025-05-06 22:24:14,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 22:24:20,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 636.12616 ± 369.522
2025-05-06 22:24:20,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [150.72064, 1073.3943, 20.616184, 727.6616, 384.0501, 1009.1058, 552.3568, 483.67084, 1182.5863, 777.09937]
2025-05-06 22:24:20,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [87.0, 361.0, 33.0, 265.0, 151.0, 429.0, 245.0, 184.0, 410.0, 253.0]
2025-05-06 22:24:20,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 94/100 (estimated time remaining: 21 minutes, 42 seconds)
2025-05-06 22:27:20,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 22:27:25,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 538.11035 ± 261.111
2025-05-06 22:27:25,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [444.07346, 383.3638, 688.83405, 798.26556, 266.0341, 894.577, 18.348589, 735.73334, 425.21066, 726.66284]
2025-05-06 22:27:25,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [162.0, 144.0, 240.0, 266.0, 119.0, 312.0, 29.0, 262.0, 164.0, 261.0]
2025-05-06 22:27:25,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 95/100 (estimated time remaining: 18 minutes, 42 seconds)
2025-05-06 22:30:30,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 22:30:34,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 332.06870 ± 229.163
2025-05-06 22:30:34,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [18.38181, 154.32593, 774.48944, 126.15976, 503.80167, 534.958, 117.60216, 439.36267, 445.77982, 205.82559]
2025-05-06 22:30:34,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [32.0, 90.0, 286.0, 81.0, 184.0, 198.0, 80.0, 170.0, 165.0, 103.0]
2025-05-06 22:30:34,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 96/100 (estimated time remaining: 15 minutes, 36 seconds)
2025-05-06 22:33:36,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 22:33:45,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 847.60645 ± 946.819
2025-05-06 22:33:45,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [2499.0056, 412.9342, 359.65616, 795.296, 16.0819, 305.91577, 18.441013, 24.444292, 1431.0792, 2613.2104]
2025-05-06 22:33:45,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 156.0, 151.0, 295.0, 28.0, 204.0, 29.0, 33.0, 628.0, 1000.0]
2025-05-06 22:33:45,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1124 [INFO]: New best (847.61) for latency SparseU15
2025-05-06 22:33:45,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1127 [INFO]: saving network
2025-05-06 22:33:45,541 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-walker2d/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 22:33:45,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 97/100 (estimated time remaining: 12 minutes, 32 seconds)
2025-05-06 22:36:46,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 22:36:53,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 676.99182 ± 324.815
2025-05-06 22:36:53,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [119.47263, 997.81934, 461.98163, 350.18094, 451.6907, 870.95184, 1064.1566, 809.03174, 513.8477, 1130.785]
2025-05-06 22:36:53,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [83.0, 424.0, 182.0, 139.0, 173.0, 352.0, 365.0, 278.0, 190.0, 436.0]
2025-05-06 22:36:53,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 98/100 (estimated time remaining: 9 minutes, 23 seconds)
2025-05-06 22:39:54,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 22:40:02,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 875.08691 ± 747.723
2025-05-06 22:40:02,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [1406.1218, 771.9816, 2781.1814, 363.848, 443.9239, 449.69464, 297.80383, 356.00467, 478.81097, 1401.4984]
2025-05-06 22:40:02,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [495.0, 256.0, 1000.0, 146.0, 175.0, 179.0, 132.0, 148.0, 183.0, 472.0]
2025-05-06 22:40:02,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1124 [INFO]: New best (875.09) for latency SparseU15
2025-05-06 22:40:02,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1127 [INFO]: saving network
2025-05-06 22:40:02,864 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-walker2d/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 22:40:02,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 99/100 (estimated time remaining: 6 minutes, 16 seconds)
2025-05-06 22:43:07,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 22:43:13,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 633.84180 ± 318.887
2025-05-06 22:43:13,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [551.3023, 570.2359, 752.6282, 1043.5182, 20.209885, 691.0698, 277.5279, 459.89413, 1125.6461, 846.38574]
2025-05-06 22:43:13,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [230.0, 199.0, 285.0, 331.0, 31.0, 252.0, 155.0, 184.0, 379.0, 329.0]
2025-05-06 22:43:13,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1097 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 9 seconds)
2025-05-06 22:46:18,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 22:46:24,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1119 [DEBUG]: Total Reward: 601.72400 ± 283.092
2025-05-06 22:46:24,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1120 [DEBUG]: All rewards: [548.06055, 422.67828, 278.08978, 389.31442, 1115.6434, 1055.202, 591.5686, 657.7991, 717.88226, 241.00078]
2025-05-06 22:46:24,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [278.0, 168.0, 171.0, 155.0, 386.0, 363.0, 247.0, 255.0, 271.0, 188.0]
2025-05-06 22:46:24,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1149 [DEBUG]: Training session finished
