2025-05-06 12:58:46,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1006 [DEBUG]: logdir: _logs/benchmark-v3-tc3/noisy-hopper/SparseU15-bpql-mem32
2025-05-06 12:58:46,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1007 [DEBUG]: trainer_prefix: benchmark-v3-tc3/noisy-hopper/SparseU15-bpql-mem32
2025-05-06 12:58:46,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1008 [DEBUG]: args.trainer_eval_latencies: {'SparseU15': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x7f566d3cca00>}
2025-05-06 12:58:46,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1009 [DEBUG]: using device: cpu
2025-05-06 12:58:46,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1031 [INFO]: Creating new trainer
2025-05-06 12:58:46,712 baseline-bpql-noisy-hopper:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=107, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-05-06 12:58:46,712 baseline-bpql-noisy-hopper:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-06 12:58:46,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1092 [DEBUG]: Starting training session...
2025-05-06 12:58:46,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 1/100
2025-05-06 13:01:17,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 13:01:18,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 84.30641 ± 17.168
2025-05-06 13:01:18,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [62.411648, 101.92851, 64.279915, 103.19993, 69.897255, 77.786, 105.96363, 73.4655, 77.195694, 106.93609]
2025-05-06 13:01:18,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [47.0, 73.0, 49.0, 71.0, 54.0, 60.0, 73.0, 56.0, 56.0, 74.0]
2025-05-06 13:01:18,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1124 [INFO]: New best (84.31) for latency SparseU15
2025-05-06 13:01:18,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1127 [INFO]: saving network
2025-05-06 13:01:18,895 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-hopper/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 13:01:18,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 10 minutes, 46 seconds)
2025-05-06 13:03:59,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 13:04:02,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 115.47550 ± 95.214
2025-05-06 13:04:02,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [102.369896, 212.84828, 46.70753, 29.799309, 41.08703, 138.54338, 86.452614, 345.69623, 129.69809, 21.55259]
2025-05-06 13:04:02,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [130.0, 231.0, 63.0, 33.0, 42.0, 161.0, 110.0, 359.0, 154.0, 24.0]
2025-05-06 13:04:02,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1124 [INFO]: New best (115.48) for latency SparseU15
2025-05-06 13:04:02,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1127 [INFO]: saving network
2025-05-06 13:04:02,941 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-hopper/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 13:04:02,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 18 minutes, 5 seconds)
2025-05-06 13:06:45,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 13:06:47,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 111.52287 ± 99.497
2025-05-06 13:06:47,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [332.07825, 52.63414, 29.259365, 37.698223, 259.49033, 73.45523, 60.866375, 43.71009, 158.59015, 67.44659]
2025-05-06 13:06:47,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [288.0, 54.0, 32.0, 40.0, 225.0, 74.0, 54.0, 47.0, 125.0, 62.0]
2025-05-06 13:06:47,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 19 minutes, 7 seconds)
2025-05-06 13:09:26,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 13:09:28,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 127.50309 ± 77.260
2025-05-06 13:09:28,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [70.3966, 200.36401, 61.00745, 71.46516, 125.49738, 57.985283, 122.56606, 62.75713, 208.14684, 294.84494]
2025-05-06 13:09:28,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [60.0, 125.0, 52.0, 60.0, 91.0, 54.0, 84.0, 53.0, 112.0, 154.0]
2025-05-06 13:09:28,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1124 [INFO]: New best (127.50) for latency SparseU15
2025-05-06 13:09:28,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1127 [INFO]: saving network
2025-05-06 13:09:28,411 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-hopper/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 13:09:28,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 16 minutes, 36 seconds)
2025-05-06 13:12:11,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 13:12:13,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 122.10008 ± 60.070
2025-05-06 13:12:13,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [45.86414, 29.058609, 231.939, 136.18842, 85.676636, 88.08476, 100.94454, 178.29195, 163.69592, 161.25693]
2025-05-06 13:12:13,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [45.0, 32.0, 114.0, 87.0, 60.0, 62.0, 67.0, 100.0, 99.0, 101.0]
2025-05-06 13:12:13,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 15 minutes, 23 seconds)
2025-05-06 13:14:51,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 13:14:53,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 102.59001 ± 102.292
2025-05-06 13:14:53,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [72.09284, 78.814865, 59.57582, 155.23438, 68.20259, 390.67612, 59.57209, 24.037039, 23.51699, 94.177414]
2025-05-06 13:14:53,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [64.0, 64.0, 56.0, 114.0, 57.0, 210.0, 56.0, 33.0, 28.0, 79.0]
2025-05-06 13:14:53,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 15 minutes, 16 seconds)
2025-05-06 13:17:34,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 13:17:36,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 118.15383 ± 17.274
2025-05-06 13:17:36,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [104.08665, 114.26631, 131.25531, 111.63078, 113.98095, 86.25848, 146.36784, 125.7585, 141.2939, 106.63951]
2025-05-06 13:17:36,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [68.0, 89.0, 89.0, 83.0, 82.0, 65.0, 109.0, 83.0, 80.0, 68.0]
2025-05-06 13:17:36,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 12 minutes, 15 seconds)
2025-05-06 13:20:19,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 13:20:21,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 129.63370 ± 52.475
2025-05-06 13:20:21,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [115.47122, 65.2184, 64.1658, 231.90321, 113.968124, 161.60687, 75.09161, 122.815865, 186.87215, 159.22375]
2025-05-06 13:20:21,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [76.0, 54.0, 51.0, 145.0, 77.0, 111.0, 59.0, 87.0, 103.0, 101.0]
2025-05-06 13:20:21,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1124 [INFO]: New best (129.63) for latency SparseU15
2025-05-06 13:20:21,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1127 [INFO]: saving network
2025-05-06 13:20:21,412 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-hopper/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 13:20:21,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 9 minutes, 31 seconds)
2025-05-06 13:23:05,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 13:23:08,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 188.68129 ± 60.327
2025-05-06 13:23:08,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [175.20576, 89.076324, 236.918, 208.29152, 217.36932, 305.08624, 166.45358, 124.7576, 134.22665, 229.42775]
2025-05-06 13:23:08,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [111.0, 62.0, 136.0, 140.0, 123.0, 152.0, 95.0, 82.0, 106.0, 123.0]
2025-05-06 13:23:08,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1124 [INFO]: New best (188.68) for latency SparseU15
2025-05-06 13:23:08,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1127 [INFO]: saving network
2025-05-06 13:23:08,339 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-hopper/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 13:23:08,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 8 minutes, 42 seconds)
2025-05-06 13:25:53,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 13:25:55,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 100.78339 ± 83.446
2025-05-06 13:25:55,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [28.083328, 25.159275, 65.161766, 287.29712, 116.11803, 81.0486, 84.14363, 69.85767, 26.755373, 224.20908]
2025-05-06 13:25:55,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [31.0, 31.0, 56.0, 153.0, 94.0, 66.0, 61.0, 59.0, 30.0, 131.0]
2025-05-06 13:25:55,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 6 minutes, 39 seconds)
2025-05-06 13:28:38,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 13:28:40,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 88.73613 ± 35.390
2025-05-06 13:28:40,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [60.54959, 110.36693, 63.132618, 50.654446, 137.19728, 96.0919, 131.07301, 55.46822, 135.2348, 47.592514]
2025-05-06 13:28:40,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [52.0, 84.0, 54.0, 47.0, 84.0, 77.0, 111.0, 50.0, 92.0, 50.0]
2025-05-06 13:28:40,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 5 minutes, 16 seconds)
2025-05-06 13:31:22,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 13:31:24,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 106.76204 ± 75.818
2025-05-06 13:31:24,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [122.93182, 91.3018, 87.927795, 55.2429, 118.70158, 119.49746, 38.94462, 312.96542, 93.12396, 26.982918]
2025-05-06 13:31:24,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [87.0, 72.0, 65.0, 52.0, 123.0, 85.0, 40.0, 160.0, 71.0, 29.0]
2025-05-06 13:31:24,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 2 minutes, 45 seconds)
2025-05-06 13:34:11,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 13:34:15,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 162.11943 ± 128.894
2025-05-06 13:34:15,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [98.6366, 166.15094, 201.65804, 110.679596, 124.6413, 520.2289, 53.12218, 54.623486, 99.57624, 191.87712]
2025-05-06 13:34:15,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [103.0, 152.0, 199.0, 122.0, 151.0, 557.0, 58.0, 52.0, 120.0, 207.0]
2025-05-06 13:34:15,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 2 minutes)
2025-05-06 13:37:00,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 13:37:02,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 77.73020 ± 68.645
2025-05-06 13:37:02,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [82.64818, 43.572426, 24.018139, 41.87371, 103.71107, 30.326805, 271.1421, 78.01053, 51.003258, 50.995773]
2025-05-06 13:37:02,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [72.0, 49.0, 27.0, 44.0, 77.0, 32.0, 174.0, 61.0, 45.0, 48.0]
2025-05-06 13:37:02,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 15/100 (estimated time remaining: 3 hours, 59 minutes, 4 seconds)
2025-05-06 13:39:47,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 13:39:49,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 112.66652 ± 33.166
2025-05-06 13:39:49,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [74.043816, 65.21816, 133.7531, 127.19653, 117.79319, 66.02243, 133.83452, 124.22702, 174.39891, 110.17748]
2025-05-06 13:39:49,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [61.0, 64.0, 88.0, 103.0, 79.0, 56.0, 101.0, 106.0, 110.0, 83.0]
2025-05-06 13:39:49,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 16/100 (estimated time remaining: 3 hours, 56 minutes, 21 seconds)
2025-05-06 13:42:33,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 13:42:35,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 98.70681 ± 49.609
2025-05-06 13:42:35,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [65.6211, 67.752884, 161.7418, 176.46268, 53.58422, 41.512108, 175.9262, 80.0416, 95.927315, 68.49821]
2025-05-06 13:42:35,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [55.0, 59.0, 98.0, 101.0, 48.0, 43.0, 108.0, 61.0, 64.0, 53.0]
2025-05-06 13:42:35,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 17/100 (estimated time remaining: 3 hours, 53 minutes, 46 seconds)
2025-05-06 13:45:20,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 13:45:21,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 104.63301 ± 25.640
2025-05-06 13:45:21,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [148.83847, 97.08239, 81.19433, 92.371254, 96.39486, 64.88501, 93.383354, 100.14982, 143.05331, 128.97734]
2025-05-06 13:45:21,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [94.0, 71.0, 58.0, 70.0, 75.0, 56.0, 70.0, 65.0, 90.0, 89.0]
2025-05-06 13:45:21,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 18/100 (estimated time remaining: 3 hours, 51 minutes, 43 seconds)
2025-05-06 13:48:04,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 13:48:06,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 107.16775 ± 65.431
2025-05-06 13:48:06,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [119.893074, 24.288757, 56.214123, 135.9595, 130.60884, 42.97915, 91.14688, 65.49322, 141.31537, 263.77853]
2025-05-06 13:48:06,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [93.0, 33.0, 53.0, 87.0, 87.0, 38.0, 65.0, 52.0, 89.0, 140.0]
2025-05-06 13:48:06,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 19/100 (estimated time remaining: 3 hours, 47 minutes, 5 seconds)
2025-05-06 13:50:51,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 13:50:53,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 111.15680 ± 54.456
2025-05-06 13:50:53,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [167.13568, 26.000814, 104.50314, 24.686237, 129.42538, 204.54953, 93.84427, 85.42636, 150.03362, 125.962906]
2025-05-06 13:50:53,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [104.0, 28.0, 74.0, 30.0, 93.0, 115.0, 67.0, 64.0, 88.0, 78.0]
2025-05-06 13:50:53,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 20/100 (estimated time remaining: 3 hours, 44 minutes, 24 seconds)
2025-05-06 13:53:37,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 13:53:40,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 166.60394 ± 58.982
2025-05-06 13:53:40,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [102.16463, 129.70094, 288.5792, 141.07094, 139.78044, 181.83575, 84.775696, 212.41637, 229.05014, 156.66533]
2025-05-06 13:53:40,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [82.0, 89.0, 155.0, 95.0, 94.0, 115.0, 72.0, 118.0, 125.0, 107.0]
2025-05-06 13:53:40,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 21/100 (estimated time remaining: 3 hours, 41 minutes, 30 seconds)
2025-05-06 13:56:25,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 13:56:27,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 147.27231 ± 89.953
2025-05-06 13:56:27,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [28.543335, 185.61546, 29.796953, 112.721664, 335.7027, 162.64694, 92.85537, 100.54041, 235.03932, 189.26105]
2025-05-06 13:56:27,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [29.0, 117.0, 32.0, 75.0, 168.0, 102.0, 74.0, 74.0, 127.0, 109.0]
2025-05-06 13:56:27,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 22/100 (estimated time remaining: 3 hours, 39 minutes, 6 seconds)
2025-05-06 13:59:12,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 13:59:14,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 143.17331 ± 74.905
2025-05-06 13:59:14,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [98.57992, 104.46429, 107.58668, 137.08997, 85.39261, 146.72462, 285.77856, 83.39358, 92.84077, 289.8821]
2025-05-06 13:59:14,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [78.0, 66.0, 80.0, 93.0, 66.0, 94.0, 145.0, 60.0, 69.0, 148.0]
2025-05-06 13:59:14,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 36 minutes, 26 seconds)
2025-05-06 14:01:59,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 14:02:02,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 210.96672 ± 64.822
2025-05-06 14:02:02,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [108.62413, 232.7388, 207.85182, 310.4145, 312.52682, 223.61386, 245.02051, 172.65952, 138.95132, 157.26598]
2025-05-06 14:02:02,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [79.0, 127.0, 117.0, 150.0, 154.0, 135.0, 134.0, 101.0, 93.0, 107.0]
2025-05-06 14:02:02,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1124 [INFO]: New best (210.97) for latency SparseU15
2025-05-06 14:02:02,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1127 [INFO]: saving network
2025-05-06 14:02:02,597 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-hopper/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 14:02:02,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 24/100 (estimated time remaining: 3 hours, 34 minutes, 32 seconds)
2025-05-06 14:04:51,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 14:04:53,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 184.47639 ± 99.900
2025-05-06 14:04:53,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [24.247997, 28.901659, 224.99828, 168.51886, 249.6668, 294.4517, 187.28194, 352.29102, 189.03035, 125.37539]
2025-05-06 14:04:53,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [28.0, 30.0, 124.0, 95.0, 142.0, 152.0, 109.0, 167.0, 115.0, 94.0]
2025-05-06 14:04:53,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 32 minutes, 55 seconds)
2025-05-06 14:07:40,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 14:07:42,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 187.80498 ± 59.413
2025-05-06 14:07:42,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [172.59213, 113.26956, 161.44803, 244.45563, 163.24823, 333.51584, 158.28052, 173.08128, 143.49182, 214.6668]
2025-05-06 14:07:42,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [98.0, 81.0, 105.0, 137.0, 93.0, 173.0, 99.0, 111.0, 94.0, 116.0]
2025-05-06 14:07:42,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 30 minutes, 37 seconds)
2025-05-06 14:10:28,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 14:10:30,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 149.12123 ± 35.737
2025-05-06 14:10:30,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [203.06952, 139.8509, 116.21199, 169.58014, 142.16394, 121.86044, 83.731735, 148.62137, 161.82155, 204.3007]
2025-05-06 14:10:30,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [114.0, 85.0, 74.0, 102.0, 91.0, 84.0, 59.0, 92.0, 97.0, 119.0]
2025-05-06 14:10:30,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 27 minutes, 58 seconds)
2025-05-06 14:13:17,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 14:13:19,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 169.20792 ± 82.086
2025-05-06 14:13:19,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [166.08551, 168.65703, 296.60104, 171.10043, 188.56206, 270.9049, 164.98488, 26.749083, 32.29599, 206.13821]
2025-05-06 14:13:19,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [112.0, 107.0, 150.0, 106.0, 110.0, 158.0, 99.0, 28.0, 31.0, 116.0]
2025-05-06 14:13:19,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 25 minutes, 46 seconds)
2025-05-06 14:16:06,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 14:16:08,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 172.73744 ± 114.570
2025-05-06 14:16:08,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [105.0251, 27.201166, 217.62083, 314.89127, 22.8158, 329.94446, 79.716644, 137.50366, 333.95822, 158.69711]
2025-05-06 14:16:08,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [82.0, 30.0, 124.0, 155.0, 30.0, 160.0, 57.0, 95.0, 153.0, 92.0]
2025-05-06 14:16:08,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 23 minutes)
2025-05-06 14:18:56,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 14:18:58,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 163.14716 ± 46.471
2025-05-06 14:18:58,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [140.23451, 149.69318, 113.118454, 196.15874, 167.57793, 229.0659, 127.783714, 144.28152, 254.68837, 108.869225]
2025-05-06 14:18:58,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [86.0, 93.0, 78.0, 116.0, 105.0, 121.0, 80.0, 93.0, 139.0, 74.0]
2025-05-06 14:18:58,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 19 minutes, 50 seconds)
2025-05-06 14:21:46,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 14:21:48,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 167.50877 ± 88.752
2025-05-06 14:21:48,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [29.145424, 237.28514, 180.59645, 372.69705, 88.132324, 171.53773, 162.69595, 159.12364, 182.77528, 91.0988]
2025-05-06 14:21:48,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [31.0, 129.0, 107.0, 179.0, 65.0, 98.0, 102.0, 92.0, 105.0, 64.0]
2025-05-06 14:21:48,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 17 minutes, 17 seconds)
2025-05-06 14:24:36,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 14:24:39,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 172.55124 ± 51.117
2025-05-06 14:24:39,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [231.75267, 188.67, 186.70363, 183.03072, 201.01631, 31.16007, 160.30461, 202.6925, 160.5288, 179.65302]
2025-05-06 14:24:39,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [126.0, 117.0, 120.0, 108.0, 107.0, 30.0, 109.0, 120.0, 100.0, 111.0]
2025-05-06 14:24:39,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 15 minutes, 14 seconds)
2025-05-06 14:27:28,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 14:27:31,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 169.90819 ± 87.889
2025-05-06 14:27:31,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [166.05107, 144.05746, 146.8994, 121.291504, 138.04152, 351.65747, 301.9333, 154.1152, 22.852926, 152.18216]
2025-05-06 14:27:31,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [104.0, 89.0, 91.0, 77.0, 88.0, 157.0, 151.0, 94.0, 30.0, 89.0]
2025-05-06 14:27:31,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 12 minutes, 57 seconds)
2025-05-06 14:30:18,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 14:30:20,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 174.01675 ± 76.066
2025-05-06 14:30:20,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [251.70543, 343.50378, 95.89201, 127.8482, 143.68065, 235.13524, 97.359436, 157.5381, 177.13402, 110.3707]
2025-05-06 14:30:20,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [138.0, 151.0, 67.0, 84.0, 98.0, 120.0, 68.0, 91.0, 100.0, 77.0]
2025-05-06 14:30:20,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 10 minutes, 16 seconds)
2025-05-06 14:33:10,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 14:33:12,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 182.39403 ± 87.343
2025-05-06 14:33:12,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [248.38316, 164.3938, 147.97763, 165.40396, 105.5156, 192.60605, 349.13596, 134.86554, 285.9954, 29.66322]
2025-05-06 14:33:12,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [132.0, 99.0, 100.0, 98.0, 70.0, 105.0, 161.0, 84.0, 147.0, 32.0]
2025-05-06 14:33:12,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 7 minutes, 55 seconds)
2025-05-06 14:36:00,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 14:36:01,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 112.49965 ± 22.553
2025-05-06 14:36:01,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [133.56714, 89.88243, 111.36268, 121.61398, 108.50037, 156.38016, 111.78114, 78.33566, 127.66745, 85.90553]
2025-05-06 14:36:01,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [88.0, 63.0, 75.0, 82.0, 71.0, 100.0, 75.0, 53.0, 85.0, 62.0]
2025-05-06 14:36:01,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 4 minutes, 55 seconds)
2025-05-06 14:38:51,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 14:38:53,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 144.89841 ± 79.719
2025-05-06 14:38:53,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [163.77936, 343.4755, 76.0403, 172.13742, 69.930725, 176.10228, 76.32068, 77.69395, 181.81308, 111.69085]
2025-05-06 14:38:53,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [102.0, 164.0, 56.0, 114.0, 51.0, 108.0, 57.0, 59.0, 104.0, 84.0]
2025-05-06 14:38:53,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 2 minutes, 16 seconds)
2025-05-06 14:41:40,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 14:41:42,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 144.80672 ± 94.920
2025-05-06 14:41:42,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [104.459885, 127.01894, 21.351828, 106.08473, 252.91183, 148.419, 339.7413, 22.083242, 108.26505, 217.73134]
2025-05-06 14:41:42,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [73.0, 81.0, 24.0, 80.0, 126.0, 90.0, 153.0, 24.0, 76.0, 123.0]
2025-05-06 14:41:42,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 58 minutes, 47 seconds)
2025-05-06 14:44:30,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 14:44:32,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 204.85141 ± 92.238
2025-05-06 14:44:32,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [269.4833, 28.786932, 345.81503, 346.43253, 200.99156, 169.81245, 217.49844, 188.88058, 136.97386, 143.83946]
2025-05-06 14:44:32,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [131.0, 32.0, 160.0, 161.0, 115.0, 100.0, 110.0, 104.0, 94.0, 95.0]
2025-05-06 14:44:32,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 56 minutes, 9 seconds)
2025-05-06 14:47:21,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 14:47:24,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 182.88568 ± 44.459
2025-05-06 14:47:24,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [106.560646, 248.00952, 181.05434, 155.31099, 216.1604, 154.86987, 150.88391, 157.71584, 205.62897, 252.66245]
2025-05-06 14:47:24,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [75.0, 133.0, 105.0, 91.0, 117.0, 102.0, 96.0, 95.0, 116.0, 134.0]
2025-05-06 14:47:24,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 53 minutes, 10 seconds)
2025-05-06 14:50:12,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 14:50:14,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 169.73416 ± 58.880
2025-05-06 14:50:14,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [96.912445, 98.90481, 226.14842, 182.19264, 229.42728, 288.63464, 164.18176, 142.17323, 131.17482, 137.59154]
2025-05-06 14:50:14,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [70.0, 71.0, 120.0, 116.0, 126.0, 150.0, 97.0, 84.0, 86.0, 95.0]
2025-05-06 14:50:14,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 50 minutes, 33 seconds)
2025-05-06 14:53:04,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 14:53:07,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 192.60684 ± 34.435
2025-05-06 14:53:07,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [230.21571, 161.06728, 169.1334, 204.90697, 208.61961, 245.99527, 144.81143, 163.82779, 233.75421, 163.73697]
2025-05-06 14:53:07,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [119.0, 92.0, 112.0, 132.0, 118.0, 138.0, 94.0, 107.0, 130.0, 101.0]
2025-05-06 14:53:07,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 47 minutes, 52 seconds)
2025-05-06 14:55:54,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 14:55:57,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 160.49661 ± 83.684
2025-05-06 14:55:57,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [194.26132, 31.543703, 301.56085, 111.49114, 205.4893, 140.90173, 186.18475, 246.93298, 165.37653, 21.223585]
2025-05-06 14:55:57,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [117.0, 33.0, 172.0, 74.0, 118.0, 87.0, 119.0, 143.0, 100.0, 25.0]
2025-05-06 14:55:57,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 45 minutes, 13 seconds)
2025-05-06 14:58:46,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 14:58:49,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 191.77014 ± 66.695
2025-05-06 14:58:49,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [152.84598, 165.18433, 256.61176, 25.464064, 176.29922, 213.36504, 251.72865, 202.41739, 205.9683, 267.81677]
2025-05-06 14:58:49,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [92.0, 101.0, 124.0, 30.0, 109.0, 125.0, 141.0, 119.0, 114.0, 145.0]
2025-05-06 14:58:49,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 42 minutes, 46 seconds)
2025-05-06 15:01:38,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 15:01:41,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 216.39127 ± 92.617
2025-05-06 15:01:41,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [154.48788, 236.7773, 401.76892, 172.29921, 154.77562, 229.92407, 96.35421, 344.32242, 252.52766, 120.67518]
2025-05-06 15:01:41,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [98.0, 132.0, 196.0, 100.0, 100.0, 124.0, 73.0, 155.0, 138.0, 77.0]
2025-05-06 15:01:41,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1124 [INFO]: New best (216.39) for latency SparseU15
2025-05-06 15:01:41,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1127 [INFO]: saving network
2025-05-06 15:01:41,262 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-hopper/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 15:01:41,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 39 minutes, 57 seconds)
2025-05-06 15:04:31,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 15:04:33,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 159.16971 ± 86.849
2025-05-06 15:04:33,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [168.22101, 25.066118, 196.48941, 246.88443, 25.213041, 159.26903, 182.07097, 144.22006, 118.076126, 326.18677]
2025-05-06 15:04:33,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [104.0, 25.0, 120.0, 157.0, 29.0, 105.0, 109.0, 89.0, 83.0, 182.0]
2025-05-06 15:04:33,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 37 minutes, 28 seconds)
2025-05-06 15:07:24,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 15:07:26,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 201.57388 ± 100.577
2025-05-06 15:07:26,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [364.7812, 189.22183, 156.54898, 125.07376, 167.204, 173.79689, 189.39856, 22.900126, 374.43854, 252.375]
2025-05-06 15:07:26,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [187.0, 121.0, 93.0, 96.0, 102.0, 105.0, 123.0, 27.0, 171.0, 132.0]
2025-05-06 15:07:26,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 34 minutes, 44 seconds)
2025-05-06 15:10:15,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 15:10:17,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 191.60733 ± 101.195
2025-05-06 15:10:17,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [141.53728, 373.6026, 238.58643, 256.93604, 118.87529, 89.6671, 116.654686, 91.686325, 352.59186, 135.93562]
2025-05-06 15:10:17,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [92.0, 198.0, 125.0, 167.0, 79.0, 66.0, 75.0, 68.0, 176.0, 86.0]
2025-05-06 15:10:17,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 32 minutes, 1 second)
2025-05-06 15:13:07,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 15:13:10,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 216.63980 ± 73.614
2025-05-06 15:13:10,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [125.6224, 202.52408, 286.89706, 196.38354, 120.11873, 116.23087, 246.68521, 250.44092, 286.2031, 335.2919]
2025-05-06 15:13:10,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [84.0, 114.0, 147.0, 116.0, 91.0, 81.0, 127.0, 169.0, 162.0, 156.0]
2025-05-06 15:13:10,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1124 [INFO]: New best (216.64) for latency SparseU15
2025-05-06 15:13:10,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1127 [INFO]: saving network
2025-05-06 15:13:10,465 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-hopper/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 15:13:10,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 29 minutes, 13 seconds)
2025-05-06 15:15:58,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 15:16:01,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 264.71103 ± 113.856
2025-05-06 15:16:01,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [412.4622, 176.46512, 154.73064, 424.76688, 389.275, 177.77554, 380.4867, 192.12685, 134.6712, 204.35045]
2025-05-06 15:16:01,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [182.0, 103.0, 105.0, 183.0, 180.0, 103.0, 181.0, 113.0, 94.0, 116.0]
2025-05-06 15:16:01,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1124 [INFO]: New best (264.71) for latency SparseU15
2025-05-06 15:16:01,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1127 [INFO]: saving network
2025-05-06 15:16:01,729 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-hopper/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 15:16:01,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 26 minutes, 16 seconds)
2025-05-06 15:18:52,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 15:18:55,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 231.90068 ± 77.207
2025-05-06 15:18:55,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [290.85007, 155.72084, 275.07144, 419.06473, 215.05727, 237.17847, 182.17993, 176.45996, 145.40047, 222.02374]
2025-05-06 15:18:55,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [174.0, 100.0, 158.0, 206.0, 128.0, 142.0, 119.0, 99.0, 92.0, 133.0]
2025-05-06 15:18:55,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 23 minutes, 39 seconds)
2025-05-06 15:21:43,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 15:21:45,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 175.89017 ± 96.953
2025-05-06 15:21:45,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [119.52201, 154.77385, 19.628336, 266.17856, 145.68686, 89.05984, 338.98853, 322.02463, 141.27736, 161.76158]
2025-05-06 15:21:45,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [83.0, 101.0, 23.0, 136.0, 87.0, 67.0, 164.0, 159.0, 103.0, 93.0]
2025-05-06 15:21:45,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 20 minutes, 17 seconds)
2025-05-06 15:24:35,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 15:24:38,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 212.19705 ± 81.871
2025-05-06 15:24:38,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [249.14633, 32.554516, 94.18469, 242.07648, 241.02599, 252.66075, 334.96048, 223.6272, 242.13231, 209.60161]
2025-05-06 15:24:38,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [130.0, 32.0, 71.0, 139.0, 123.0, 122.0, 179.0, 130.0, 138.0, 113.0]
2025-05-06 15:24:38,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 17 minutes, 45 seconds)
2025-05-06 15:27:28,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 15:27:31,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 247.93282 ± 97.758
2025-05-06 15:27:31,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [259.04526, 126.00726, 321.23538, 167.69571, 250.1033, 493.35034, 178.29703, 229.10344, 192.09369, 262.39673]
2025-05-06 15:27:31,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [136.0, 81.0, 168.0, 99.0, 138.0, 258.0, 107.0, 122.0, 112.0, 140.0]
2025-05-06 15:27:31,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 14 minutes, 56 seconds)
2025-05-06 15:30:24,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 15:30:27,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 219.22668 ± 116.338
2025-05-06 15:30:27,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [466.3167, 201.49301, 232.0646, 135.69359, 214.23311, 170.03397, 290.71674, 127.1848, 331.0285, 23.501646]
2025-05-06 15:30:27,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [204.0, 131.0, 138.0, 84.0, 119.0, 107.0, 178.0, 83.0, 179.0, 27.0]
2025-05-06 15:30:27,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 12 minutes, 43 seconds)
2025-05-06 15:33:10,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 15:33:12,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 212.76680 ± 131.175
2025-05-06 15:33:12,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [460.02072, 173.58067, 22.473598, 121.85865, 337.59796, 130.89883, 299.60992, 344.48657, 118.48438, 118.65683]
2025-05-06 15:33:12,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [221.0, 109.0, 28.0, 90.0, 180.0, 84.0, 150.0, 183.0, 90.0, 83.0]
2025-05-06 15:33:12,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 8 minutes, 35 seconds)
2025-05-06 15:36:01,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 15:36:04,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 194.60568 ± 87.166
2025-05-06 15:36:04,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [224.43163, 239.03317, 360.25305, 208.1596, 264.4136, 117.65549, 118.756096, 186.123, 24.863142, 202.36809]
2025-05-06 15:36:04,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [137.0, 128.0, 174.0, 125.0, 129.0, 85.0, 81.0, 106.0, 29.0, 116.0]
2025-05-06 15:36:04,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 5 minutes, 55 seconds)
2025-05-06 15:38:53,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 15:38:56,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 201.85410 ± 101.902
2025-05-06 15:38:56,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [145.9396, 146.44444, 192.72389, 281.5737, 226.52637, 179.00732, 264.93298, 134.14977, 424.32404, 22.918962]
2025-05-06 15:38:56,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [96.0, 94.0, 118.0, 158.0, 139.0, 108.0, 155.0, 88.0, 213.0, 29.0]
2025-05-06 15:38:56,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 2 minutes, 52 seconds)
2025-05-06 15:41:40,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 15:41:43,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 277.56061 ± 149.198
2025-05-06 15:41:43,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [218.95587, 253.70534, 614.43634, 138.54938, 171.01012, 495.53018, 173.40775, 160.36536, 308.9358, 240.70996]
2025-05-06 15:41:43,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [118.0, 131.0, 262.0, 85.0, 91.0, 235.0, 99.0, 102.0, 153.0, 127.0]
2025-05-06 15:41:43,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1124 [INFO]: New best (277.56) for latency SparseU15
2025-05-06 15:41:43,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1127 [INFO]: saving network
2025-05-06 15:41:43,844 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-hopper/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 15:41:43,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 59 minutes, 17 seconds)
2025-05-06 15:44:30,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 15:44:34,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 246.91182 ± 111.797
2025-05-06 15:44:34,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [480.5607, 124.526405, 241.3974, 148.47246, 223.8392, 226.35625, 164.22246, 434.1309, 196.58545, 229.02682]
2025-05-06 15:44:34,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [286.0, 84.0, 139.0, 102.0, 149.0, 129.0, 103.0, 254.0, 116.0, 147.0]
2025-05-06 15:44:34,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 55 minutes, 47 seconds)
2025-05-06 15:47:21,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 15:47:24,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 273.57944 ± 138.894
2025-05-06 15:47:24,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [271.2286, 287.4217, 501.6394, 420.78766, 122.08658, 18.37742, 319.51968, 199.04832, 402.99936, 192.6857]
2025-05-06 15:47:24,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [150.0, 148.0, 213.0, 186.0, 85.0, 25.0, 150.0, 112.0, 169.0, 108.0]
2025-05-06 15:47:24,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 53 minutes, 31 seconds)
2025-05-06 15:50:13,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 15:50:16,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 216.17326 ± 136.623
2025-05-06 15:50:16,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [252.84415, 98.16275, 346.14844, 212.84573, 228.93115, 482.5258, 26.19443, 307.02863, 186.14075, 20.911009]
2025-05-06 15:50:16,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [144.0, 69.0, 184.0, 119.0, 133.0, 254.0, 32.0, 177.0, 115.0, 27.0]
2025-05-06 15:50:16,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 50 minutes, 44 seconds)
2025-05-06 15:53:01,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 15:53:05,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 311.37860 ± 131.877
2025-05-06 15:53:05,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [206.8126, 448.83014, 311.25784, 567.9215, 228.65079, 146.90613, 361.1744, 439.32834, 175.36588, 227.53809]
2025-05-06 15:53:05,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [117.0, 194.0, 158.0, 217.0, 129.0, 88.0, 166.0, 199.0, 104.0, 124.0]
2025-05-06 15:53:05,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1124 [INFO]: New best (311.38) for latency SparseU15
2025-05-06 15:53:05,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1127 [INFO]: saving network
2025-05-06 15:53:05,336 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-hopper/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 15:53:05,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 47 minutes, 34 seconds)
2025-05-06 15:55:53,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 15:55:56,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 215.26318 ± 77.939
2025-05-06 15:55:56,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [270.1291, 320.98862, 242.59752, 30.239235, 188.94113, 255.02672, 142.37616, 274.5966, 198.6313, 229.10558]
2025-05-06 15:55:56,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [159.0, 207.0, 146.0, 32.0, 123.0, 155.0, 95.0, 160.0, 133.0, 157.0]
2025-05-06 15:55:56,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 45 minutes, 12 seconds)
2025-05-06 15:58:43,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 15:58:46,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 308.52435 ± 177.592
2025-05-06 15:58:46,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [350.1541, 395.1411, 742.84534, 335.70407, 276.63052, 263.17, 335.17505, 180.77391, 25.202578, 180.44675]
2025-05-06 15:58:46,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [151.0, 193.0, 263.0, 177.0, 155.0, 134.0, 180.0, 105.0, 30.0, 96.0]
2025-05-06 15:58:46,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 42 minutes, 15 seconds)
2025-05-06 16:01:33,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 16:01:37,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 297.39026 ± 199.138
2025-05-06 16:01:37,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [124.613525, 206.57622, 365.93906, 304.43015, 25.683565, 794.2393, 133.19382, 355.85297, 343.87054, 319.50308]
2025-05-06 16:01:37,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [80.0, 116.0, 179.0, 169.0, 27.0, 340.0, 83.0, 171.0, 178.0, 172.0]
2025-05-06 16:01:37,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 39 minutes, 29 seconds)
2025-05-06 16:04:25,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 16:04:28,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 185.54758 ± 90.899
2025-05-06 16:04:28,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [287.44098, 139.5949, 137.97572, 167.3096, 186.13193, 154.69469, 200.38553, 163.2396, 389.25842, 29.444485]
2025-05-06 16:04:28,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [143.0, 90.0, 91.0, 101.0, 106.0, 98.0, 117.0, 94.0, 169.0, 32.0]
2025-05-06 16:04:28,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 36 minutes, 33 seconds)
2025-05-06 16:07:13,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 16:07:17,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 276.17651 ± 282.931
2025-05-06 16:07:17,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [583.64075, 400.5096, 111.19868, 28.597107, 111.536064, 29.914362, 98.55134, 356.25592, 943.0466, 98.514534]
2025-05-06 16:07:17,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [246.0, 205.0, 86.0, 30.0, 85.0, 32.0, 72.0, 183.0, 430.0, 74.0]
2025-05-06 16:07:17,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 33 minutes, 41 seconds)
2025-05-06 16:10:04,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 16:10:08,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 347.66595 ± 154.305
2025-05-06 16:10:08,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [217.12444, 338.50595, 728.85596, 372.78928, 145.60223, 274.1595, 409.65714, 203.59488, 387.6672, 398.70322]
2025-05-06 16:10:08,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [138.0, 175.0, 348.0, 197.0, 99.0, 149.0, 201.0, 126.0, 191.0, 193.0]
2025-05-06 16:10:08,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1124 [INFO]: New best (347.67) for latency SparseU15
2025-05-06 16:10:08,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1127 [INFO]: saving network
2025-05-06 16:10:08,400 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-hopper/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 16:10:08,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 30 minutes, 49 seconds)
2025-05-06 16:12:56,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 16:12:59,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 276.72852 ± 98.800
2025-05-06 16:12:59,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [410.52625, 250.31065, 189.68849, 341.31152, 352.0884, 328.30118, 411.1936, 152.46748, 185.09584, 146.3015]
2025-05-06 16:12:59,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [216.0, 139.0, 126.0, 196.0, 180.0, 171.0, 193.0, 98.0, 103.0, 97.0]
2025-05-06 16:12:59,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 28 minutes, 8 seconds)
2025-05-06 16:15:47,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 16:15:50,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 244.42368 ± 169.107
2025-05-06 16:15:50,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [289.62125, 679.56525, 143.09857, 28.279776, 298.44666, 159.29414, 341.59286, 151.83363, 168.47206, 184.0324]
2025-05-06 16:15:50,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [149.0, 316.0, 92.0, 32.0, 162.0, 104.0, 183.0, 95.0, 105.0, 114.0]
2025-05-06 16:15:50,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 25 minutes, 21 seconds)
2025-05-06 16:18:38,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 16:18:42,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 330.29657 ± 145.979
2025-05-06 16:18:42,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [331.86658, 393.89502, 243.06029, 491.2425, 132.02342, 618.0104, 365.4667, 110.4623, 346.73132, 270.20715]
2025-05-06 16:18:42,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [168.0, 166.0, 147.0, 220.0, 88.0, 262.0, 184.0, 72.0, 175.0, 146.0]
2025-05-06 16:18:42,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 22 minutes, 32 seconds)
2025-05-06 16:21:28,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 16:21:32,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 334.21207 ± 157.294
2025-05-06 16:21:32,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [503.45123, 377.8691, 189.0576, 26.416964, 538.02045, 377.3623, 262.79178, 209.97792, 330.32156, 526.85187]
2025-05-06 16:21:32,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [211.0, 189.0, 113.0, 31.0, 261.0, 196.0, 129.0, 118.0, 170.0, 241.0]
2025-05-06 16:21:32,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 19 minutes, 48 seconds)
2025-05-06 16:24:19,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 16:24:22,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 215.05276 ± 86.621
2025-05-06 16:24:22,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [289.4387, 156.97499, 154.29591, 397.59302, 174.23082, 174.62065, 195.98946, 313.31436, 89.85054, 204.21901]
2025-05-06 16:24:22,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [135.0, 102.0, 99.0, 190.0, 111.0, 107.0, 116.0, 135.0, 70.0, 116.0]
2025-05-06 16:24:22,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 16 minutes, 52 seconds)
2025-05-06 16:27:10,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 16:27:13,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 267.21817 ± 272.432
2025-05-06 16:27:13,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [28.687157, 220.091, 209.37051, 927.09235, 548.2702, 24.354258, 199.17485, 97.02993, 390.6634, 27.44805]
2025-05-06 16:27:13,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [31.0, 113.0, 125.0, 389.0, 197.0, 30.0, 121.0, 74.0, 190.0, 32.0]
2025-05-06 16:27:13,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 14 minutes, 2 seconds)
2025-05-06 16:30:01,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 16:30:03,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 182.32809 ± 94.950
2025-05-06 16:30:03,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [170.2082, 136.0048, 265.03668, 189.49876, 412.9526, 137.09807, 189.49304, 148.16273, 144.42448, 30.401627]
2025-05-06 16:30:03,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [96.0, 91.0, 134.0, 110.0, 191.0, 86.0, 108.0, 93.0, 86.0, 32.0]
2025-05-06 16:30:03,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 11 minutes, 5 seconds)
2025-05-06 16:32:52,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 16:32:56,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 277.82623 ± 165.560
2025-05-06 16:32:56,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [163.69943, 373.69727, 593.13007, 112.61749, 504.63147, 274.23367, 245.90541, 26.769102, 301.27136, 182.30713]
2025-05-06 16:32:56,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [105.0, 188.0, 209.0, 79.0, 264.0, 154.0, 127.0, 31.0, 152.0, 103.0]
2025-05-06 16:32:56,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 8 minutes, 19 seconds)
2025-05-06 16:35:41,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 16:35:44,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 274.40421 ± 207.092
2025-05-06 16:35:44,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [166.73587, 271.68024, 415.86597, 825.9304, 144.03294, 168.8726, 153.72614, 81.26806, 337.14627, 178.78351]
2025-05-06 16:35:44,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [95.0, 138.0, 179.0, 284.0, 93.0, 106.0, 95.0, 57.0, 158.0, 105.0]
2025-05-06 16:35:44,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 5 minutes, 20 seconds)
2025-05-06 16:38:32,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 16:38:36,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 292.03180 ± 138.029
2025-05-06 16:38:36,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [192.55598, 123.58846, 176.58945, 433.46643, 136.88808, 249.4344, 403.6633, 474.66113, 229.36662, 500.10422]
2025-05-06 16:38:36,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [105.0, 84.0, 113.0, 190.0, 91.0, 137.0, 193.0, 197.0, 134.0, 239.0]
2025-05-06 16:38:36,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 2 minutes, 37 seconds)
2025-05-06 16:41:23,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 16:41:27,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 360.14560 ± 270.902
2025-05-06 16:41:27,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [367.3541, 29.97969, 475.48, 29.766005, 390.03577, 924.5169, 240.60295, 299.33954, 141.76988, 702.611]
2025-05-06 16:41:27,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [189.0, 30.0, 221.0, 29.0, 179.0, 381.0, 123.0, 178.0, 90.0, 257.0]
2025-05-06 16:41:27,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1124 [INFO]: New best (360.15) for latency SparseU15
2025-05-06 16:41:27,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1127 [INFO]: saving network
2025-05-06 16:41:27,901 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-hopper/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 16:41:27,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 80/100 (estimated time remaining: 59 minutes, 46 seconds)
2025-05-06 16:44:15,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 16:44:18,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 284.40146 ± 210.377
2025-05-06 16:44:18,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [156.97151, 508.86185, 25.53741, 31.899754, 246.80891, 275.73056, 352.865, 765.6317, 198.19563, 281.51257]
2025-05-06 16:44:18,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [100.0, 207.0, 28.0, 32.0, 123.0, 139.0, 163.0, 284.0, 129.0, 141.0]
2025-05-06 16:44:18,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 81/100 (estimated time remaining: 56 minutes, 58 seconds)
2025-05-06 16:47:06,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 16:47:09,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 295.84085 ± 337.401
2025-05-06 16:47:09,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [189.82474, 84.03343, 26.44915, 231.17818, 418.90494, 184.77869, 417.06464, 1228.5083, 26.419802, 151.24641]
2025-05-06 16:47:09,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [115.0, 61.0, 28.0, 127.0, 194.0, 106.0, 201.0, 423.0, 28.0, 89.0]
2025-05-06 16:47:09,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 82/100 (estimated time remaining: 54 minutes, 3 seconds)
2025-05-06 16:49:55,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 16:49:59,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 293.71152 ± 104.728
2025-05-06 16:49:59,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [350.26025, 394.3753, 236.64488, 153.42847, 172.60892, 177.9863, 427.81058, 224.30202, 396.42142, 403.27698]
2025-05-06 16:49:59,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [168.0, 177.0, 116.0, 97.0, 99.0, 103.0, 194.0, 120.0, 170.0, 190.0]
2025-05-06 16:49:59,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 83/100 (estimated time remaining: 51 minutes, 17 seconds)
2025-05-06 16:52:52,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 16:52:55,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 260.75751 ± 162.458
2025-05-06 16:52:55,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [350.7353, 193.53412, 350.30798, 24.162766, 166.28453, 631.6659, 172.28928, 167.8075, 384.75613, 166.03142]
2025-05-06 16:52:55,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [188.0, 110.0, 164.0, 28.0, 105.0, 262.0, 92.0, 109.0, 204.0, 101.0]
2025-05-06 16:52:55,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 84/100 (estimated time remaining: 48 minutes, 42 seconds)
2025-05-06 16:55:39,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 16:55:42,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 217.10959 ± 133.990
2025-05-06 16:55:42,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [355.5323, 170.6032, 24.817965, 388.0662, 27.927233, 440.313, 163.65813, 184.22554, 174.12173, 241.83078]
2025-05-06 16:55:42,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [164.0, 108.0, 29.0, 203.0, 32.0, 195.0, 98.0, 111.0, 104.0, 137.0]
2025-05-06 16:55:42,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 85/100 (estimated time remaining: 45 minutes, 33 seconds)
2025-05-06 16:58:29,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 16:58:34,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 423.97833 ± 265.748
2025-05-06 16:58:34,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [231.37411, 603.91064, 443.65247, 311.93164, 891.3776, 628.7812, 22.810911, 726.52576, 165.56711, 213.8514]
2025-05-06 16:58:34,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [131.0, 265.0, 200.0, 160.0, 347.0, 261.0, 28.0, 291.0, 97.0, 130.0]
2025-05-06 16:58:34,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1124 [INFO]: New best (423.98) for latency SparseU15
2025-05-06 16:58:34,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1127 [INFO]: saving network
2025-05-06 16:58:34,279 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-hopper/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 16:58:34,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 86/100 (estimated time remaining: 42 minutes, 47 seconds)
2025-05-06 17:01:24,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 17:01:28,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 433.29306 ± 381.788
2025-05-06 17:01:28,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [171.78722, 154.36475, 1033.7914, 181.19829, 264.3217, 787.09595, 195.61725, 1145.3064, 374.35974, 25.08783]
2025-05-06 17:01:28,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [99.0, 91.0, 362.0, 100.0, 130.0, 284.0, 109.0, 367.0, 167.0, 31.0]
2025-05-06 17:01:28,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1124 [INFO]: New best (433.29) for latency SparseU15
2025-05-06 17:01:28,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1127 [INFO]: saving network
2025-05-06 17:01:28,525 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-hopper/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 17:01:28,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 87/100 (estimated time remaining: 40 minutes, 5 seconds)
2025-05-06 17:04:16,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 17:04:19,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 295.14792 ± 203.320
2025-05-06 17:04:19,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [270.38635, 402.5154, 186.77148, 430.3729, 656.62775, 565.2539, 220.51797, 169.57574, 25.162094, 24.295643]
2025-05-06 17:04:19,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [138.0, 182.0, 101.0, 168.0, 260.0, 221.0, 120.0, 96.0, 29.0, 27.0]
2025-05-06 17:04:19,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 88/100 (estimated time remaining: 37 minutes, 16 seconds)
2025-05-06 17:07:10,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 17:07:14,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 378.05280 ± 353.933
2025-05-06 17:07:14,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [208.11224, 195.64854, 311.8221, 26.848133, 1102.4717, 970.3036, 203.50217, 78.333916, 162.40672, 521.07874]
2025-05-06 17:07:14,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [113.0, 113.0, 174.0, 30.0, 427.0, 339.0, 115.0, 54.0, 100.0, 231.0]
2025-05-06 17:07:14,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 89/100 (estimated time remaining: 34 minutes, 20 seconds)
2025-05-06 17:09:59,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 17:10:03,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 371.16217 ± 239.942
2025-05-06 17:10:03,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [550.41315, 483.5401, 321.54575, 180.44008, 21.492352, 191.52783, 404.74188, 128.79694, 859.258, 569.8654]
2025-05-06 17:10:03,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [238.0, 212.0, 161.0, 106.0, 26.0, 109.0, 174.0, 88.0, 325.0, 228.0]
2025-05-06 17:10:03,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 90/100 (estimated time remaining: 31 minutes, 34 seconds)
2025-05-06 17:12:51,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 17:12:56,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 391.23193 ± 197.518
2025-05-06 17:12:56,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [419.82092, 221.92546, 329.2694, 883.74506, 169.27596, 296.24612, 575.0983, 338.06924, 422.96732, 255.9016]
2025-05-06 17:12:56,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [196.0, 129.0, 159.0, 365.0, 96.0, 147.0, 255.0, 164.0, 209.0, 135.0]
2025-05-06 17:12:56,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 91/100 (estimated time remaining: 28 minutes, 43 seconds)
2025-05-06 17:15:42,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 17:15:45,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 263.49838 ± 97.613
2025-05-06 17:15:45,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [156.14981, 162.3261, 234.85541, 397.27884, 218.25742, 354.14743, 428.0935, 236.65181, 141.31079, 305.91272]
2025-05-06 17:15:45,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [103.0, 101.0, 127.0, 181.0, 126.0, 161.0, 192.0, 117.0, 82.0, 148.0]
2025-05-06 17:15:45,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 92/100 (estimated time remaining: 25 minutes, 43 seconds)
2025-05-06 17:18:33,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 17:18:37,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 415.07812 ± 248.021
2025-05-06 17:18:37,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [514.8708, 146.04182, 940.03265, 679.30304, 557.57086, 384.28223, 385.41946, 169.17197, 166.87097, 207.21713]
2025-05-06 17:18:37,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [237.0, 92.0, 329.0, 278.0, 246.0, 179.0, 187.0, 104.0, 99.0, 117.0]
2025-05-06 17:18:37,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 93/100 (estimated time remaining: 22 minutes, 53 seconds)
2025-05-06 17:21:25,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 17:21:30,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 373.23032 ± 158.019
2025-05-06 17:21:30,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [292.77637, 320.1452, 192.82332, 357.9511, 646.2468, 347.313, 348.64487, 668.8767, 394.00098, 163.52487]
2025-05-06 17:21:30,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [158.0, 154.0, 122.0, 174.0, 286.0, 166.0, 169.0, 277.0, 198.0, 94.0]
2025-05-06 17:21:30,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 94/100 (estimated time remaining: 19 minutes, 57 seconds)
2025-05-06 17:24:17,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 17:24:20,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 281.64502 ± 192.236
2025-05-06 17:24:20,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [108.7825, 694.7299, 377.85242, 247.29498, 541.3564, 205.27037, 23.665625, 186.79765, 237.35443, 193.3458]
2025-05-06 17:24:20,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [77.0, 308.0, 180.0, 130.0, 257.0, 119.0, 28.0, 110.0, 131.0, 109.0]
2025-05-06 17:24:20,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 95/100 (estimated time remaining: 17 minutes, 9 seconds)
2025-05-06 17:27:11,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 17:27:14,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 302.14764 ± 225.332
2025-05-06 17:27:14,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [267.52322, 161.32898, 151.17216, 175.18796, 954.0729, 351.7668, 192.92079, 291.5346, 232.50441, 243.46461]
2025-05-06 17:27:14,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [140.0, 102.0, 100.0, 102.0, 357.0, 172.0, 110.0, 153.0, 133.0, 145.0]
2025-05-06 17:27:14,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 96/100 (estimated time remaining: 14 minutes, 18 seconds)
2025-05-06 17:30:01,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 17:30:04,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 304.10934 ± 113.148
2025-05-06 17:30:04,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [502.2565, 298.1567, 207.92136, 153.45238, 283.8021, 432.57202, 154.6377, 341.58713, 248.48537, 418.2224]
2025-05-06 17:30:04,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [255.0, 153.0, 114.0, 99.0, 165.0, 195.0, 106.0, 163.0, 143.0, 203.0]
2025-05-06 17:30:04,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 97/100 (estimated time remaining: 11 minutes, 27 seconds)
2025-05-06 17:32:54,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 17:32:57,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 208.02576 ± 144.726
2025-05-06 17:32:57,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [102.33381, 223.78407, 217.60806, 343.7248, 103.40785, 214.8093, 22.152838, 501.57938, 24.280159, 326.57736]
2025-05-06 17:32:57,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [73.0, 129.0, 118.0, 172.0, 74.0, 132.0, 25.0, 232.0, 26.0, 171.0]
2025-05-06 17:32:57,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 98/100 (estimated time remaining: 8 minutes, 35 seconds)
2025-05-06 17:35:44,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 17:35:48,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 372.59094 ± 331.095
2025-05-06 17:35:48,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [122.51862, 25.489275, 258.8084, 27.06968, 1006.2119, 566.3142, 172.6886, 524.75494, 164.29509, 857.7585]
2025-05-06 17:35:48,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [80.0, 29.0, 149.0, 30.0, 333.0, 249.0, 105.0, 219.0, 106.0, 279.0]
2025-05-06 17:35:48,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 99/100 (estimated time remaining: 5 minutes, 43 seconds)
2025-05-06 17:38:36,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 17:38:40,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 293.99387 ± 223.816
2025-05-06 17:38:40,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [75.73935, 25.230433, 203.85448, 313.87003, 136.69429, 730.95105, 352.13544, 76.95827, 437.70303, 586.8022]
2025-05-06 17:38:40,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [53.0, 30.0, 116.0, 164.0, 87.0, 283.0, 179.0, 53.0, 207.0, 266.0]
2025-05-06 17:38:40,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 51 seconds)
2025-05-06 17:41:29,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 17:41:33,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 383.62857 ± 161.400
2025-05-06 17:41:33,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [648.87085, 352.642, 498.69153, 638.19476, 250.91695, 273.5243, 337.95038, 213.75778, 169.66756, 452.06995]
2025-05-06 17:41:33,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [282.0, 169.0, 206.0, 275.0, 128.0, 156.0, 172.0, 125.0, 99.0, 227.0]
2025-05-06 17:41:33,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1149 [DEBUG]: Training session finished
