2025-05-08 04:43:00,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1006 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-bpql-mem4
2025-05-08 04:43:00,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1007 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-bpql-mem4
2025-05-08 04:43:00,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1008 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x7a52477cba90>}
2025-05-08 04:43:00,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1009 [DEBUG]: using device: cpu
2025-05-08 04:43:00,585 baseline-bpql-noisy-hopper:77 [WARNING]: args.assumed_delay != args.horizon: 4 != 32
2025-05-08 04:43:00,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1031 [INFO]: Creating new trainer
2025-05-08 04:43:00,594 baseline-bpql-noisy-hopper:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=23, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-05-08 04:43:00,595 baseline-bpql-noisy-hopper:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-08 04:43:00,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1092 [DEBUG]: Starting training session...
2025-05-08 04:43:00,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 1/100
2025-05-08 04:45:33,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:45:34,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 104.09592 ± 34.898
2025-05-08 04:45:34,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [88.07072, 89.04943, 88.55566, 88.6743, 89.62139, 90.44987, 197.53366, 140.277, 83.43405, 85.29308]
2025-05-08 04:45:34,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [54.0, 54.0, 53.0, 53.0, 54.0, 54.0, 88.0, 82.0, 51.0, 52.0]
2025-05-08 04:45:34,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1124 [INFO]: New best (104.10) for latency ExtremeSparseL4U32
2025-05-08 04:45:34,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1127 [INFO]: saving network
2025-05-08 04:45:34,139 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 04:45:34,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 12 minutes, 58 seconds)
2025-05-08 04:48:19,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:48:22,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 355.28412 ± 95.581
2025-05-08 04:48:22,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [475.56046, 360.2236, 367.58426, 394.8808, 367.11185, 378.4937, 85.15497, 386.81244, 377.3095, 359.7095]
2025-05-08 04:48:22,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [285.0, 192.0, 201.0, 218.0, 194.0, 203.0, 53.0, 217.0, 203.0, 193.0]
2025-05-08 04:48:22,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1124 [INFO]: New best (355.28) for latency ExtremeSparseL4U32
2025-05-08 04:48:22,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1127 [INFO]: saving network
2025-05-08 04:48:22,354 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 04:48:22,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 22 minutes, 35 seconds)
2025-05-08 04:51:08,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:51:10,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 265.91800 ± 107.276
2025-05-08 04:51:10,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [310.80048, 329.56558, 142.98332, 360.30212, 328.07205, 334.40018, 131.65454, 312.9659, 48.354233, 360.0815]
2025-05-08 04:51:10,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [141.0, 144.0, 75.0, 180.0, 147.0, 143.0, 69.0, 143.0, 35.0, 181.0]
2025-05-08 04:51:10,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 23 minutes, 40 seconds)
2025-05-08 04:54:01,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:54:05,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 337.97336 ± 291.436
2025-05-08 04:54:05,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [1057.7443, 106.8434, 273.36404, 265.6322, 93.65915, 162.62317, 423.83005, 615.33014, 348.6727, 32.03471]
2025-05-08 04:54:05,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [1000.0, 101.0, 252.0, 230.0, 84.0, 140.0, 403.0, 574.0, 317.0, 36.0]
2025-05-08 04:54:05,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 26 minutes, 3 seconds)
2025-05-08 04:56:58,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:57:00,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 302.70895 ± 108.954
2025-05-08 04:57:00,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [363.4136, 369.25336, 363.4106, 355.29785, 367.81088, 98.64012, 368.04178, 341.51843, 74.69145, 325.01157]
2025-05-08 04:57:00,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [188.0, 195.0, 190.0, 179.0, 191.0, 57.0, 243.0, 174.0, 42.0, 159.0]
2025-05-08 04:57:00,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 25 minutes, 50 seconds)
2025-05-08 04:59:45,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:59:46,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 258.49127 ± 141.873
2025-05-08 04:59:46,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [391.84164, 397.2603, 418.5127, 380.5652, 96.25597, 42.336563, 352.18207, 223.52463, 222.39467, 60.03898]
2025-05-08 04:59:46,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [152.0, 151.0, 154.0, 148.0, 53.0, 43.0, 163.0, 124.0, 103.0, 45.0]
2025-05-08 04:59:46,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 27 minutes, 11 seconds)
2025-05-08 05:02:28,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:02:30,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 306.77957 ± 175.759
2025-05-08 05:02:30,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [33.12827, 490.14133, 409.41214, 454.95486, 132.63791, 518.4644, 101.495995, 134.43503, 347.4994, 445.62622]
2025-05-08 05:02:30,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [25.0, 220.0, 177.0, 192.0, 72.0, 267.0, 55.0, 73.0, 154.0, 190.0]
2025-05-08 05:02:30,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 22 minutes, 54 seconds)
2025-05-08 05:05:15,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:05:17,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 305.90115 ± 119.378
2025-05-08 05:05:17,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [145.19142, 333.51917, 174.67004, 427.34158, 330.75098, 384.43033, 343.49673, 386.07214, 84.74971, 448.78955]
2025-05-08 05:05:17,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [99.0, 165.0, 106.0, 160.0, 167.0, 136.0, 172.0, 137.0, 50.0, 176.0]
2025-05-08 05:05:17,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 19 minutes, 45 seconds)
2025-05-08 05:07:57,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:07:58,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 277.09430 ± 110.282
2025-05-08 05:07:58,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [316.47772, 305.44495, 307.90836, 311.5641, 83.71777, 40.715324, 318.22107, 355.5125, 353.03735, 378.34378]
2025-05-08 05:07:58,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [120.0, 117.0, 118.0, 118.0, 55.0, 45.0, 133.0, 129.0, 129.0, 135.0]
2025-05-08 05:07:58,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 12 minutes, 34 seconds)
2025-05-08 05:10:40,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:10:41,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 368.28964 ± 94.130
2025-05-08 05:10:41,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [384.1164, 113.37841, 448.8197, 379.6205, 422.3408, 308.50903, 440.873, 404.59692, 427.11475, 353.5268]
2025-05-08 05:10:41,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [139.0, 69.0, 194.0, 139.0, 151.0, 122.0, 175.0, 151.0, 152.0, 134.0]
2025-05-08 05:10:41,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1124 [INFO]: New best (368.29) for latency ExtremeSparseL4U32
2025-05-08 05:10:41,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1127 [INFO]: saving network
2025-05-08 05:10:41,823 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 05:10:41,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 6 minutes, 26 seconds)
2025-05-08 05:13:23,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:13:25,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 226.67558 ± 167.818
2025-05-08 05:13:25,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [15.983927, 33.62187, 476.7509, 374.72192, 84.81198, 172.43549, 30.238104, 359.17075, 382.01318, 337.00778]
2025-05-08 05:13:25,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [17.0, 48.0, 245.0, 143.0, 49.0, 88.0, 33.0, 139.0, 150.0, 135.0]
2025-05-08 05:13:25,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 2 minutes, 45 seconds)
2025-05-08 05:16:07,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:16:08,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 429.75195 ± 17.952
2025-05-08 05:16:08,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [414.2057, 448.34387, 451.17377, 421.22748, 422.43597, 429.81583, 449.78543, 389.91614, 435.6005, 435.01447]
2025-05-08 05:16:08,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [147.0, 158.0, 159.0, 147.0, 149.0, 152.0, 163.0, 143.0, 154.0, 155.0]
2025-05-08 05:16:08,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1124 [INFO]: New best (429.75) for latency ExtremeSparseL4U32
2025-05-08 05:16:08,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1127 [INFO]: saving network
2025-05-08 05:16:08,969 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 05:16:08,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 6 seconds)
2025-05-08 05:18:51,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:18:53,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 316.59497 ± 182.327
2025-05-08 05:18:53,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [386.89993, 387.0159, 367.31207, 86.81111, 34.42591, 387.8099, 28.617647, 514.39404, 523.67145, 448.9918]
2025-05-08 05:18:53,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [137.0, 136.0, 132.0, 52.0, 49.0, 137.0, 38.0, 194.0, 197.0, 152.0]
2025-05-08 05:18:53,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 14/100 (estimated time remaining: 3 hours, 56 minutes, 38 seconds)
2025-05-08 05:21:36,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:21:38,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 439.82324 ± 136.017
2025-05-08 05:21:38,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [504.09494, 463.25528, 442.76196, 493.38467, 50.605465, 395.5539, 559.38226, 501.34174, 499.62192, 488.23038]
2025-05-08 05:21:38,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [179.0, 162.0, 190.0, 166.0, 48.0, 156.0, 211.0, 172.0, 172.0, 162.0]
2025-05-08 05:21:38,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1124 [INFO]: New best (439.82) for latency ExtremeSparseL4U32
2025-05-08 05:21:38,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1127 [INFO]: saving network
2025-05-08 05:21:38,374 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 05:21:38,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 15/100 (estimated time remaining: 3 hours, 55 minutes)
2025-05-08 05:24:22,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:24:24,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 218.77010 ± 229.576
2025-05-08 05:24:24,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [10.384982, 34.27693, 555.8987, 12.779541, 432.45782, 33.18899, 228.24486, 641.89734, 204.36855, 34.2031]
2025-05-08 05:24:24,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [14.0, 37.0, 195.0, 14.0, 152.0, 34.0, 183.0, 267.0, 89.0, 40.0]
2025-05-08 05:24:24,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 16/100 (estimated time remaining: 3 hours, 52 minutes, 58 seconds)
2025-05-08 05:27:08,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:27:09,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 398.81909 ± 210.424
2025-05-08 05:27:09,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [16.77967, 274.42886, 559.2262, 573.00104, 547.9254, 535.6506, 498.788, 26.5489, 370.1294, 585.713]
2025-05-08 05:27:09,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [17.0, 131.0, 184.0, 185.0, 179.0, 175.0, 169.0, 34.0, 145.0, 207.0]
2025-05-08 05:27:09,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 17/100 (estimated time remaining: 3 hours, 50 minutes, 55 seconds)
2025-05-08 05:29:53,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:29:54,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 372.25446 ± 197.739
2025-05-08 05:29:54,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [548.2819, 121.7653, 175.50502, 23.816767, 583.20264, 484.50085, 536.3113, 352.00656, 305.25192, 591.90216]
2025-05-08 05:29:54,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [204.0, 62.0, 83.0, 29.0, 232.0, 207.0, 189.0, 143.0, 120.0, 231.0]
2025-05-08 05:29:54,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 18/100 (estimated time remaining: 3 hours, 48 minutes, 31 seconds)
2025-05-08 05:32:37,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:32:38,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 260.70389 ± 195.867
2025-05-08 05:32:38,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [470.17807, 506.70438, 60.16928, 204.56029, 327.23346, 537.1335, 372.06033, 49.28623, 53.840317, 25.872774]
2025-05-08 05:32:38,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [154.0, 164.0, 37.0, 101.0, 154.0, 196.0, 132.0, 52.0, 55.0, 30.0]
2025-05-08 05:32:38,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 19/100 (estimated time remaining: 3 hours, 45 minutes, 38 seconds)
2025-05-08 05:35:21,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:35:23,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 429.11362 ± 176.671
2025-05-08 05:35:23,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [545.91376, 519.7692, 540.6435, 534.27844, 460.76962, 527.29834, 529.9916, 127.21837, 37.553646, 467.69965]
2025-05-08 05:35:23,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [190.0, 169.0, 175.0, 172.0, 154.0, 171.0, 173.0, 83.0, 44.0, 200.0]
2025-05-08 05:35:23,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 20/100 (estimated time remaining: 3 hours, 42 minutes, 42 seconds)
2025-05-08 05:38:06,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:38:08,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 480.02411 ± 126.080
2025-05-08 05:38:08,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [239.62042, 222.53014, 548.2163, 529.049, 555.6627, 511.9614, 569.4611, 520.5396, 575.9124, 527.2878]
2025-05-08 05:38:08,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [117.0, 115.0, 184.0, 174.0, 187.0, 170.0, 191.0, 172.0, 198.0, 172.0]
2025-05-08 05:38:08,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1124 [INFO]: New best (480.02) for latency ExtremeSparseL4U32
2025-05-08 05:38:08,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1127 [INFO]: saving network
2025-05-08 05:38:08,548 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 05:38:08,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 21/100 (estimated time remaining: 3 hours, 39 minutes, 51 seconds)
2025-05-08 05:40:53,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:40:54,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 430.85236 ± 149.812
2025-05-08 05:40:54,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [582.7306, 254.651, 525.81036, 552.6323, 528.5384, 525.46545, 538.0689, 166.43607, 418.7988, 215.39172]
2025-05-08 05:40:54,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [195.0, 107.0, 168.0, 179.0, 169.0, 168.0, 173.0, 77.0, 188.0, 95.0]
2025-05-08 05:40:54,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 22/100 (estimated time remaining: 3 hours, 37 minutes, 14 seconds)
2025-05-08 05:43:38,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:43:40,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 290.21753 ± 223.297
2025-05-08 05:43:40,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [530.7831, 272.65924, 566.452, 108.466606, 73.30967, 170.31046, 540.74634, 36.97442, 568.1932, 34.280308]
2025-05-08 05:43:40,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [190.0, 109.0, 191.0, 57.0, 45.0, 80.0, 198.0, 29.0, 209.0, 44.0]
2025-05-08 05:43:40,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 34 minutes, 30 seconds)
2025-05-08 05:46:28,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:46:31,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 547.39172 ± 275.897
2025-05-08 05:46:31,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [515.32825, 593.4941, 675.5306, 610.9387, 511.3733, 825.03046, 830.21106, 821.3479, 49.18094, 41.48186]
2025-05-08 05:46:31,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [213.0, 204.0, 266.0, 218.0, 193.0, 332.0, 319.0, 297.0, 47.0, 52.0]
2025-05-08 05:46:31,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1124 [INFO]: New best (547.39) for latency ExtremeSparseL4U32
2025-05-08 05:46:31,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1127 [INFO]: saving network
2025-05-08 05:46:31,489 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 05:46:31,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 24/100 (estimated time remaining: 3 hours, 33 minutes, 45 seconds)
2025-05-08 05:49:12,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:49:15,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 531.03931 ± 184.147
2025-05-08 05:49:15,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [538.9367, 406.12112, 521.3015, 749.81244, 656.19, 603.14404, 558.377, 43.28136, 642.24945, 590.9797]
2025-05-08 05:49:15,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [199.0, 160.0, 200.0, 303.0, 238.0, 217.0, 202.0, 43.0, 232.0, 210.0]
2025-05-08 05:49:15,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 30 minutes, 46 seconds)
2025-05-08 05:51:58,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:52:01,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 646.64142 ± 292.123
2025-05-08 05:52:01,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [873.0, 76.95405, 425.5568, 680.1255, 676.315, 785.5839, 835.7369, 874.0612, 1018.56744, 220.5136]
2025-05-08 05:52:01,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [294.0, 52.0, 199.0, 281.0, 237.0, 260.0, 287.0, 294.0, 371.0, 117.0]
2025-05-08 05:52:01,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1124 [INFO]: New best (646.64) for latency ExtremeSparseL4U32
2025-05-08 05:52:01,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1127 [INFO]: saving network
2025-05-08 05:52:01,460 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 05:52:01,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 28 minutes, 13 seconds)
2025-05-08 05:54:45,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:54:46,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 269.68625 ± 234.101
2025-05-08 05:54:46,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [239.30104, 43.119354, 295.84772, 53.574425, 583.5596, 184.1545, 739.77454, 443.27722, 80.32728, 33.926765]
2025-05-08 05:54:46,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [131.0, 49.0, 145.0, 59.0, 199.0, 91.0, 257.0, 166.0, 59.0, 39.0]
2025-05-08 05:54:46,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 25 minutes, 7 seconds)
2025-05-08 05:57:30,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:57:33,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 698.84894 ± 441.778
2025-05-08 05:57:33,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [1301.574, 564.82166, 232.43379, 1109.2482, 305.47855, 29.580645, 1265.4612, 630.8586, 1124.2928, 424.73962]
2025-05-08 05:57:33,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [469.0, 198.0, 119.0, 424.0, 123.0, 34.0, 475.0, 212.0, 400.0, 175.0]
2025-05-08 05:57:33,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1124 [INFO]: New best (698.85) for latency ExtremeSparseL4U32
2025-05-08 05:57:33,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1127 [INFO]: saving network
2025-05-08 05:57:33,525 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 05:57:33,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 22 minutes, 49 seconds)
2025-05-08 06:00:23,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:00:25,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 349.29758 ± 244.538
2025-05-08 06:00:25,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [631.463, 218.53906, 674.7743, 22.69841, 38.64095, 214.3176, 630.69104, 604.782, 241.21207, 215.85733]
2025-05-08 06:00:25,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [207.0, 95.0, 272.0, 27.0, 39.0, 143.0, 245.0, 198.0, 139.0, 93.0]
2025-05-08 06:00:25,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 20 minutes, 10 seconds)
2025-05-08 06:03:07,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:03:09,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 565.06219 ± 446.753
2025-05-08 06:03:09,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [856.37933, 406.09778, 609.6568, 867.5305, 1514.9723, 83.81909, 438.68573, 27.679264, 27.190454, 818.61127]
2025-05-08 06:03:09,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [285.0, 152.0, 204.0, 281.0, 523.0, 48.0, 197.0, 31.0, 32.0, 276.0]
2025-05-08 06:03:09,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 17 minutes, 27 seconds)
2025-05-08 06:05:53,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:05:55,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 431.66949 ± 347.259
2025-05-08 06:05:55,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [624.7948, 57.370594, 655.27344, 646.4393, 1012.93134, 39.42501, 162.85037, 243.53839, 29.03504, 845.03656]
2025-05-08 06:05:55,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [204.0, 55.0, 226.0, 215.0, 346.0, 33.0, 90.0, 103.0, 30.0, 269.0]
2025-05-08 06:05:55,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 14 minutes, 39 seconds)
2025-05-08 06:08:37,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:08:40,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 568.05505 ± 300.199
2025-05-08 06:08:40,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [549.90546, 742.60175, 766.87024, 778.7351, 1095.668, 611.486, 124.337746, 108.984245, 639.6296, 262.33258]
2025-05-08 06:08:40,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [226.0, 306.0, 277.0, 264.0, 412.0, 206.0, 63.0, 85.0, 209.0, 106.0]
2025-05-08 06:08:40,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 11 minutes, 48 seconds)
2025-05-08 06:11:29,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:11:31,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 680.08734 ± 618.714
2025-05-08 06:11:31,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [254.8356, 720.6071, 2375.4229, 709.2554, 836.7319, 623.0274, 409.7096, 138.45955, 66.8056, 666.018]
2025-05-08 06:11:31,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [107.0, 233.0, 810.0, 219.0, 260.0, 210.0, 171.0, 69.0, 77.0, 207.0]
2025-05-08 06:11:31,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 10 minutes, 3 seconds)
2025-05-08 06:14:27,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:14:29,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 272.18179 ± 326.794
2025-05-08 06:14:29,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [278.89862, 1037.8739, 38.211205, 720.1593, 148.67471, 318.65897, 29.02082, 83.01323, 39.671898, 27.635366]
2025-05-08 06:14:29,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [117.0, 400.0, 42.0, 240.0, 73.0, 162.0, 31.0, 84.0, 51.0, 31.0]
2025-05-08 06:14:29,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 8 minutes, 24 seconds)
2025-05-08 06:17:23,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:17:26,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 660.73987 ± 358.321
2025-05-08 06:17:26,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [736.9617, 759.17395, 1101.426, 57.09993, 1230.7848, 233.03362, 414.0343, 390.11322, 873.0503, 811.72095]
2025-05-08 06:17:26,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [231.0, 233.0, 358.0, 72.0, 395.0, 100.0, 156.0, 147.0, 289.0, 254.0]
2025-05-08 06:17:26,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 8 minutes, 31 seconds)
2025-05-08 06:20:08,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:20:10,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 519.33905 ± 261.506
2025-05-08 06:20:10,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [797.705, 591.0455, 657.05396, 631.5845, 12.198275, 225.35065, 693.7817, 773.9423, 176.44885, 634.27997]
2025-05-08 06:20:10,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [253.0, 211.0, 235.0, 201.0, 14.0, 114.0, 271.0, 242.0, 79.0, 203.0]
2025-05-08 06:20:10,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 5 minutes, 6 seconds)
2025-05-08 06:22:57,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:22:59,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 540.33545 ± 443.945
2025-05-08 06:22:59,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [1014.92554, 473.85535, 878.2386, 68.57472, 49.171085, 828.63104, 1387.768, 407.92084, 279.8714, 14.398117]
2025-05-08 06:22:59,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [341.0, 181.0, 324.0, 44.0, 49.0, 284.0, 484.0, 161.0, 116.0, 16.0]
2025-05-08 06:22:59,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 3 minutes, 15 seconds)
2025-05-08 06:25:46,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:25:49,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 545.91815 ± 292.951
2025-05-08 06:25:49,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [624.6611, 382.6256, 1025.8911, 384.5543, 30.354258, 560.89795, 265.20044, 1011.15515, 633.21405, 540.62775]
2025-05-08 06:25:49,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [215.0, 152.0, 394.0, 165.0, 34.0, 238.0, 142.0, 338.0, 220.0, 191.0]
2025-05-08 06:25:49,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 59 minutes, 59 seconds)
2025-05-08 06:28:32,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:28:34,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 310.13837 ± 228.813
2025-05-08 06:28:34,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [41.52866, 435.10632, 22.838152, 484.91647, 74.103966, 323.80548, 50.463554, 447.93823, 602.19244, 618.4902]
2025-05-08 06:28:34,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [48.0, 151.0, 26.0, 175.0, 61.0, 129.0, 58.0, 166.0, 199.0, 212.0]
2025-05-08 06:28:34,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 54 minutes, 36 seconds)
2025-05-08 06:31:21,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:31:23,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 428.32880 ± 390.742
2025-05-08 06:31:23,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [465.57, 844.90424, 314.295, 334.344, 658.9301, 125.60202, 50.013397, 1322.8315, 128.14882, 38.64879]
2025-05-08 06:31:23,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [175.0, 271.0, 124.0, 138.0, 219.0, 65.0, 53.0, 460.0, 64.0, 37.0]
2025-05-08 06:31:23,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 50 minutes, 14 seconds)
2025-05-08 06:34:18,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:34:20,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 433.26010 ± 299.292
2025-05-08 06:34:20,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [934.3514, 736.39984, 495.72916, 261.1112, 511.52988, 111.48846, 427.3924, 42.17372, 763.37213, 49.052948]
2025-05-08 06:34:20,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [303.0, 230.0, 170.0, 143.0, 199.0, 111.0, 168.0, 34.0, 262.0, 34.0]
2025-05-08 06:34:20,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 50 minutes, 1 second)
2025-05-08 06:37:15,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:37:18,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 582.38641 ± 890.024
2025-05-08 06:37:18,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [47.78025, 41.79452, 42.04526, 31.531452, 1050.2667, 56.614056, 53.400047, 1646.0885, 120.64775, 2733.6963]
2025-05-08 06:37:18,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [49.0, 29.0, 47.0, 39.0, 410.0, 55.0, 49.0, 566.0, 64.0, 1000.0]
2025-05-08 06:37:18,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 48 minutes, 54 seconds)
2025-05-08 06:40:11,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:40:14,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 538.74805 ± 643.076
2025-05-08 06:40:14,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [225.87906, 635.7265, 1044.1687, 164.29454, 33.877853, 49.620823, 50.9146, 51.431633, 2091.7612, 1039.8058]
2025-05-08 06:40:14,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [99.0, 243.0, 365.0, 92.0, 41.0, 59.0, 58.0, 31.0, 768.0, 339.0]
2025-05-08 06:40:14,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 47 minutes, 20 seconds)
2025-05-08 06:43:01,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:43:08,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 1243.58105 ± 704.527
2025-05-08 06:43:08,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [968.8462, 2241.9812, 1860.542, 1005.9843, 476.58215, 278.92868, 1882.4985, 2232.3691, 902.72174, 585.3564]
2025-05-08 06:43:08,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [336.0, 781.0, 694.0, 361.0, 209.0, 130.0, 709.0, 795.0, 307.0, 237.0]
2025-05-08 06:43:08,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1124 [INFO]: New best (1243.58) for latency ExtremeSparseL4U32
2025-05-08 06:43:08,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1127 [INFO]: saving network
2025-05-08 06:43:08,062 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 06:43:08,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 46 minutes, 3 seconds)
2025-05-08 06:46:15,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:46:17,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 282.41666 ± 341.048
2025-05-08 06:46:17,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [9.825175, 27.583363, 82.880455, 177.683, 1062.3489, 669.97974, 567.70526, 117.16294, 86.23171, 22.76606]
2025-05-08 06:46:17,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [14.0, 28.0, 49.0, 82.0, 360.0, 209.0, 180.0, 60.0, 75.0, 28.0]
2025-05-08 06:46:17,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 46 minutes, 49 seconds)
2025-05-08 06:49:17,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:49:20,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 692.76434 ± 610.251
2025-05-08 06:49:20,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [821.77637, 1571.5308, 1432.1085, 1268.316, 422.78854, 60.663597, 39.66902, 36.409977, 36.629444, 1237.7517]
2025-05-08 06:49:20,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [321.0, 584.0, 518.0, 445.0, 184.0, 54.0, 45.0, 38.0, 35.0, 474.0]
2025-05-08 06:49:20,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 44 minutes, 59 seconds)
2025-05-08 06:51:58,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:52:00,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 424.00821 ± 522.527
2025-05-08 06:52:00,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [50.454662, 199.21472, 35.96432, 51.458286, 718.5356, 527.0371, 30.540634, 1837.1283, 387.42612, 402.32205]
2025-05-08 06:52:00,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [55.0, 94.0, 34.0, 53.0, 264.0, 202.0, 31.0, 663.0, 133.0, 158.0]
2025-05-08 06:52:00,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 38 minutes, 47 seconds)
2025-05-08 06:54:41,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:54:43,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 520.37469 ± 383.828
2025-05-08 06:54:43,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [36.75106, 617.83014, 1374.755, 254.08992, 817.7484, 616.9901, 468.99692, 36.860355, 274.31543, 705.4098]
2025-05-08 06:54:43,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [46.0, 203.0, 478.0, 107.0, 272.0, 200.0, 188.0, 44.0, 117.0, 287.0]
2025-05-08 06:54:43,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 33 minutes, 34 seconds)
2025-05-08 06:57:24,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:57:26,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 481.61346 ± 281.701
2025-05-08 06:57:26,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [289.8263, 37.29706, 880.5905, 815.18097, 490.33807, 795.6918, 473.4425, 585.4261, 377.11646, 71.22575]
2025-05-08 06:57:26,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [116.0, 31.0, 297.0, 253.0, 167.0, 245.0, 169.0, 188.0, 142.0, 86.0]
2025-05-08 06:57:26,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 28 minutes, 46 seconds)
2025-05-08 07:00:10,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:00:14,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 845.77179 ± 950.478
2025-05-08 07:00:14,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [35.838654, 67.611946, 658.8437, 615.7777, 2579.0706, 34.641624, 541.38354, 2787.7773, 520.7403, 616.03296]
2025-05-08 07:00:14,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [39.0, 66.0, 237.0, 193.0, 980.0, 41.0, 173.0, 1000.0, 166.0, 194.0]
2025-05-08 07:00:14,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 22 minutes, 13 seconds)
2025-05-08 07:02:56,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:02:59,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 690.56677 ± 504.336
2025-05-08 07:02:59,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [277.3557, 17.749748, 1488.5757, 118.68235, 919.8956, 1043.4128, 958.09546, 828.476, 1209.9866, 43.43814]
2025-05-08 07:02:59,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [121.0, 21.0, 552.0, 63.0, 384.0, 359.0, 361.0, 308.0, 443.0, 59.0]
2025-05-08 07:02:59,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 16 minutes, 29 seconds)
2025-05-08 07:05:37,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:05:40,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 647.56793 ± 470.157
2025-05-08 07:05:40,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [81.55301, 730.817, 650.2969, 815.86084, 987.49835, 992.6771, 514.652, 49.33523, 58.181614, 1594.8073]
2025-05-08 07:05:40,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [48.0, 260.0, 221.0, 271.0, 324.0, 324.0, 182.0, 59.0, 61.0, 615.0]
2025-05-08 07:05:40,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 13 minutes, 54 seconds)
2025-05-08 07:08:20,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:08:22,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 357.52112 ± 362.435
2025-05-08 07:08:22,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [214.73376, 1058.5457, 36.816723, 745.92236, 300.7065, 43.428127, 167.326, 127.80473, 25.449608, 854.47754]
2025-05-08 07:08:22,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [101.0, 383.0, 39.0, 246.0, 121.0, 47.0, 86.0, 74.0, 31.0, 293.0]
2025-05-08 07:08:22,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 10 minutes, 57 seconds)
2025-05-08 07:11:01,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:11:03,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 448.29486 ± 298.586
2025-05-08 07:11:03,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [276.93625, 857.2724, 402.6292, 83.09793, 258.76553, 31.950678, 257.35797, 769.91644, 767.08624, 777.93646]
2025-05-08 07:11:03,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [113.0, 285.0, 151.0, 65.0, 117.0, 32.0, 124.0, 278.0, 239.0, 248.0]
2025-05-08 07:11:03,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 7 minutes, 56 seconds)
2025-05-08 07:13:44,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:13:46,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 470.56543 ± 381.697
2025-05-08 07:13:46,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [236.38567, 1237.7213, 745.63104, 30.804607, 27.735195, 671.7502, 832.59625, 454.97894, 413.91415, 54.137405]
2025-05-08 07:13:46,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [107.0, 442.0, 276.0, 33.0, 30.0, 209.0, 270.0, 161.0, 161.0, 60.0]
2025-05-08 07:13:46,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 4 minutes, 30 seconds)
2025-05-08 07:16:26,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:16:28,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 397.26056 ± 349.246
2025-05-08 07:16:28,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [30.613718, 35.299244, 460.2409, 303.69043, 35.067127, 470.29617, 923.8929, 842.2516, 839.81696, 31.436401]
2025-05-08 07:16:28,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [38.0, 49.0, 172.0, 128.0, 41.0, 173.0, 361.0, 276.0, 308.0, 41.0]
2025-05-08 07:16:28,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 1 minute, 25 seconds)
2025-05-08 07:19:13,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:19:15,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 547.19012 ± 398.616
2025-05-08 07:19:15,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [720.7542, 9.012078, 263.2031, 29.840364, 635.98566, 272.61484, 764.6355, 1416.1736, 681.79913, 677.88275]
2025-05-08 07:19:15,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [238.0, 13.0, 111.0, 33.0, 225.0, 124.0, 252.0, 504.0, 233.0, 263.0]
2025-05-08 07:19:15,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 59 minutes, 36 seconds)
2025-05-08 07:22:01,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:22:03,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 587.51520 ± 436.287
2025-05-08 07:22:03,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [507.23923, 893.8153, 1417.2151, 787.46796, 47.195957, 425.7906, 40.715836, 34.346706, 895.98004, 825.38477]
2025-05-08 07:22:03,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [197.0, 314.0, 517.0, 312.0, 49.0, 193.0, 46.0, 40.0, 354.0, 288.0]
2025-05-08 07:22:04,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 57 minutes, 45 seconds)
2025-05-08 07:24:45,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:24:50,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 904.18799 ± 650.021
2025-05-08 07:24:50,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [193.739, 807.9263, 418.95142, 1380.7484, 1221.3478, 1353.1956, 637.8525, 2338.8027, 35.377693, 653.9376]
2025-05-08 07:24:50,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [98.0, 272.0, 139.0, 505.0, 410.0, 449.0, 247.0, 890.0, 38.0, 264.0]
2025-05-08 07:24:50,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 55 minutes, 46 seconds)
2025-05-08 07:27:34,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:27:36,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 535.90015 ± 262.063
2025-05-08 07:27:36,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [824.62866, 752.3913, 876.42706, 277.6332, 399.34116, 639.054, 744.5815, 312.54874, 496.0655, 36.330757]
2025-05-08 07:27:36,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [278.0, 237.0, 286.0, 113.0, 149.0, 210.0, 239.0, 123.0, 176.0, 47.0]
2025-05-08 07:27:36,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 53 minutes, 25 seconds)
2025-05-08 07:30:12,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:30:15,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 819.08594 ± 565.594
2025-05-08 07:30:15,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [190.1636, 771.34827, 1296.1882, 427.3078, 929.89294, 774.618, 100.30134, 646.50104, 2181.5942, 872.94366]
2025-05-08 07:30:15,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [94.0, 280.0, 424.0, 162.0, 389.0, 250.0, 81.0, 238.0, 793.0, 324.0]
2025-05-08 07:30:15,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 50 minutes, 18 seconds)
2025-05-08 07:32:55,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:32:57,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 590.11310 ± 413.935
2025-05-08 07:32:57,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [748.44525, 711.8843, 425.43332, 214.95674, 43.072475, 102.39685, 579.6453, 751.84216, 1541.7217, 781.73236]
2025-05-08 07:32:57,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [233.0, 224.0, 152.0, 94.0, 49.0, 57.0, 233.0, 252.0, 573.0, 245.0]
2025-05-08 07:32:57,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 46 minutes, 51 seconds)
2025-05-08 07:35:38,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:35:40,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 562.41431 ± 382.233
2025-05-08 07:35:40,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [378.04874, 40.750538, 35.64246, 1276.6681, 908.55096, 670.3262, 815.3714, 174.32373, 687.3585, 637.10254]
2025-05-08 07:35:40,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [142.0, 40.0, 37.0, 442.0, 299.0, 230.0, 276.0, 82.0, 230.0, 227.0]
2025-05-08 07:35:40,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 43 minutes, 28 seconds)
2025-05-08 07:38:19,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:38:21,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 531.92023 ± 223.252
2025-05-08 07:38:21,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [639.07367, 554.7494, 27.493544, 637.5462, 735.5855, 740.42566, 324.4531, 681.183, 672.252, 306.44012]
2025-05-08 07:38:21,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [220.0, 189.0, 29.0, 262.0, 236.0, 237.0, 127.0, 250.0, 222.0, 133.0]
2025-05-08 07:38:21,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 40 minutes, 7 seconds)
2025-05-08 07:41:01,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:41:04,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 578.58667 ± 557.251
2025-05-08 07:41:04,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [79.01131, 1632.731, 593.24884, 61.67543, 990.3044, 1253.0437, 169.52592, 9.668723, 915.93945, 80.7184]
2025-05-08 07:41:04,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [87.0, 614.0, 232.0, 39.0, 352.0, 462.0, 100.0, 13.0, 330.0, 79.0]
2025-05-08 07:41:04,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 36 minutes, 58 seconds)
2025-05-08 07:43:45,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:43:47,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 390.82657 ± 335.064
2025-05-08 07:43:47,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [56.167217, 983.2872, 270.57626, 79.2406, 676.2486, 236.62067, 765.0141, 700.6315, 96.06629, 44.41331]
2025-05-08 07:43:47,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [32.0, 362.0, 113.0, 85.0, 264.0, 112.0, 286.0, 272.0, 53.0, 26.0]
2025-05-08 07:43:47,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 34 minutes, 40 seconds)
2025-05-08 07:46:26,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:46:29,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 676.52069 ± 778.341
2025-05-08 07:46:29,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [190.22339, 437.20428, 36.745033, 838.07495, 972.6532, 2756.8643, 524.2344, 43.19322, 27.886423, 938.1278]
2025-05-08 07:46:29,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [91.0, 175.0, 41.0, 286.0, 327.0, 1000.0, 203.0, 25.0, 31.0, 336.0]
2025-05-08 07:46:29,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 32 minutes, 1 second)
2025-05-08 07:49:11,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:49:13,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 496.16147 ± 582.955
2025-05-08 07:49:13,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [524.8112, 1939.8666, 45.14422, 458.5685, 135.31097, 1043.5698, 705.4306, 38.106735, 48.681427, 22.12469]
2025-05-08 07:49:13,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [230.0, 719.0, 51.0, 189.0, 69.0, 409.0, 291.0, 28.0, 47.0, 30.0]
2025-05-08 07:49:13,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 29 minutes, 22 seconds)
2025-05-08 07:51:56,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:51:59,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 759.45557 ± 618.707
2025-05-08 07:51:59,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [403.07422, 66.08407, 1921.204, 1724.9955, 431.68845, 60.792778, 1102.0068, 292.18567, 749.7487, 842.7752]
2025-05-08 07:51:59,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [148.0, 38.0, 677.0, 590.0, 170.0, 50.0, 359.0, 118.0, 238.0, 271.0]
2025-05-08 07:51:59,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 27 minutes, 10 seconds)
2025-05-08 07:54:41,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:54:43,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 379.57193 ± 266.833
2025-05-08 07:54:43,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [438.8779, 24.272646, 32.32951, 162.40547, 417.53226, 416.6411, 527.83185, 180.48346, 782.59576, 812.7496]
2025-05-08 07:54:43,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [169.0, 38.0, 36.0, 81.0, 162.0, 182.0, 199.0, 81.0, 257.0, 261.0]
2025-05-08 07:54:43,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 24 minutes, 39 seconds)
2025-05-08 07:57:21,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:57:25,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 961.79065 ± 782.015
2025-05-08 07:57:25,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [2269.6646, 485.46356, 1664.9734, 338.24167, 316.98483, 967.0331, 2276.0293, 824.3903, 442.3264, 32.799076]
2025-05-08 07:57:25,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [774.0, 173.0, 596.0, 150.0, 129.0, 310.0, 784.0, 288.0, 170.0, 36.0]
2025-05-08 07:57:25,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 21 minutes, 51 seconds)
2025-05-08 08:00:07,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:00:10,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 539.25977 ± 463.932
2025-05-08 08:00:10,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [1384.727, 770.20074, 103.64134, 849.74536, 182.34886, 864.3502, 85.49657, 1008.65894, 99.19382, 44.234863]
2025-05-08 08:00:10,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [471.0, 297.0, 90.0, 320.0, 85.0, 342.0, 83.0, 393.0, 83.0, 43.0]
2025-05-08 08:00:10,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 19 minutes, 19 seconds)
2025-05-08 08:02:50,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:02:51,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 453.97623 ± 346.039
2025-05-08 08:02:51,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [130.52963, 323.59012, 447.55145, 854.2514, 1035.6262, 677.7659, 34.008675, 58.742897, 798.797, 178.89879]
2025-05-08 08:02:51,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [66.0, 135.0, 186.0, 287.0, 326.0, 239.0, 42.0, 39.0, 263.0, 80.0]
2025-05-08 08:02:51,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 16 minutes, 23 seconds)
2025-05-08 08:05:30,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:05:33,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 576.60187 ± 838.828
2025-05-08 08:05:33,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [64.93284, 512.3792, 40.282375, 31.941559, 355.123, 41.596066, 420.15994, 913.164, 2961.959, 424.48044]
2025-05-08 08:05:33,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [67.0, 240.0, 44.0, 38.0, 178.0, 45.0, 179.0, 322.0, 1000.0, 171.0]
2025-05-08 08:05:33,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 13 minutes, 16 seconds)
2025-05-08 08:08:16,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:08:18,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 569.64655 ± 333.251
2025-05-08 08:08:18,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [211.62851, 751.80383, 29.66401, 656.3082, 303.5124, 1254.9408, 527.8516, 630.69696, 872.37683, 457.6822]
2025-05-08 08:08:18,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [93.0, 277.0, 36.0, 267.0, 135.0, 408.0, 197.0, 235.0, 282.0, 177.0]
2025-05-08 08:08:18,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 10 minutes, 40 seconds)
2025-05-08 08:10:57,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:11:00,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 631.66577 ± 461.846
2025-05-08 08:11:00,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [91.27718, 26.325903, 408.82925, 572.8739, 756.19495, 1327.4813, 200.22122, 1019.8023, 1388.9897, 524.662]
2025-05-08 08:11:00,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [52.0, 28.0, 183.0, 220.0, 295.0, 446.0, 91.0, 363.0, 466.0, 191.0]
2025-05-08 08:11:00,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 7 minutes, 50 seconds)
2025-05-08 08:13:42,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:13:45,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 714.26202 ± 516.877
2025-05-08 08:13:45,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [734.502, 833.7121, 73.86668, 43.271564, 863.83203, 646.3018, 469.27423, 2018.5458, 666.5213, 792.7926]
2025-05-08 08:13:45,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [229.0, 262.0, 46.0, 46.0, 301.0, 211.0, 173.0, 703.0, 226.0, 245.0]
2025-05-08 08:13:45,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 5 minutes, 10 seconds)
2025-05-08 08:16:29,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:16:34,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 1102.55017 ± 862.853
2025-05-08 08:16:34,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [659.89746, 893.9566, 864.6391, 389.0751, 2761.4053, 62.057026, 909.1221, 2733.2415, 819.21814, 932.88995]
2025-05-08 08:16:34,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [218.0, 318.0, 279.0, 147.0, 987.0, 37.0, 307.0, 1000.0, 272.0, 334.0]
2025-05-08 08:16:34,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 3 minutes, 5 seconds)
2025-05-08 08:19:24,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:19:26,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 542.26788 ± 479.154
2025-05-08 08:19:26,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [31.303299, 510.8476, 251.43031, 341.62112, 741.9083, 792.26404, 828.94946, 1700.5222, 190.86308, 32.968937]
2025-05-08 08:19:26,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [31.0, 200.0, 111.0, 141.0, 256.0, 252.0, 266.0, 606.0, 86.0, 37.0]
2025-05-08 08:19:26,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 1 minute, 7 seconds)
2025-05-08 08:22:27,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:22:31,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 884.50378 ± 778.245
2025-05-08 08:22:31,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [931.1378, 276.617, 1541.5586, 867.22925, 113.38643, 2814.8933, 1009.1162, 849.7424, 350.77148, 90.584595]
2025-05-08 08:22:31,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [310.0, 114.0, 538.0, 311.0, 62.0, 957.0, 338.0, 282.0, 136.0, 110.0]
2025-05-08 08:22:31,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 80/100 (estimated time remaining: 59 minutes, 41 seconds)
2025-05-08 08:25:25,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:25:29,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 740.80695 ± 501.367
2025-05-08 08:25:29,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [1022.5722, 57.64308, 1762.795, 574.76276, 1078.2667, 211.408, 255.2847, 470.88968, 798.3024, 1176.145]
2025-05-08 08:25:29,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [335.0, 65.0, 635.0, 229.0, 385.0, 112.0, 107.0, 204.0, 251.0, 432.0]
2025-05-08 08:25:29,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 81/100 (estimated time remaining: 57 minutes, 57 seconds)
2025-05-08 08:28:11,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:28:14,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 613.87695 ± 309.938
2025-05-08 08:28:14,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [799.8624, 445.79636, 854.4384, 827.97894, 765.04, 926.5858, 36.74736, 709.2632, 48.982464, 724.07465]
2025-05-08 08:28:14,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [278.0, 176.0, 277.0, 261.0, 237.0, 334.0, 36.0, 236.0, 62.0, 246.0]
2025-05-08 08:28:14,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 82/100 (estimated time remaining: 55 minutes, 2 seconds)
2025-05-08 08:31:06,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:31:09,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 624.18951 ± 349.425
2025-05-08 08:31:09,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [887.29285, 828.9371, 1018.06476, 209.41019, 917.05646, 760.1196, 69.27582, 747.5373, 760.30115, 43.899387]
2025-05-08 08:31:09,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [298.0, 276.0, 356.0, 91.0, 294.0, 239.0, 42.0, 246.0, 288.0, 52.0]
2025-05-08 08:31:09,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 83/100 (estimated time remaining: 52 minutes, 28 seconds)
2025-05-08 08:33:59,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:34:05,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 1178.33398 ± 956.443
2025-05-08 08:34:05,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [447.65076, 2421.2231, 76.591064, 2426.244, 37.011898, 76.753426, 2456.283, 1559.8638, 1126.3779, 1155.3407]
2025-05-08 08:34:05,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [176.0, 824.0, 47.0, 863.0, 39.0, 80.0, 875.0, 568.0, 449.0, 427.0]
2025-05-08 08:34:05,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 84/100 (estimated time remaining: 49 minutes, 46 seconds)
2025-05-08 08:37:00,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:37:02,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 551.95996 ± 283.805
2025-05-08 08:37:02,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [344.00336, 864.1931, 805.7018, 721.2549, 696.9906, 364.46973, 134.54047, 54.565266, 740.3928, 793.48755]
2025-05-08 08:37:02,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [135.0, 285.0, 249.0, 222.0, 216.0, 146.0, 86.0, 58.0, 228.0, 244.0]
2025-05-08 08:37:02,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 85/100 (estimated time remaining: 46 minutes, 26 seconds)
2025-05-08 08:39:52,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:39:57,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 964.42175 ± 640.485
2025-05-08 08:39:57,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [425.3701, 10.110193, 381.77466, 959.839, 1837.3267, 276.25613, 1597.8473, 920.39624, 1803.8086, 1431.489]
2025-05-08 08:39:57,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [168.0, 13.0, 147.0, 351.0, 647.0, 117.0, 579.0, 339.0, 613.0, 511.0]
2025-05-08 08:39:57,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 86/100 (estimated time remaining: 43 minutes, 23 seconds)
2025-05-08 08:42:45,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:42:49,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 672.80164 ± 821.098
2025-05-08 08:42:49,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [706.77136, 179.50104, 30.79634, 576.50385, 1176.332, 284.25427, 2938.6116, 173.1772, 496.0069, 166.06216]
2025-05-08 08:42:49,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [267.0, 124.0, 32.0, 225.0, 437.0, 129.0, 1000.0, 80.0, 180.0, 80.0]
2025-05-08 08:42:49,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 87/100 (estimated time remaining: 40 minutes, 50 seconds)
2025-05-08 08:45:43,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:45:45,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 546.02222 ± 506.174
2025-05-08 08:45:45,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [1429.3143, 645.9366, 1050.173, 281.9008, 1263.2399, 535.5897, 63.354477, 37.29102, 101.31985, 52.102383]
2025-05-08 08:45:45,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [508.0, 253.0, 412.0, 115.0, 419.0, 195.0, 50.0, 43.0, 74.0, 60.0]
2025-05-08 08:45:45,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 88/100 (estimated time remaining: 37 minutes, 58 seconds)
2025-05-08 08:48:36,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:48:38,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 405.73120 ± 383.307
2025-05-08 08:48:38,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [48.294025, 747.89276, 52.0974, 666.54095, 943.97675, 937.9065, 577.3471, 12.546957, 33.882496, 36.82683]
2025-05-08 08:48:38,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [60.0, 260.0, 36.0, 238.0, 306.0, 319.0, 194.0, 13.0, 37.0, 43.0]
2025-05-08 08:48:38,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 89/100 (estimated time remaining: 34 minutes, 54 seconds)
2025-05-08 08:51:31,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:51:37,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 1082.45996 ± 817.389
2025-05-08 08:51:37,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [806.12695, 1007.41345, 214.70729, 991.9576, 2304.392, 2832.9163, 1128.6556, 824.71265, 74.6393, 639.0778]
2025-05-08 08:51:37,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [256.0, 310.0, 94.0, 318.0, 828.0, 1000.0, 424.0, 255.0, 58.0, 233.0]
2025-05-08 08:51:37,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 90/100 (estimated time remaining: 32 minutes, 4 seconds)
2025-05-08 08:54:26,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:54:27,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 187.42592 ± 228.557
2025-05-08 08:54:27,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [28.61671, 699.07904, 55.320473, 39.959663, 53.727905, 245.60924, 107.56955, 52.268986, 44.387352, 547.7202]
2025-05-08 08:54:27,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [31.0, 233.0, 35.0, 48.0, 51.0, 109.0, 82.0, 55.0, 55.0, 205.0]
2025-05-08 08:54:27,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 91/100 (estimated time remaining: 29 minutes)
2025-05-08 08:57:21,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:57:25,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 683.44464 ± 581.141
2025-05-08 08:57:25,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [43.860954, 588.0093, 384.89932, 216.34233, 637.6855, 657.1397, 1009.2982, 422.54623, 2255.2664, 619.3987]
2025-05-08 08:57:25,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [48.0, 210.0, 144.0, 154.0, 275.0, 256.0, 375.0, 163.0, 812.0, 230.0]
2025-05-08 08:57:25,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 92/100 (estimated time remaining: 26 minutes, 16 seconds)
2025-05-08 09:00:15,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:00:18,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 769.11285 ± 306.880
2025-05-08 09:00:18,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [562.9253, 855.0114, 210.82394, 1296.9462, 788.91125, 847.8794, 913.8931, 1162.8912, 544.2025, 507.6435]
2025-05-08 09:00:18,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [214.0, 274.0, 93.0, 465.0, 259.0, 257.0, 330.0, 382.0, 199.0, 193.0]
2025-05-08 09:00:18,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 93/100 (estimated time remaining: 23 minutes, 16 seconds)
2025-05-08 09:03:11,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:03:14,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 618.30792 ± 527.012
2025-05-08 09:03:14,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [708.6049, 52.205227, 26.62429, 1453.7163, 175.301, 568.2108, 771.709, 1623.6321, 209.39505, 593.68]
2025-05-08 09:03:14,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [275.0, 50.0, 32.0, 550.0, 87.0, 222.0, 300.0, 575.0, 118.0, 228.0]
2025-05-08 09:03:14,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 94/100 (estimated time remaining: 20 minutes, 26 seconds)
2025-05-08 09:06:18,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:06:20,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 479.39365 ± 435.547
2025-05-08 09:06:20,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [945.3934, 91.202736, 65.91002, 1092.8053, 472.57953, 50.075798, 1107.8418, 787.2725, 141.57271, 39.282578]
2025-05-08 09:06:20,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [309.0, 52.0, 67.0, 356.0, 178.0, 51.0, 374.0, 287.0, 86.0, 40.0]
2025-05-08 09:06:20,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 95/100 (estimated time remaining: 17 minutes, 39 seconds)
2025-05-08 09:09:03,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:09:05,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 462.84750 ± 448.566
2025-05-08 09:09:05,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [363.78763, 790.8459, 25.303867, 36.94342, 181.84277, 1295.8175, 59.935753, 45.904827, 860.4832, 967.61017]
2025-05-08 09:09:05,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [149.0, 285.0, 27.0, 39.0, 135.0, 427.0, 37.0, 54.0, 328.0, 320.0]
2025-05-08 09:09:05,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 96/100 (estimated time remaining: 14 minutes, 38 seconds)
2025-05-08 09:11:56,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:11:59,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 679.37537 ± 469.595
2025-05-08 09:11:59,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [615.24225, 1277.354, 1168.2849, 41.634945, 44.540936, 961.2796, 168.80133, 880.2796, 384.7523, 1251.5841]
2025-05-08 09:11:59,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [227.0, 393.0, 367.0, 48.0, 52.0, 297.0, 78.0, 329.0, 146.0, 448.0]
2025-05-08 09:11:59,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 97/100 (estimated time remaining: 11 minutes, 39 seconds)
2025-05-08 09:14:53,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:14:57,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 1006.03778 ± 705.537
2025-05-08 09:14:57,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [2834.5254, 1311.6274, 669.5108, 1266.7743, 848.19476, 928.71216, 926.17346, 13.800853, 797.898, 463.15976]
2025-05-08 09:14:57,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [1000.0, 493.0, 249.0, 468.0, 292.0, 302.0, 347.0, 16.0, 252.0, 163.0]
2025-05-08 09:14:57,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 98/100 (estimated time remaining: 8 minutes, 47 seconds)
2025-05-08 09:17:44,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:17:48,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 696.56250 ± 553.194
2025-05-08 09:17:48,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [124.00998, 1337.7482, 325.752, 165.35379, 732.07635, 1994.0669, 880.18555, 545.06757, 333.77682, 527.58746]
2025-05-08 09:17:48,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [86.0, 480.0, 156.0, 96.0, 294.0, 736.0, 276.0, 211.0, 147.0, 199.0]
2025-05-08 09:17:48,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 99/100 (estimated time remaining: 5 minutes, 49 seconds)
2025-05-08 09:20:37,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:20:40,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 597.51056 ± 681.844
2025-05-08 09:20:40,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [2264.4355, 26.454897, 718.34424, 191.6993, 180.26552, 62.020344, 505.21707, 34.25167, 635.2725, 1357.1445]
2025-05-08 09:20:40,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [855.0, 28.0, 253.0, 88.0, 82.0, 66.0, 207.0, 38.0, 247.0, 501.0]
2025-05-08 09:20:40,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1097 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 52 seconds)
2025-05-08 09:23:27,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:23:29,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1119 [DEBUG]: Total Reward: 314.19086 ± 227.174
2025-05-08 09:23:29,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1120 [DEBUG]: All rewards: [821.0869, 213.50671, 459.80194, 522.9592, 413.80402, 145.33902, 36.866047, 156.66318, 260.63242, 111.24913]
2025-05-08 09:23:29,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [306.0, 101.0, 182.0, 232.0, 152.0, 83.0, 47.0, 90.0, 110.0, 59.0]
2025-05-08 09:23:29,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1149 [DEBUG]: Training session finished
