2025-05-11 03:29:44,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-bpql-mem2
2025-05-11 03:29:44,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-bpql-mem2
2025-05-11 03:29:44,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x7fc7f53cde80>}
2025-05-11 03:29:44,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1111 [DEBUG]: using device: cpu
2025-05-11 03:29:44,008 baseline-bpql-noisy-humanoid:77 [WARNING]: args.assumed_delay != args.horizon: 2 != 24
2025-05-11 03:29:44,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1133 [INFO]: Creating new trainer
2025-05-11 03:29:44,023 baseline-bpql-noisy-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=410, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-05-11 03:29:44,023 baseline-bpql-noisy-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-11 03:29:47,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1194 [DEBUG]: Starting training session...
2025-05-11 03:29:47,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 1/100
2025-05-11 03:33:24,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 03:33:25,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 198.12463 ± 44.945
2025-05-11 03:33:25,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [196.17715, 206.45598, 186.70164, 166.8288, 130.39957, 317.2449, 187.72188, 198.73366, 187.78229, 203.20038]
2025-05-11 03:33:25,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [43.0, 45.0, 42.0, 32.0, 25.0, 67.0, 43.0, 44.0, 43.0, 45.0]
2025-05-11 03:33:25,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (198.12) for latency ExtremeClogL1U23
2025-05-11 03:33:25,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 03:33:25,721 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 03:33:25,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 6 hours, 44 seconds)
2025-05-11 03:37:33,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 03:37:35,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 371.85672 ± 128.386
2025-05-11 03:37:35,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [84.54504, 357.26688, 445.1426, 384.98413, 321.25134, 430.2593, 248.06958, 392.2431, 470.7196, 584.08563]
2025-05-11 03:37:35,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 74.0, 97.0, 74.0, 62.0, 96.0, 49.0, 77.0, 92.0, 119.0]
2025-05-11 03:37:35,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (371.86) for latency ExtremeClogL1U23
2025-05-11 03:37:35,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 03:37:35,591 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 03:37:35,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 6 hours, 22 minutes, 37 seconds)
2025-05-11 03:41:39,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 03:41:41,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 425.42871 ± 119.266
2025-05-11 03:41:41,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [599.1971, 421.17328, 375.1662, 437.44882, 321.82632, 506.04352, 642.9796, 245.86333, 321.81027, 382.77872]
2025-05-11 03:41:41,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [127.0, 80.0, 72.0, 87.0, 68.0, 110.0, 142.0, 50.0, 61.0, 72.0]
2025-05-11 03:41:41,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (425.43) for latency ExtremeClogL1U23
2025-05-11 03:41:41,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 03:41:41,637 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 03:41:41,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 6 hours, 25 minutes, 3 seconds)
2025-05-11 03:45:44,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 03:45:46,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 371.24847 ± 103.989
2025-05-11 03:45:46,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [198.8964, 382.7688, 574.74097, 414.69025, 326.09692, 372.1829, 431.25983, 405.30392, 205.90642, 400.63834]
2025-05-11 03:45:46,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [38.0, 73.0, 118.0, 78.0, 65.0, 82.0, 81.0, 76.0, 44.0, 74.0]
2025-05-11 03:45:46,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 6 hours, 23 minutes, 39 seconds)
2025-05-11 03:49:49,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 03:49:51,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 455.11710 ± 60.047
2025-05-11 03:49:51,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [553.2576, 328.1966, 465.51465, 447.437, 465.02835, 412.77176, 431.7259, 530.287, 426.56042, 490.39136]
2025-05-11 03:49:51,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 62.0, 88.0, 92.0, 87.0, 82.0, 82.0, 101.0, 81.0, 95.0]
2025-05-11 03:49:51,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (455.12) for latency ExtremeClogL1U23
2025-05-11 03:49:51,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 03:49:51,351 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 03:49:51,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 6 hours, 21 minutes, 21 seconds)
2025-05-11 03:53:54,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 03:53:56,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 403.61618 ± 75.556
2025-05-11 03:53:56,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [368.4268, 340.64886, 409.395, 354.58295, 423.7143, 284.5559, 427.5433, 572.5736, 379.29498, 475.42606]
2025-05-11 03:53:56,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 76.0, 82.0, 76.0, 92.0, 62.0, 86.0, 113.0, 79.0, 107.0]
2025-05-11 03:53:56,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 6 hours, 25 minutes, 31 seconds)
2025-05-11 03:57:59,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 03:58:01,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 386.80832 ± 97.232
2025-05-11 03:58:01,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [200.4731, 420.47882, 418.50705, 340.17026, 568.5084, 424.4207, 311.3827, 412.27454, 471.7735, 300.0945]
2025-05-11 03:58:01,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [39.0, 92.0, 81.0, 64.0, 110.0, 82.0, 60.0, 78.0, 89.0, 61.0]
2025-05-11 03:58:01,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 6 hours, 19 minutes, 56 seconds)
2025-05-11 04:02:07,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 04:02:09,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 415.53802 ± 101.965
2025-05-11 04:02:09,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [389.96277, 403.73505, 459.7614, 379.10907, 269.15445, 325.29153, 335.21738, 634.6515, 541.4563, 417.04074]
2025-05-11 04:02:09,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [75.0, 85.0, 99.0, 80.0, 55.0, 69.0, 70.0, 120.0, 117.0, 93.0]
2025-05-11 04:02:09,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 6 hours, 16 minutes, 28 seconds)
2025-05-11 04:06:27,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 04:06:29,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 522.91193 ± 303.849
2025-05-11 04:06:29,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [394.89062, 281.9852, 145.66058, 558.9451, 718.80164, 519.7485, 73.67117, 659.9014, 1163.6244, 711.8905]
2025-05-11 04:06:29,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 58.0, 28.0, 110.0, 137.0, 112.0, 16.0, 130.0, 237.0, 137.0]
2025-05-11 04:06:29,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (522.91) for latency ExtremeClogL1U23
2025-05-11 04:06:29,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 04:06:29,480 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 04:06:29,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 6 hours, 17 minutes, 6 seconds)
2025-05-11 04:10:45,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 04:10:47,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 533.27625 ± 131.253
2025-05-11 04:10:47,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [755.46716, 689.25543, 449.87482, 515.4272, 588.06866, 566.51605, 388.18323, 622.32117, 451.2847, 306.364]
2025-05-11 04:10:47,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [148.0, 148.0, 98.0, 99.0, 110.0, 109.0, 74.0, 139.0, 90.0, 60.0]
2025-05-11 04:10:47,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (533.28) for latency ExtremeClogL1U23
2025-05-11 04:10:47,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 04:10:47,485 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 04:10:47,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 6 hours, 16 minutes, 50 seconds)
2025-05-11 04:14:57,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 04:15:00,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 490.80649 ± 111.048
2025-05-11 04:15:00,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [493.9575, 595.46716, 364.60464, 506.86127, 570.63324, 386.19995, 551.4759, 353.28036, 383.33557, 702.2497]
2025-05-11 04:15:00,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 131.0, 79.0, 109.0, 121.0, 81.0, 110.0, 77.0, 85.0, 143.0]
2025-05-11 04:15:00,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 6 hours, 15 minutes)
2025-05-11 04:19:07,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 04:19:09,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 456.61133 ± 174.264
2025-05-11 04:19:09,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [538.8111, 473.3852, 130.16444, 480.32147, 388.08643, 155.25871, 514.4402, 613.4106, 664.62823, 607.6067]
2025-05-11 04:19:09,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [119.0, 106.0, 25.0, 105.0, 83.0, 30.0, 98.0, 114.0, 129.0, 117.0]
2025-05-11 04:19:09,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 6 hours, 12 minutes, 1 second)
2025-05-11 04:23:20,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 04:23:23,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 578.47601 ± 178.739
2025-05-11 04:23:23,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [522.7874, 605.0073, 580.80865, 536.1034, 436.84824, 166.2759, 711.7543, 612.6335, 781.8424, 830.69934]
2025-05-11 04:23:23,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 116.0, 109.0, 103.0, 83.0, 32.0, 152.0, 118.0, 157.0, 162.0]
2025-05-11 04:23:23,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (578.48) for latency ExtremeClogL1U23
2025-05-11 04:23:23,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 04:23:23,306 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 04:23:23,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 6 hours, 9 minutes, 28 seconds)
2025-05-11 04:27:36,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 04:27:38,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 412.02628 ± 230.208
2025-05-11 04:27:38,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [646.83826, 499.038, 146.63043, 763.1882, 436.1308, 72.86724, 232.1391, 155.29588, 569.53125, 598.60376]
2025-05-11 04:27:38,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [128.0, 94.0, 31.0, 146.0, 81.0, 15.0, 48.0, 30.0, 123.0, 114.0]
2025-05-11 04:27:38,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 6 hours, 3 minutes, 42 seconds)
2025-05-11 04:31:50,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 04:31:53,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 541.67365 ± 176.842
2025-05-11 04:31:53,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [592.64905, 672.75653, 670.0745, 521.7992, 747.7865, 603.19086, 495.42712, 558.13226, 67.9687, 486.95163]
2025-05-11 04:31:53,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [115.0, 132.0, 130.0, 108.0, 161.0, 133.0, 107.0, 110.0, 14.0, 102.0]
2025-05-11 04:31:53,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 5 hours, 58 minutes, 35 seconds)
2025-05-11 04:36:04,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 04:36:07,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 587.94257 ± 220.194
2025-05-11 04:36:07,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [793.17474, 744.9319, 601.4425, 148.11682, 207.49924, 635.5791, 609.6381, 763.3006, 802.45966, 573.283]
2025-05-11 04:36:07,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [171.0, 142.0, 131.0, 29.0, 40.0, 136.0, 120.0, 145.0, 153.0, 110.0]
2025-05-11 04:36:07,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (587.94) for latency ExtremeClogL1U23
2025-05-11 04:36:07,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 04:36:07,382 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 04:36:07,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 5 hours, 54 minutes, 48 seconds)
2025-05-11 04:40:16,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 04:40:19,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 565.13153 ± 189.547
2025-05-11 04:40:19,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [603.7896, 479.56125, 533.8857, 717.185, 551.74963, 84.26663, 590.01324, 853.32117, 566.1803, 671.3632]
2025-05-11 04:40:19,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [114.0, 93.0, 99.0, 136.0, 108.0, 17.0, 114.0, 169.0, 105.0, 128.0]
2025-05-11 04:40:19,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 5 hours, 51 minutes, 13 seconds)
2025-05-11 04:44:35,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 04:44:38,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 638.20685 ± 151.060
2025-05-11 04:44:38,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [846.29987, 832.72424, 686.2145, 514.3053, 331.93643, 649.44055, 561.89154, 544.11, 632.856, 782.28986]
2025-05-11 04:44:38,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [154.0, 161.0, 128.0, 101.0, 66.0, 122.0, 106.0, 101.0, 119.0, 149.0]
2025-05-11 04:44:38,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (638.21) for latency ExtremeClogL1U23
2025-05-11 04:44:38,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 04:44:38,093 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 04:44:38,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 5 hours, 48 minutes, 26 seconds)
2025-05-11 04:48:58,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 04:49:00,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 553.10748 ± 265.288
2025-05-11 04:49:00,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [495.5292, 566.79895, 339.85608, 278.91284, 527.0631, 911.3661, 1074.6449, 135.90367, 604.8678, 596.13214]
2025-05-11 04:49:00,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 112.0, 69.0, 52.0, 102.0, 180.0, 207.0, 26.0, 116.0, 112.0]
2025-05-11 04:49:00,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 5 hours, 46 minutes, 20 seconds)
2025-05-11 04:53:07,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 04:53:10,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 682.36829 ± 204.397
2025-05-11 04:53:10,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1124.8274, 686.60406, 783.25073, 673.8058, 784.0471, 613.3645, 633.33606, 360.11166, 394.67477, 769.661]
2025-05-11 04:53:10,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [223.0, 133.0, 148.0, 132.0, 143.0, 120.0, 122.0, 74.0, 73.0, 146.0]
2025-05-11 04:53:10,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (682.37) for latency ExtremeClogL1U23
2025-05-11 04:53:10,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 04:53:10,629 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 04:53:10,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 5 hours, 40 minutes, 40 seconds)
2025-05-11 04:57:32,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 04:57:34,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 551.92224 ± 268.134
2025-05-11 04:57:34,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1179.1499, 592.48157, 525.82635, 693.35785, 595.41156, 233.40585, 367.43567, 641.6849, 155.05513, 535.4136]
2025-05-11 04:57:34,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [227.0, 111.0, 101.0, 137.0, 117.0, 48.0, 68.0, 124.0, 30.0, 103.0]
2025-05-11 04:57:34,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 5 hours, 38 minutes, 59 seconds)
2025-05-11 05:01:47,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 05:01:51,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 756.17749 ± 265.622
2025-05-11 05:01:51,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [977.6583, 419.63492, 768.43335, 355.30127, 990.8009, 507.9663, 1098.7249, 776.1938, 1094.316, 572.7453]
2025-05-11 05:01:51,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [205.0, 78.0, 165.0, 67.0, 214.0, 105.0, 229.0, 164.0, 212.0, 110.0]
2025-05-11 05:01:51,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (756.18) for latency ExtremeClogL1U23
2025-05-11 05:01:51,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 05:01:51,238 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 05:01:51,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 5 hours, 35 minutes, 58 seconds)
2025-05-11 05:06:05,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 05:06:07,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 583.04083 ± 200.464
2025-05-11 05:06:07,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [514.5044, 730.17035, 744.18396, 785.4134, 565.0015, 366.64212, 157.23347, 672.81134, 825.8689, 468.57834]
2025-05-11 05:06:07,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 134.0, 143.0, 152.0, 122.0, 73.0, 30.0, 143.0, 166.0, 98.0]
2025-05-11 05:06:07,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 5 hours, 30 minutes, 59 seconds)
2025-05-11 05:10:19,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 05:10:21,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 580.84161 ± 227.559
2025-05-11 05:10:21,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [865.5605, 664.1014, 914.2759, 506.49496, 152.2648, 385.9475, 657.6658, 448.37604, 424.29633, 789.4333]
2025-05-11 05:10:21,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [170.0, 136.0, 173.0, 97.0, 31.0, 73.0, 126.0, 89.0, 78.0, 149.0]
2025-05-11 05:10:21,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 5 hours, 24 minutes, 25 seconds)
2025-05-11 05:14:34,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 05:14:37,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 658.09528 ± 260.495
2025-05-11 05:14:37,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [638.7097, 479.9192, 289.14578, 783.88086, 467.10114, 339.8394, 998.65283, 634.82306, 1115.9896, 832.8908]
2025-05-11 05:14:37,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 90.0, 59.0, 158.0, 90.0, 65.0, 196.0, 134.0, 222.0, 165.0]
2025-05-11 05:14:37,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 5 hours, 21 minutes, 45 seconds)
2025-05-11 05:18:44,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 05:18:47,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 549.89832 ± 321.433
2025-05-11 05:18:47,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [176.03473, 460.86313, 820.1042, 171.0363, 185.67366, 592.59143, 937.05365, 356.75385, 1126.2733, 672.5985]
2025-05-11 05:18:47,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [38.0, 92.0, 163.0, 33.0, 36.0, 128.0, 189.0, 73.0, 238.0, 134.0]
2025-05-11 05:18:47,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 5 hours, 13 minutes, 51 seconds)
2025-05-11 05:23:01,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 05:23:05,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 680.42908 ± 163.473
2025-05-11 05:23:05,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [963.1145, 710.0322, 412.19397, 679.6282, 587.68225, 788.6744, 512.2662, 909.73926, 565.1349, 675.8245]
2025-05-11 05:23:05,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [188.0, 147.0, 80.0, 152.0, 114.0, 157.0, 113.0, 178.0, 123.0, 133.0]
2025-05-11 05:23:05,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 5 hours, 9 minutes, 57 seconds)
2025-05-11 05:27:23,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 05:27:26,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 615.53119 ± 249.465
2025-05-11 05:27:26,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [872.77075, 822.90686, 735.5823, 150.02858, 626.6848, 756.2362, 664.1106, 121.126976, 698.03516, 707.8298]
2025-05-11 05:27:26,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [169.0, 172.0, 141.0, 29.0, 112.0, 146.0, 133.0, 26.0, 132.0, 138.0]
2025-05-11 05:27:26,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 5 hours, 6 minutes, 52 seconds)
2025-05-11 05:31:50,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 05:31:54,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 743.75793 ± 445.978
2025-05-11 05:31:54,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [791.9399, 585.9859, 139.63348, 518.8376, 1266.0775, 1647.0416, 653.805, 998.99335, 710.4651, 124.79959]
2025-05-11 05:31:54,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [153.0, 119.0, 27.0, 95.0, 249.0, 321.0, 123.0, 195.0, 148.0, 24.0]
2025-05-11 05:31:54,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 5 hours, 5 minutes, 55 seconds)
2025-05-11 05:36:26,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 05:36:29,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 775.67853 ± 293.962
2025-05-11 05:36:29,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [501.4185, 476.42377, 630.05536, 783.4007, 840.1119, 512.2796, 1456.052, 610.1122, 1107.6465, 839.28467]
2025-05-11 05:36:29,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 92.0, 125.0, 150.0, 160.0, 98.0, 305.0, 130.0, 217.0, 158.0]
2025-05-11 05:36:29,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (775.68) for latency ExtremeClogL1U23
2025-05-11 05:36:29,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 05:36:29,880 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 05:36:29,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 5 hours, 6 minutes, 11 seconds)
2025-05-11 05:40:41,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 05:40:45,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 812.60327 ± 177.316
2025-05-11 05:40:45,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [680.55884, 1045.0531, 887.8906, 661.7669, 1013.1054, 815.3307, 1015.7401, 756.5131, 452.5001, 797.5736]
2025-05-11 05:40:45,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [144.0, 202.0, 175.0, 130.0, 198.0, 158.0, 205.0, 154.0, 85.0, 154.0]
2025-05-11 05:40:45,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (812.60) for latency ExtremeClogL1U23
2025-05-11 05:40:45,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 05:40:45,412 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 05:40:45,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 5 hours, 3 minutes, 13 seconds)
2025-05-11 05:45:09,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 05:45:12,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 546.07965 ± 266.232
2025-05-11 05:45:12,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [731.255, 907.7891, 193.62961, 141.53606, 821.8652, 622.9195, 535.31866, 643.0383, 694.5548, 168.89008]
2025-05-11 05:45:12,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [144.0, 179.0, 37.0, 27.0, 176.0, 117.0, 116.0, 140.0, 134.0, 36.0]
2025-05-11 05:45:12,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 5 hours, 47 seconds)
2025-05-11 05:49:18,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 05:49:21,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 654.47070 ± 192.006
2025-05-11 05:49:21,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [652.1109, 756.52783, 734.9105, 547.4627, 675.6333, 834.0338, 146.24356, 871.11865, 696.8177, 629.8484]
2025-05-11 05:49:21,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [129.0, 146.0, 142.0, 107.0, 132.0, 159.0, 28.0, 183.0, 151.0, 121.0]
2025-05-11 05:49:21,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 4 hours, 53 minutes, 44 seconds)
2025-05-11 05:53:33,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 05:53:36,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 713.52600 ± 289.002
2025-05-11 05:53:36,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [755.672, 852.21844, 188.00807, 835.7801, 813.28534, 677.6845, 1065.4348, 727.1781, 181.98445, 1038.0148]
2025-05-11 05:53:36,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [162.0, 185.0, 36.0, 160.0, 157.0, 135.0, 202.0, 156.0, 35.0, 211.0]
2025-05-11 05:53:36,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 4 hours, 46 minutes, 34 seconds)
2025-05-11 05:57:54,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 05:57:58,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 776.36804 ± 448.790
2025-05-11 05:57:58,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [754.0551, 202.30937, 1992.1152, 926.9151, 653.7782, 864.44104, 629.9367, 636.5639, 648.5606, 455.0053]
2025-05-11 05:57:58,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [146.0, 39.0, 395.0, 197.0, 137.0, 166.0, 124.0, 129.0, 134.0, 92.0]
2025-05-11 05:57:58,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 4 hours, 39 minutes, 8 seconds)
2025-05-11 06:02:14,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 06:02:18,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 854.65479 ± 290.710
2025-05-11 06:02:18,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [616.8188, 656.8719, 964.3022, 629.7591, 614.4926, 1269.3595, 879.22644, 1400.5094, 1021.9719, 493.2361]
2025-05-11 06:02:18,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [142.0, 141.0, 194.0, 124.0, 128.0, 254.0, 174.0, 275.0, 193.0, 98.0]
2025-05-11 06:02:18,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (854.65) for latency ExtremeClogL1U23
2025-05-11 06:02:18,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 06:02:18,571 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 06:02:18,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 4 hours, 35 minutes, 52 seconds)
2025-05-11 06:06:34,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 06:06:40,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1111.68579 ± 658.050
2025-05-11 06:06:40,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [705.6732, 2275.0178, 838.2214, 2015.478, 363.90906, 1580.8326, 202.52814, 1319.6901, 600.9899, 1214.5173]
2025-05-11 06:06:40,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [153.0, 458.0, 177.0, 405.0, 74.0, 319.0, 39.0, 254.0, 118.0, 259.0]
2025-05-11 06:06:40,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (1111.69) for latency ExtremeClogL1U23
2025-05-11 06:06:40,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 06:06:40,243 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 06:06:40,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 4 hours, 30 minutes, 30 seconds)
2025-05-11 06:10:56,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 06:11:00,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 797.97156 ± 362.831
2025-05-11 06:11:00,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1062.8962, 732.1831, 820.5806, 235.53171, 1332.7666, 1148.9039, 601.54974, 1194.3961, 571.5154, 279.3922]
2025-05-11 06:11:00,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [214.0, 158.0, 177.0, 48.0, 262.0, 222.0, 125.0, 239.0, 122.0, 56.0]
2025-05-11 06:11:00,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 4 hours, 28 minutes, 29 seconds)
2025-05-11 06:15:25,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 06:15:29,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 714.26874 ± 198.488
2025-05-11 06:15:29,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [680.978, 687.0337, 849.1375, 1044.2765, 530.6521, 799.4447, 976.603, 692.6988, 389.21564, 492.64722]
2025-05-11 06:15:29,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [128.0, 131.0, 165.0, 203.0, 104.0, 152.0, 186.0, 132.0, 76.0, 99.0]
2025-05-11 06:15:29,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 4 hours, 26 minutes, 48 seconds)
2025-05-11 06:19:51,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 06:19:55,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 837.24963 ± 357.367
2025-05-11 06:19:55,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [974.1792, 1693.5043, 480.68222, 993.1582, 658.53156, 377.89145, 659.871, 1101.449, 731.11804, 702.11084]
2025-05-11 06:19:55,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [202.0, 354.0, 88.0, 212.0, 126.0, 81.0, 135.0, 214.0, 146.0, 137.0]
2025-05-11 06:19:55,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 4 hours, 23 minutes, 25 seconds)
2025-05-11 06:24:13,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 06:24:16,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 681.79132 ± 390.544
2025-05-11 06:24:16,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1045.444, 445.66776, 589.9192, 699.0644, 521.70636, 609.587, 188.73856, 1507.9617, 1029.2859, 180.53825]
2025-05-11 06:24:16,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [203.0, 87.0, 114.0, 138.0, 109.0, 120.0, 40.0, 295.0, 200.0, 35.0]
2025-05-11 06:24:16,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 4 hours, 19 minutes, 14 seconds)
2025-05-11 06:28:35,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 06:28:40,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 962.90686 ± 504.551
2025-05-11 06:28:40,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [972.9811, 970.1358, 943.7365, 1892.5137, 269.2225, 392.1041, 359.64395, 901.1572, 1511.2755, 1416.2977]
2025-05-11 06:28:40,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [193.0, 186.0, 203.0, 375.0, 51.0, 80.0, 68.0, 175.0, 314.0, 280.0]
2025-05-11 06:28:40,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 4 hours, 15 minutes, 12 seconds)
2025-05-11 06:32:50,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 06:32:55,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 967.18103 ± 563.232
2025-05-11 06:32:55,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [597.8125, 846.5697, 1433.4545, 2439.918, 731.05023, 863.0318, 658.8025, 422.99677, 570.48114, 1107.6936]
2025-05-11 06:32:55,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 160.0, 289.0, 510.0, 156.0, 179.0, 126.0, 91.0, 113.0, 231.0]
2025-05-11 06:32:55,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 4 hours, 9 minutes, 49 seconds)
2025-05-11 06:37:11,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 06:37:16,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 947.47327 ± 451.183
2025-05-11 06:37:16,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [735.1769, 712.49146, 467.5293, 849.50867, 1882.1594, 1501.4059, 653.842, 1363.425, 449.4628, 859.7309]
2025-05-11 06:37:16,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [142.0, 154.0, 96.0, 168.0, 370.0, 313.0, 127.0, 272.0, 97.0, 177.0]
2025-05-11 06:37:16,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 4 hours, 4 minutes)
2025-05-11 06:41:35,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 06:41:40,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1097.40039 ± 638.989
2025-05-11 06:41:40,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [130.8052, 755.8643, 972.6439, 1792.3187, 634.5649, 808.86615, 837.57715, 990.7808, 1558.7441, 2491.8381]
2025-05-11 06:41:40,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 161.0, 187.0, 354.0, 139.0, 174.0, 181.0, 194.0, 328.0, 494.0]
2025-05-11 06:41:40,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 59 minutes, 17 seconds)
2025-05-11 06:45:56,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 06:46:00,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 833.54968 ± 311.530
2025-05-11 06:46:00,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [919.76483, 749.16235, 1329.4473, 912.68286, 285.16898, 737.3762, 709.66187, 725.0766, 1384.6232, 582.5331]
2025-05-11 06:46:00,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [181.0, 147.0, 263.0, 200.0, 52.0, 160.0, 155.0, 142.0, 270.0, 113.0]
2025-05-11 06:46:00,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 54 minutes, 43 seconds)
2025-05-11 06:50:20,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 06:50:25,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 983.54846 ± 397.016
2025-05-11 06:50:25,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1037.9166, 654.2777, 1152.9509, 1749.8223, 632.97577, 966.7094, 658.5589, 1574.3439, 943.5403, 464.38885]
2025-05-11 06:50:25,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [223.0, 125.0, 235.0, 344.0, 120.0, 195.0, 123.0, 314.0, 184.0, 92.0]
2025-05-11 06:50:25,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 50 minutes, 31 seconds)
2025-05-11 06:54:39,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 06:54:44,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1043.95837 ± 481.535
2025-05-11 06:54:44,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [870.2095, 774.0225, 186.4979, 1444.9185, 854.83203, 1336.629, 2110.2302, 988.21686, 804.0624, 1069.9644]
2025-05-11 06:54:44,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [181.0, 162.0, 36.0, 310.0, 185.0, 265.0, 422.0, 189.0, 157.0, 231.0]
2025-05-11 06:54:44,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 46 minutes, 56 seconds)
2025-05-11 06:58:57,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 06:59:01,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 894.75964 ± 329.150
2025-05-11 06:59:01,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1364.3124, 1111.4227, 368.2819, 1437.1755, 949.76965, 623.3109, 601.7566, 956.58075, 616.16205, 918.824]
2025-05-11 06:59:01,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [267.0, 219.0, 70.0, 289.0, 208.0, 118.0, 117.0, 183.0, 117.0, 174.0]
2025-05-11 06:59:01,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 41 minutes, 51 seconds)
2025-05-11 07:03:15,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 07:03:19,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 943.26740 ± 551.042
2025-05-11 07:03:19,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1999.2422, 1815.4719, 1101.5488, 156.05519, 1120.7241, 809.024, 609.562, 687.19916, 511.33572, 622.5098]
2025-05-11 07:03:19,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [398.0, 366.0, 232.0, 30.0, 232.0, 156.0, 124.0, 128.0, 104.0, 115.0]
2025-05-11 07:03:19,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 36 minutes, 32 seconds)
2025-05-11 07:07:44,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 07:07:49,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 960.58527 ± 661.653
2025-05-11 07:07:49,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1059.7366, 878.1526, 1966.094, 931.8932, 394.9321, 328.58267, 146.91177, 1584.3136, 2023.9929, 291.24307]
2025-05-11 07:07:49,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [229.0, 175.0, 394.0, 178.0, 87.0, 65.0, 28.0, 342.0, 397.0, 63.0]
2025-05-11 07:07:49,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 33 minutes, 45 seconds)
2025-05-11 07:12:01,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 07:12:05,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 974.13232 ± 608.948
2025-05-11 07:12:05,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [223.53395, 1034.7981, 756.1403, 951.6865, 1862.5469, 776.6795, 1199.797, 2173.3953, 190.42471, 572.3201]
2025-05-11 07:12:05,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [43.0, 208.0, 159.0, 190.0, 367.0, 147.0, 232.0, 426.0, 38.0, 119.0]
2025-05-11 07:12:05,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 3 hours, 28 minutes, 6 seconds)
2025-05-11 07:16:24,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 07:16:30,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1203.15601 ± 546.094
2025-05-11 07:16:30,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1552.8549, 668.97266, 787.8147, 1448.2263, 1009.9794, 2604.6482, 942.47687, 983.7617, 738.34393, 1294.4819]
2025-05-11 07:16:30,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [305.0, 128.0, 148.0, 281.0, 194.0, 551.0, 198.0, 211.0, 150.0, 275.0]
2025-05-11 07:16:30,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (1203.16) for latency ExtremeClogL1U23
2025-05-11 07:16:30,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 07:16:30,884 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 07:16:30,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 3 hours, 24 minutes, 37 seconds)
2025-05-11 07:20:43,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 07:20:48,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1106.15515 ± 737.770
2025-05-11 07:20:48,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [977.86237, 424.96967, 925.0294, 3023.5435, 906.0661, 560.3316, 886.7806, 631.6106, 857.4564, 1867.9021]
2025-05-11 07:20:48,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [189.0, 80.0, 181.0, 592.0, 182.0, 113.0, 174.0, 123.0, 168.0, 366.0]
2025-05-11 07:20:48,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 3 hours, 20 minutes, 28 seconds)
2025-05-11 07:25:11,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 07:25:16,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1155.34399 ± 472.513
2025-05-11 07:25:16,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [556.52155, 523.9846, 2166.043, 1187.8682, 1078.984, 1263.7507, 1650.2015, 800.306, 983.6169, 1342.1633]
2025-05-11 07:25:16,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 100.0, 434.0, 250.0, 209.0, 251.0, 354.0, 162.0, 187.0, 267.0]
2025-05-11 07:25:16,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 3 hours, 17 minutes, 33 seconds)
2025-05-11 07:29:31,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 07:29:37,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1175.57581 ± 786.752
2025-05-11 07:29:37,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [863.3857, 630.2399, 2892.5647, 145.32335, 140.34276, 1976.7579, 1392.606, 1203.0225, 1330.4341, 1181.0817]
2025-05-11 07:29:37,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [179.0, 135.0, 585.0, 28.0, 27.0, 391.0, 273.0, 238.0, 255.0, 232.0]
2025-05-11 07:29:37,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 3 hours, 11 minutes, 50 seconds)
2025-05-11 07:34:01,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 07:34:08,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1394.47534 ± 863.139
2025-05-11 07:34:08,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [140.38019, 1524.9712, 1494.9089, 566.97375, 1216.1385, 588.373, 1759.5391, 1324.3818, 3412.717, 1916.3699]
2025-05-11 07:34:08,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 291.0, 289.0, 115.0, 248.0, 110.0, 341.0, 254.0, 690.0, 377.0]
2025-05-11 07:34:08,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (1394.48) for latency ExtremeClogL1U23
2025-05-11 07:34:08,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 07:34:08,429 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 07:34:08,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 3 hours, 9 minutes, 33 seconds)
2025-05-11 07:38:38,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 07:38:44,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1104.35059 ± 725.870
2025-05-11 07:38:44,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [827.96484, 1610.673, 145.49446, 619.85065, 756.22485, 1533.5071, 1558.5477, 2769.9417, 676.3678, 544.9334]
2025-05-11 07:38:44,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [159.0, 308.0, 28.0, 119.0, 142.0, 301.0, 301.0, 542.0, 133.0, 98.0]
2025-05-11 07:38:44,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 3 hours, 6 minutes, 40 seconds)
2025-05-11 07:42:55,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 07:42:59,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1062.14624 ± 503.180
2025-05-11 07:42:59,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1409.2739, 799.3403, 632.219, 564.1954, 1294.8058, 243.72935, 1003.00336, 2096.2146, 1220.1249, 1358.5566]
2025-05-11 07:42:59,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [279.0, 167.0, 123.0, 125.0, 255.0, 51.0, 196.0, 416.0, 235.0, 265.0]
2025-05-11 07:42:59,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 3 hours, 1 minute, 54 seconds)
2025-05-11 07:47:09,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 07:47:15,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1216.90625 ± 1304.406
2025-05-11 07:47:15,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [273.80618, 457.8328, 851.76373, 733.3884, 4893.2715, 530.7607, 924.1383, 592.4309, 905.5284, 2006.1418]
2025-05-11 07:47:15,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [57.0, 85.0, 173.0, 153.0, 1000.0, 102.0, 188.0, 113.0, 176.0, 414.0]
2025-05-11 07:47:15,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 55 minutes, 47 seconds)
2025-05-11 07:51:31,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 07:51:35,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1062.57776 ± 610.366
2025-05-11 07:51:35,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [341.16333, 1897.1342, 1629.4424, 860.76575, 910.4509, 1016.09045, 317.8986, 2072.396, 343.82593, 1236.6095]
2025-05-11 07:51:35,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 375.0, 317.0, 167.0, 181.0, 199.0, 59.0, 402.0, 65.0, 246.0]
2025-05-11 07:51:35,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 51 minutes, 24 seconds)
2025-05-11 07:55:55,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 07:56:01,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1317.38855 ± 693.356
2025-05-11 07:56:01,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [2560.205, 1445.3702, 205.98914, 2120.004, 1047.4885, 621.9647, 2022.7455, 1261.904, 818.9787, 1069.2365]
2025-05-11 07:56:01,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [511.0, 283.0, 40.0, 424.0, 205.0, 116.0, 389.0, 246.0, 166.0, 207.0]
2025-05-11 07:56:01,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 46 minutes, 17 seconds)
2025-05-11 08:00:02,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 08:00:08,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1214.68616 ± 556.175
2025-05-11 08:00:08,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [2187.249, 489.85532, 1133.0625, 1082.5995, 542.44086, 577.9155, 1439.4531, 2004.1659, 1217.0833, 1473.036]
2025-05-11 08:00:08,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [442.0, 95.0, 222.0, 229.0, 104.0, 110.0, 287.0, 407.0, 264.0, 291.0]
2025-05-11 08:00:08,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 38 minutes, 19 seconds)
2025-05-11 08:04:19,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 08:04:24,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 954.29358 ± 702.019
2025-05-11 08:04:24,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [808.71466, 462.7207, 1228.873, 1208.9967, 2531.9683, 123.07867, 878.60583, 1652.6818, 491.4672, 155.82932]
2025-05-11 08:04:24,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [159.0, 85.0, 247.0, 237.0, 515.0, 26.0, 176.0, 335.0, 98.0, 30.0]
2025-05-11 08:04:24,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 34 minutes, 6 seconds)
2025-05-11 08:08:43,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 08:08:47,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 805.27307 ± 958.696
2025-05-11 08:08:47,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [377.3415, 2803.16, 145.53593, 150.43555, 150.40205, 135.5158, 212.10736, 2320.6382, 1433.6895, 323.9047]
2025-05-11 08:08:47,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 559.0, 28.0, 29.0, 29.0, 26.0, 41.0, 487.0, 278.0, 66.0]
2025-05-11 08:08:47,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 30 minutes, 45 seconds)
2025-05-11 08:12:52,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 08:13:00,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1592.93994 ± 1447.193
2025-05-11 08:13:00,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [170.00224, 1345.5513, 3548.3528, 809.0448, 1045.1309, 166.51015, 1175.2883, 1058.4054, 5002.396, 1608.7179]
2025-05-11 08:13:00,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 283.0, 707.0, 164.0, 211.0, 32.0, 231.0, 211.0, 1000.0, 334.0]
2025-05-11 08:13:00,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (1592.94) for latency ExtremeClogL1U23
2025-05-11 08:13:00,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 08:13:00,309 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 08:13:00,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 25 minutes, 33 seconds)
2025-05-11 08:17:15,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 08:17:28,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 2469.95703 ± 1375.232
2025-05-11 08:17:28,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1220.173, 2326.1052, 1639.444, 4006.061, 2736.4072, 993.3766, 690.7645, 4056.4443, 2073.012, 4957.7837]
2025-05-11 08:17:28,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [238.0, 463.0, 323.0, 799.0, 575.0, 208.0, 131.0, 816.0, 418.0, 1000.0]
2025-05-11 08:17:28,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (2469.96) for latency ExtremeClogL1U23
2025-05-11 08:17:28,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 08:17:28,641 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 08:17:28,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 21 minutes, 36 seconds)
2025-05-11 08:21:42,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 08:21:50,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1580.21130 ± 1293.836
2025-05-11 08:21:50,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1987.0295, 4902.9443, 1624.6117, 1182.6753, 2442.863, 235.79779, 846.9962, 628.67615, 406.27316, 1544.2457]
2025-05-11 08:21:50,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [420.0, 1000.0, 332.0, 244.0, 496.0, 46.0, 173.0, 127.0, 74.0, 298.0]
2025-05-11 08:21:50,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 2 hours, 18 minutes, 56 seconds)
2025-05-11 08:26:03,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 08:26:09,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1380.00854 ± 964.644
2025-05-11 08:26:09,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [185.71135, 3676.1624, 1231.7429, 1793.2677, 498.4274, 1608.7467, 352.99994, 1603.4357, 1912.5271, 937.0633]
2025-05-11 08:26:09,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [36.0, 725.0, 240.0, 350.0, 101.0, 327.0, 70.0, 319.0, 373.0, 186.0]
2025-05-11 08:26:09,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 2 hours, 14 minutes, 56 seconds)
2025-05-11 08:30:24,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 08:30:30,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1270.26318 ± 713.816
2025-05-11 08:30:30,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [585.7216, 1193.1129, 1010.6577, 2198.1958, 1523.0001, 2632.3247, 485.9758, 342.46857, 1739.1877, 991.9873]
2025-05-11 08:30:30,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [114.0, 253.0, 197.0, 434.0, 302.0, 511.0, 102.0, 63.0, 355.0, 211.0]
2025-05-11 08:30:30,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 2 hours, 10 minutes, 20 seconds)
2025-05-11 08:34:52,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 08:35:01,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1674.80859 ± 1267.378
2025-05-11 08:35:01,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [270.50626, 4964.004, 1643.3314, 1299.8138, 1126.6112, 2652.394, 1480.7258, 1138.2377, 432.2549, 1740.2064]
2025-05-11 08:35:01,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [55.0, 1000.0, 331.0, 252.0, 224.0, 528.0, 311.0, 221.0, 80.0, 341.0]
2025-05-11 08:35:01,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 2 hours, 7 minutes, 42 seconds)
2025-05-11 08:39:17,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 08:39:26,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1782.98755 ± 1332.290
2025-05-11 08:39:26,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [951.5821, 1812.3618, 534.381, 3878.1873, 1946.5679, 4262.84, 655.6739, 2508.9277, 1149.786, 129.56754]
2025-05-11 08:39:26,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [183.0, 359.0, 105.0, 784.0, 408.0, 862.0, 124.0, 502.0, 233.0, 25.0]
2025-05-11 08:39:26,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 2 hours, 3 minutes)
2025-05-11 08:43:52,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 08:43:57,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 999.99396 ± 497.052
2025-05-11 08:43:57,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [2158.6326, 1118.4545, 362.54047, 1137.1898, 1338.3439, 544.9682, 1150.7546, 447.18195, 817.14374, 924.7293]
2025-05-11 08:43:57,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [427.0, 236.0, 72.0, 224.0, 270.0, 106.0, 223.0, 82.0, 163.0, 182.0]
2025-05-11 08:43:57,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 59 minutes, 26 seconds)
2025-05-11 08:48:03,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 08:48:09,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1120.81824 ± 1020.396
2025-05-11 08:48:09,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1452.4719, 137.84747, 917.2919, 3342.7861, 499.58356, 1341.3333, 416.10022, 154.65715, 413.6939, 2532.4167]
2025-05-11 08:48:09,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [286.0, 29.0, 175.0, 667.0, 90.0, 263.0, 91.0, 30.0, 76.0, 513.0]
2025-05-11 08:48:09,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 54 minutes, 20 seconds)
2025-05-11 08:52:26,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 08:52:30,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1026.42480 ± 531.804
2025-05-11 08:52:30,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [2125.7542, 1244.0045, 456.06784, 1728.3464, 555.5488, 723.394, 1294.7148, 674.23376, 918.5427, 543.6406]
2025-05-11 08:52:30,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [409.0, 237.0, 98.0, 344.0, 111.0, 143.0, 252.0, 137.0, 195.0, 103.0]
2025-05-11 08:52:30,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 50 minutes)
2025-05-11 08:56:57,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 08:57:03,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1297.74329 ± 894.019
2025-05-11 08:57:03,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [963.41016, 1963.9823, 701.96655, 1174.8958, 155.3483, 1908.2031, 91.288345, 2501.799, 2773.1138, 743.4244]
2025-05-11 08:57:03,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [209.0, 382.0, 136.0, 234.0, 30.0, 383.0, 18.0, 488.0, 550.0, 145.0]
2025-05-11 08:57:03,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 45 minutes, 44 seconds)
2025-05-11 09:01:15,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 09:01:27,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 2255.17700 ± 1600.561
2025-05-11 09:01:27,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [4404.636, 2905.2188, 4012.547, 2286.9004, 704.04004, 916.5888, 915.7578, 761.3452, 4898.305, 746.42883]
2025-05-11 09:01:27,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [902.0, 580.0, 812.0, 465.0, 140.0, 185.0, 179.0, 147.0, 1000.0, 141.0]
2025-05-11 09:01:27,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 41 minutes, 16 seconds)
2025-05-11 09:05:43,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 09:05:51,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1379.60120 ± 1393.227
2025-05-11 09:05:51,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1521.1296, 187.74908, 4989.3306, 1524.684, 539.961, 2540.9495, 557.2392, 541.5542, 1223.0435, 170.37062]
2025-05-11 09:05:51,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [298.0, 36.0, 1000.0, 307.0, 100.0, 524.0, 111.0, 117.0, 249.0, 33.0]
2025-05-11 09:05:51,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 36 minutes, 19 seconds)
2025-05-11 09:10:12,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 09:10:23,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 2104.81201 ± 1230.124
2025-05-11 09:10:23,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [2726.5986, 4219.337, 3505.0088, 480.56277, 3398.4749, 574.4093, 1019.16235, 1639.6992, 1775.5265, 1709.3406]
2025-05-11 09:10:23,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [570.0, 859.0, 703.0, 98.0, 677.0, 111.0, 209.0, 317.0, 360.0, 342.0]
2025-05-11 09:10:23,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 33 minutes, 25 seconds)
2025-05-11 09:14:55,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 09:15:02,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1326.70337 ± 1024.745
2025-05-11 09:15:02,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1459.0057, 3573.2202, 2243.4749, 564.866, 1135.254, 604.80005, 73.1954, 172.289, 1371.2098, 2069.7185]
2025-05-11 09:15:02,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [287.0, 709.0, 447.0, 110.0, 222.0, 115.0, 15.0, 33.0, 270.0, 401.0]
2025-05-11 09:15:02,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 30 minutes, 5 seconds)
2025-05-11 09:19:23,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 09:19:33,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1953.32385 ± 1530.577
2025-05-11 09:19:33,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [5036.071, 3418.3137, 3662.9937, 784.0361, 156.6672, 1844.0413, 219.2701, 1254.7781, 2063.91, 1093.1583]
2025-05-11 09:19:33,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 684.0, 730.0, 155.0, 30.0, 373.0, 43.0, 241.0, 405.0, 212.0]
2025-05-11 09:19:33,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 25 minutes, 30 seconds)
2025-05-11 09:23:49,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 09:23:56,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1462.54077 ± 741.640
2025-05-11 09:23:56,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [570.7547, 1059.2435, 1719.8851, 2418.1028, 625.0005, 2377.2537, 2645.5183, 951.2985, 862.9113, 1395.4392]
2025-05-11 09:23:56,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 221.0, 351.0, 483.0, 127.0, 487.0, 534.0, 189.0, 164.0, 261.0]
2025-05-11 09:23:56,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 20 minutes, 57 seconds)
2025-05-11 09:28:13,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 09:28:21,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1696.77869 ± 1100.299
2025-05-11 09:28:21,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1353.1477, 631.0979, 684.95953, 1042.7227, 191.91913, 3200.323, 2292.3977, 1865.4922, 3806.7273, 1898.9996]
2025-05-11 09:28:21,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [274.0, 120.0, 136.0, 210.0, 37.0, 635.0, 473.0, 378.0, 773.0, 396.0]
2025-05-11 09:28:21,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 16 minutes, 32 seconds)
2025-05-11 09:32:46,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 09:32:58,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 2215.99072 ± 1391.235
2025-05-11 09:32:58,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [3407.4685, 1432.7421, 3269.277, 101.75808, 546.4389, 1866.3822, 1722.979, 3211.5015, 4920.2515, 1681.1082]
2025-05-11 09:32:58,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [679.0, 285.0, 649.0, 22.0, 107.0, 370.0, 340.0, 661.0, 1000.0, 348.0]
2025-05-11 09:32:58,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 12 minutes, 15 seconds)
2025-05-11 09:37:07,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 09:37:13,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1223.11108 ± 1230.470
2025-05-11 09:37:13,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [2707.558, 336.62778, 1205.5714, 583.0291, 733.64056, 73.15395, 4226.8154, 161.319, 1099.5276, 1103.8684]
2025-05-11 09:37:13,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [533.0, 76.0, 243.0, 124.0, 142.0, 15.0, 871.0, 31.0, 215.0, 219.0]
2025-05-11 09:37:13,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 1 hour, 6 minutes, 34 seconds)
2025-05-11 09:41:34,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 09:41:41,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1467.34070 ± 1092.625
2025-05-11 09:41:41,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [582.5356, 2470.3472, 1898.4045, 634.8708, 1172.9108, 4251.553, 712.2172, 1296.2678, 746.9479, 907.35144]
2025-05-11 09:41:41,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 484.0, 380.0, 122.0, 227.0, 851.0, 131.0, 245.0, 143.0, 179.0]
2025-05-11 09:41:41,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 1 hour, 1 minute, 57 seconds)
2025-05-11 09:45:40,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 09:45:49,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1847.78674 ± 921.480
2025-05-11 09:45:49,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1589.6849, 1016.44794, 1885.8956, 3348.2114, 1906.7903, 2814.3276, 653.4145, 625.1725, 3086.3564, 1551.5664]
2025-05-11 09:45:49,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [306.0, 196.0, 379.0, 672.0, 402.0, 569.0, 131.0, 127.0, 618.0, 313.0]
2025-05-11 09:45:49,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 56 minutes, 53 seconds)
2025-05-11 09:50:02,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 09:50:15,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 2370.72144 ± 1496.532
2025-05-11 09:50:15,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [2099.9766, 558.24054, 868.62476, 1608.9647, 3881.0994, 4773.5435, 1821.6813, 4929.8906, 1419.4178, 1745.7748]
2025-05-11 09:50:15,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [429.0, 115.0, 182.0, 326.0, 787.0, 961.0, 368.0, 1000.0, 274.0, 348.0]
2025-05-11 09:50:15,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 52 minutes, 31 seconds)
2025-05-11 09:54:27,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 09:54:31,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 950.95282 ± 732.898
2025-05-11 09:54:31,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [2958.4377, 941.2896, 1171.3691, 914.6328, 1011.8531, 79.49933, 572.2268, 426.80865, 703.44, 729.9703]
2025-05-11 09:54:31,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [607.0, 186.0, 228.0, 196.0, 198.0, 16.0, 115.0, 77.0, 139.0, 136.0]
2025-05-11 09:54:31,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 47 minutes, 25 seconds)
2025-05-11 09:58:43,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 09:58:52,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1972.64673 ± 1104.802
2025-05-11 09:58:52,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [2879.316, 918.61255, 757.8746, 4472.6846, 1458.6566, 1953.5814, 2873.4868, 1438.5211, 2084.2197, 889.51447]
2025-05-11 09:58:52,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [563.0, 170.0, 148.0, 895.0, 291.0, 393.0, 570.0, 292.0, 409.0, 180.0]
2025-05-11 09:58:53,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 43 minutes, 18 seconds)
2025-05-11 10:03:07,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 10:03:14,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1507.60767 ± 744.647
2025-05-11 10:03:14,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1411.9323, 2808.0564, 1453.8582, 2359.6946, 1498.2019, 1467.1674, 2228.848, 987.1673, 392.05154, 469.0985]
2025-05-11 10:03:14,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [279.0, 572.0, 285.0, 455.0, 304.0, 279.0, 439.0, 203.0, 78.0, 88.0]
2025-05-11 10:03:14,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 38 minutes, 48 seconds)
2025-05-11 10:07:43,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 10:07:50,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1358.44189 ± 925.297
2025-05-11 10:07:50,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1274.5594, 890.7982, 1372.3066, 167.47624, 1398.4487, 3565.9915, 663.7155, 1020.0646, 2425.8438, 805.2136]
2025-05-11 10:07:50,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [253.0, 177.0, 276.0, 33.0, 274.0, 713.0, 136.0, 200.0, 490.0, 156.0]
2025-05-11 10:07:50,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 35 minutes, 12 seconds)
2025-05-11 10:12:05,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 10:12:12,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1487.50964 ± 1336.153
2025-05-11 10:12:12,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1946.778, 1141.0281, 1033.2147, 2428.8044, 4883.2217, 1691.0402, 215.59904, 407.661, 135.31851, 992.4303]
2025-05-11 10:12:12,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [385.0, 220.0, 201.0, 468.0, 1000.0, 357.0, 43.0, 85.0, 26.0, 201.0]
2025-05-11 10:12:12,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 30 minutes, 44 seconds)
2025-05-11 10:16:09,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 10:16:17,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1816.09351 ± 1679.528
2025-05-11 10:16:17,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [492.65286, 4997.874, 690.7537, 397.36343, 1607.283, 1958.4078, 4984.4175, 1511.2996, 1321.6423, 199.24094]
2025-05-11 10:16:17,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 1000.0, 136.0, 75.0, 311.0, 383.0, 1000.0, 306.0, 257.0, 39.0]
2025-05-11 10:16:17,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 26 minutes, 7 seconds)
2025-05-11 10:20:50,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 10:21:00,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1979.53711 ± 1486.456
2025-05-11 10:21:00,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [2411.7507, 2112.6272, 2157.7483, 1583.0001, 1107.5314, 167.56914, 336.85028, 819.5009, 4098.8857, 4999.908]
2025-05-11 10:21:00,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [471.0, 429.0, 437.0, 309.0, 223.0, 32.0, 64.0, 176.0, 808.0, 1000.0]
2025-05-11 10:21:00,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 22 minutes, 7 seconds)
2025-05-11 10:25:08,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 10:25:17,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1631.96008 ± 999.157
2025-05-11 10:25:17,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [500.2687, 1297.5588, 830.5217, 1651.8765, 2410.9612, 3460.2156, 2952.3684, 1036.062, 1899.9066, 279.86096]
2025-05-11 10:25:17,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 254.0, 167.0, 329.0, 478.0, 690.0, 585.0, 212.0, 393.0, 51.0]
2025-05-11 10:25:17,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 17 minutes, 37 seconds)
2025-05-11 10:29:35,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 10:29:43,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1524.79761 ± 1165.242
2025-05-11 10:29:43,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [640.8261, 547.1317, 1185.1875, 386.812, 4184.1094, 1038.6187, 857.01825, 1948.8617, 1362.9746, 3096.4365]
2025-05-11 10:29:43,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [125.0, 104.0, 240.0, 75.0, 833.0, 193.0, 171.0, 384.0, 275.0, 606.0]
2025-05-11 10:29:43,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 13 minutes, 7 seconds)
2025-05-11 10:33:53,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 10:34:02,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1672.34570 ± 1356.188
2025-05-11 10:34:02,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1024.1842, 4899.633, 1432.0635, 2126.4016, 577.3741, 160.77052, 280.65173, 2993.6096, 1859.3109, 1369.4592]
2025-05-11 10:34:02,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [207.0, 1000.0, 283.0, 419.0, 115.0, 31.0, 57.0, 593.0, 373.0, 270.0]
2025-05-11 10:34:02,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 8 minutes, 44 seconds)
2025-05-11 10:38:18,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 10:38:30,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 2139.49487 ± 1484.676
2025-05-11 10:38:30,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [291.5945, 4907.532, 1177.1393, 1047.4542, 1987.9634, 786.4266, 1126.5336, 4237.05, 3340.7273, 2492.5276]
2025-05-11 10:38:30,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [60.0, 1000.0, 226.0, 209.0, 404.0, 154.0, 220.0, 851.0, 684.0, 502.0]
2025-05-11 10:38:30,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 4 minutes, 26 seconds)
2025-05-11 10:42:45,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 10:42:52,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1444.96216 ± 711.424
2025-05-11 10:42:52,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [971.47107, 1747.0448, 2177.256, 852.2189, 1602.1571, 1643.0052, 1186.5894, 1290.1045, 129.83408, 2849.94]
2025-05-11 10:42:52,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [189.0, 347.0, 428.0, 163.0, 312.0, 323.0, 221.0, 250.0, 25.0, 562.0]
2025-05-11 10:42:52,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1251 [DEBUG]: Training session finished
