2025-05-07 16:38:15,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1006 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-bpql-mem4
2025-05-07 16:38:15,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1007 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-bpql-mem4
2025-05-07 16:38:15,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1008 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x7b18b5fcca90>}
2025-05-07 16:38:15,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1009 [DEBUG]: using device: cpu
2025-05-07 16:38:15,477 baseline-bpql-noisy-humanoid:77 [WARNING]: args.assumed_delay != args.horizon: 4 != 32
2025-05-07 16:38:15,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1031 [INFO]: Creating new trainer
2025-05-07 16:38:15,490 baseline-bpql-noisy-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=444, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-05-07 16:38:15,491 baseline-bpql-noisy-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-07 16:38:18,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1092 [DEBUG]: Starting training session...
2025-05-07 16:38:18,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 1/100
2025-05-07 16:41:47,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:41:48,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 307.17810 ± 99.531
2025-05-07 16:41:48,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [290.88602, 494.16916, 303.6661, 422.17023, 227.479, 262.7996, 317.43588, 387.75064, 231.17079, 134.2537]
2025-05-07 16:41:48,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [57.0, 95.0, 60.0, 92.0, 47.0, 52.0, 60.0, 76.0, 48.0, 26.0]
2025-05-07 16:41:48,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1124 [INFO]: New best (307.18) for latency ExtremeSparseL4U32
2025-05-07 16:41:48,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 16:41:48,377 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 16:41:48,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 2/100 (estimated time remaining: 5 hours, 46 minutes, 32 seconds)
2025-05-07 16:45:42,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:45:43,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 406.71649 ± 70.874
2025-05-07 16:45:43,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [326.02106, 331.09637, 445.94812, 449.81644, 471.7468, 420.27594, 476.27933, 454.7598, 257.35794, 433.86298]
2025-05-07 16:45:43,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [73.0, 69.0, 83.0, 86.0, 89.0, 90.0, 103.0, 85.0, 56.0, 83.0]
2025-05-07 16:45:43,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1124 [INFO]: New best (406.72) for latency ExtremeSparseL4U32
2025-05-07 16:45:43,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 16:45:43,776 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 16:45:43,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 3/100 (estimated time remaining: 6 hours, 3 minutes, 45 seconds)
2025-05-07 16:49:38,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:49:39,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 418.68024 ± 112.873
2025-05-07 16:49:39,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [394.83575, 399.27734, 333.31357, 518.9448, 420.5503, 716.5256, 326.06775, 337.9218, 353.82724, 385.53812]
2025-05-07 16:49:39,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [77.0, 75.0, 66.0, 111.0, 88.0, 150.0, 61.0, 65.0, 67.0, 72.0]
2025-05-07 16:49:39,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1124 [INFO]: New best (418.68) for latency ExtremeSparseL4U32
2025-05-07 16:49:39,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 16:49:39,876 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 16:49:39,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 4/100 (estimated time remaining: 6 hours, 7 minutes, 16 seconds)
2025-05-07 16:53:44,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:53:46,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 419.61597 ± 122.026
2025-05-07 16:53:46,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [343.91153, 388.71774, 329.52582, 672.8957, 426.10632, 490.13528, 190.34547, 418.78906, 529.9735, 405.75922]
2025-05-07 16:53:46,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [76.0, 85.0, 74.0, 142.0, 93.0, 101.0, 37.0, 78.0, 106.0, 78.0]
2025-05-07 16:53:46,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1124 [INFO]: New best (419.62) for latency ExtremeSparseL4U32
2025-05-07 16:53:46,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 16:53:46,024 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 16:53:46,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 5/100 (estimated time remaining: 6 hours, 11 minutes, 4 seconds)
2025-05-07 16:57:50,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:57:51,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 295.33807 ± 133.202
2025-05-07 16:57:51,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [352.90082, 126.686386, 523.10284, 425.30908, 322.88107, 280.4822, 435.75134, 130.02158, 204.5675, 151.67831]
2025-05-07 16:57:51,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [68.0, 27.0, 104.0, 82.0, 63.0, 55.0, 86.0, 25.0, 40.0, 29.0]
2025-05-07 16:57:51,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 6/100 (estimated time remaining: 6 hours, 11 minutes, 30 seconds)
2025-05-07 17:01:56,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:01:58,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 412.79791 ± 144.822
2025-05-07 17:01:58,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [648.35034, 421.11993, 439.7773, 365.15384, 480.07205, 320.30545, 104.20191, 610.1671, 378.09964, 360.73132]
2025-05-07 17:01:58,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [137.0, 76.0, 92.0, 69.0, 91.0, 59.0, 22.0, 131.0, 69.0, 68.0]
2025-05-07 17:01:58,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 7/100 (estimated time remaining: 6 hours, 19 minutes, 2 seconds)
2025-05-07 17:06:03,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:06:04,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 367.86060 ± 148.229
2025-05-07 17:06:04,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [430.0625, 399.36948, 702.9288, 460.6169, 150.82353, 166.64456, 292.8289, 369.3178, 365.5979, 340.41547]
2025-05-07 17:06:04,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [82.0, 76.0, 149.0, 88.0, 29.0, 32.0, 55.0, 70.0, 68.0, 64.0]
2025-05-07 17:06:04,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 8/100 (estimated time remaining: 6 hours, 18 minutes, 29 seconds)
2025-05-07 17:10:18,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:10:20,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 462.55850 ± 150.822
2025-05-07 17:10:20,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [145.66443, 581.50684, 401.862, 652.70575, 468.85327, 367.0088, 496.44226, 380.6317, 433.43765, 697.4722]
2025-05-07 17:10:20,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [28.0, 124.0, 76.0, 122.0, 86.0, 75.0, 93.0, 72.0, 80.0, 135.0]
2025-05-07 17:10:20,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1124 [INFO]: New best (462.56) for latency ExtremeSparseL4U32
2025-05-07 17:10:20,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 17:10:20,926 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 17:10:20,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 9/100 (estimated time remaining: 6 hours, 20 minutes, 35 seconds)
2025-05-07 17:14:29,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:14:31,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 570.77454 ± 147.940
2025-05-07 17:14:31,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [439.96738, 472.83585, 735.7749, 565.3435, 405.12338, 532.61084, 728.50104, 350.2657, 683.2778, 794.0443]
2025-05-07 17:14:31,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [81.0, 97.0, 142.0, 106.0, 86.0, 101.0, 153.0, 78.0, 131.0, 155.0]
2025-05-07 17:14:31,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1124 [INFO]: New best (570.77) for latency ExtremeSparseL4U32
2025-05-07 17:14:31,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 17:14:31,628 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 17:14:31,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 10/100 (estimated time remaining: 6 hours, 17 minutes, 49 seconds)
2025-05-07 17:18:33,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:18:35,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 534.81897 ± 120.612
2025-05-07 17:18:35,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [712.88806, 575.7723, 324.313, 479.88736, 718.1262, 635.92, 434.04614, 474.66635, 450.51645, 542.0534]
2025-05-07 17:18:35,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [137.0, 110.0, 72.0, 91.0, 140.0, 120.0, 82.0, 100.0, 98.0, 116.0]
2025-05-07 17:18:35,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 11/100 (estimated time remaining: 6 hours, 13 minutes, 13 seconds)
2025-05-07 17:22:48,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:22:49,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 410.85809 ± 120.251
2025-05-07 17:22:49,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [469.51343, 507.58865, 410.56476, 197.43044, 170.10504, 533.7673, 437.87192, 424.5145, 439.32172, 517.90314]
2025-05-07 17:22:49,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [88.0, 95.0, 78.0, 38.0, 33.0, 116.0, 81.0, 78.0, 82.0, 97.0]
2025-05-07 17:22:49,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 12/100 (estimated time remaining: 6 hours, 11 minutes, 20 seconds)
2025-05-07 17:26:57,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:26:59,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 447.81055 ± 151.561
2025-05-07 17:26:59,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [301.72034, 399.54126, 480.10898, 559.34155, 640.43915, 290.5423, 154.06741, 484.8942, 638.9844, 528.4658]
2025-05-07 17:26:59,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [60.0, 88.0, 90.0, 122.0, 124.0, 57.0, 30.0, 94.0, 123.0, 100.0]
2025-05-07 17:26:59,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 13/100 (estimated time remaining: 6 hours, 8 minutes)
2025-05-07 17:31:01,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:31:04,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 524.59021 ± 151.209
2025-05-07 17:31:04,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [726.00464, 457.35953, 433.89163, 556.8027, 484.19357, 310.1737, 762.5961, 324.80585, 695.52856, 494.54584]
2025-05-07 17:31:04,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [143.0, 96.0, 80.0, 106.0, 102.0, 60.0, 146.0, 67.0, 135.0, 93.0]
2025-05-07 17:31:04,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 14/100 (estimated time remaining: 6 hours, 32 seconds)
2025-05-07 17:35:07,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:35:09,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 505.87384 ± 115.166
2025-05-07 17:35:09,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [489.41718, 602.02136, 534.10626, 667.5586, 528.8167, 539.4382, 490.7914, 200.75505, 478.6117, 527.222]
2025-05-07 17:35:09,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [90.0, 114.0, 99.0, 128.0, 98.0, 115.0, 93.0, 40.0, 87.0, 97.0]
2025-05-07 17:35:09,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 15/100 (estimated time remaining: 5 hours, 54 minutes, 51 seconds)
2025-05-07 17:39:13,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:39:15,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 361.62952 ± 107.650
2025-05-07 17:39:15,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [576.97003, 317.99994, 263.84686, 354.31107, 266.8113, 263.89395, 371.81277, 372.992, 543.7716, 283.88544]
2025-05-07 17:39:15,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [123.0, 68.0, 52.0, 77.0, 53.0, 55.0, 72.0, 80.0, 102.0, 60.0]
2025-05-07 17:39:15,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 16/100 (estimated time remaining: 5 hours, 51 minutes, 10 seconds)
2025-05-07 17:43:17,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:43:19,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 511.16080 ± 269.517
2025-05-07 17:43:19,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [576.04987, 445.02713, 664.9174, 350.10043, 333.54654, 590.4045, 658.2386, 1134.529, 174.47197, 184.32213]
2025-05-07 17:43:19,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [108.0, 89.0, 127.0, 66.0, 68.0, 112.0, 124.0, 222.0, 34.0, 36.0]
2025-05-07 17:43:19,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 17/100 (estimated time remaining: 5 hours, 44 minutes, 16 seconds)
2025-05-07 17:47:23,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:47:25,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 458.47968 ± 56.330
2025-05-07 17:47:25,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [461.8154, 548.207, 376.62698, 459.02118, 387.04337, 514.31116, 388.3485, 513.745, 450.51074, 485.1673]
2025-05-07 17:47:25,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [86.0, 120.0, 69.0, 85.0, 71.0, 96.0, 72.0, 96.0, 84.0, 91.0]
2025-05-07 17:47:25,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 18/100 (estimated time remaining: 5 hours, 39 minutes, 9 seconds)
2025-05-07 17:51:32,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:51:34,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 537.64368 ± 73.364
2025-05-07 17:51:34,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [636.3437, 557.708, 470.1543, 511.4942, 464.9004, 433.61902, 566.60803, 594.51294, 660.23804, 480.85767]
2025-05-07 17:51:34,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [121.0, 112.0, 89.0, 95.0, 86.0, 81.0, 106.0, 112.0, 126.0, 89.0]
2025-05-07 17:51:34,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 19/100 (estimated time remaining: 5 hours, 36 minutes, 19 seconds)
2025-05-07 17:55:54,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:55:55,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 307.63626 ± 243.249
2025-05-07 17:55:55,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [185.91496, 150.94109, 414.8004, 982.2734, 413.1407, 176.52303, 181.99379, 193.1149, 177.39864, 200.26166]
2025-05-07 17:55:55,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [36.0, 29.0, 80.0, 192.0, 80.0, 34.0, 35.0, 38.0, 34.0, 39.0]
2025-05-07 17:55:55,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 20/100 (estimated time remaining: 5 hours, 36 minutes, 26 seconds)
2025-05-07 18:00:14,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:00:16,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 597.14374 ± 234.616
2025-05-07 18:00:16,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [632.9762, 790.69073, 651.71497, 235.32948, 289.9967, 712.7146, 556.21783, 408.25436, 1079.2976, 614.2451]
2025-05-07 18:00:16,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [120.0, 170.0, 125.0, 45.0, 59.0, 137.0, 103.0, 89.0, 228.0, 116.0]
2025-05-07 18:00:16,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1124 [INFO]: New best (597.14) for latency ExtremeSparseL4U32
2025-05-07 18:00:16,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 18:00:16,987 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 18:00:16,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 21/100 (estimated time remaining: 5 hours, 36 minutes, 30 seconds)
2025-05-07 18:04:34,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:04:37,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 560.20886 ± 93.936
2025-05-07 18:04:37,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [540.4639, 413.02234, 550.37744, 515.1803, 718.51086, 680.3013, 569.19434, 439.54446, 652.6452, 522.8492]
2025-05-07 18:04:37,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [102.0, 76.0, 104.0, 96.0, 149.0, 127.0, 120.0, 81.0, 127.0, 98.0]
2025-05-07 18:04:37,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 22/100 (estimated time remaining: 5 hours, 36 minutes, 27 seconds)
2025-05-07 18:08:53,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:08:56,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 589.68695 ± 231.467
2025-05-07 18:08:56,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [651.31195, 201.7863, 139.59972, 643.002, 699.39044, 477.05408, 790.07587, 694.43024, 723.6836, 876.53564]
2025-05-07 18:08:56,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [122.0, 36.0, 27.0, 121.0, 132.0, 102.0, 151.0, 132.0, 155.0, 169.0]
2025-05-07 18:08:56,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 23/100 (estimated time remaining: 5 hours, 35 minutes, 39 seconds)
2025-05-07 18:13:10,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:13:12,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 463.53741 ± 133.043
2025-05-07 18:13:12,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [413.46466, 144.50143, 517.19366, 448.53952, 558.02625, 333.0959, 472.01617, 606.60284, 566.2464, 575.68726]
2025-05-07 18:13:12,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [78.0, 28.0, 104.0, 82.0, 117.0, 68.0, 97.0, 116.0, 105.0, 108.0]
2025-05-07 18:13:12,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 24/100 (estimated time remaining: 5 hours, 33 minutes, 6 seconds)
2025-05-07 18:17:25,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:17:28,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 573.41217 ± 199.135
2025-05-07 18:17:28,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [770.6996, 652.1548, 477.00723, 521.36725, 833.92096, 155.40364, 466.75803, 507.07907, 848.73834, 500.99283]
2025-05-07 18:17:28,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [149.0, 127.0, 90.0, 95.0, 158.0, 30.0, 100.0, 93.0, 164.0, 92.0]
2025-05-07 18:17:28,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 25/100 (estimated time remaining: 5 hours, 27 minutes, 32 seconds)
2025-05-07 18:21:38,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:21:40,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 505.18286 ± 216.598
2025-05-07 18:21:40,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [679.95374, 77.93064, 651.52545, 502.93478, 689.02966, 629.4782, 157.23116, 749.3746, 463.4996, 450.8713]
2025-05-07 18:21:40,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [140.0, 16.0, 125.0, 97.0, 129.0, 117.0, 30.0, 153.0, 85.0, 85.0]
2025-05-07 18:21:40,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 26/100 (estimated time remaining: 5 hours, 20 minutes, 50 seconds)
2025-05-07 18:25:50,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:25:53,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 552.34753 ± 159.564
2025-05-07 18:25:53,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [162.22871, 483.4633, 797.51514, 540.76996, 644.75085, 633.5181, 539.6932, 480.43393, 682.4566, 558.6461]
2025-05-07 18:25:53,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [31.0, 90.0, 149.0, 101.0, 121.0, 134.0, 99.0, 88.0, 129.0, 104.0]
2025-05-07 18:25:53,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 27/100 (estimated time remaining: 5 hours, 14 minutes, 49 seconds)
2025-05-07 18:30:06,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:30:09,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 639.46667 ± 91.631
2025-05-07 18:30:09,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [758.78394, 492.08447, 586.19684, 641.20966, 691.2852, 764.07794, 667.20264, 691.76184, 488.5627, 613.50183]
2025-05-07 18:30:09,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [146.0, 92.0, 110.0, 122.0, 131.0, 145.0, 127.0, 133.0, 90.0, 116.0]
2025-05-07 18:30:09,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1124 [INFO]: New best (639.47) for latency ExtremeSparseL4U32
2025-05-07 18:30:09,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 18:30:09,584 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 18:30:09,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 28/100 (estimated time remaining: 5 hours, 9 minutes, 51 seconds)
2025-05-07 18:34:19,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:34:21,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 579.62903 ± 70.557
2025-05-07 18:34:21,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [706.97156, 553.99896, 495.62613, 598.28345, 476.65598, 623.89545, 514.4001, 545.6516, 631.86945, 648.93774]
2025-05-07 18:34:21,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [134.0, 107.0, 91.0, 111.0, 87.0, 117.0, 96.0, 102.0, 133.0, 124.0]
2025-05-07 18:34:21,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 29/100 (estimated time remaining: 5 hours, 4 minutes, 38 seconds)
2025-05-07 18:38:31,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:38:33,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 568.49945 ± 188.427
2025-05-07 18:38:33,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [604.50964, 566.6981, 326.94537, 711.40247, 774.9726, 153.94373, 522.0593, 606.8363, 802.3174, 615.30963]
2025-05-07 18:38:33,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [114.0, 107.0, 62.0, 132.0, 148.0, 32.0, 96.0, 113.0, 154.0, 115.0]
2025-05-07 18:38:33,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 59 minutes, 24 seconds)
2025-05-07 18:42:46,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:42:48,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 549.69019 ± 239.097
2025-05-07 18:42:48,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [860.5972, 570.5325, 329.25116, 528.30493, 751.0425, 388.2612, 227.00906, 292.56042, 565.56134, 983.78107]
2025-05-07 18:42:48,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [164.0, 105.0, 63.0, 97.0, 143.0, 75.0, 44.0, 56.0, 108.0, 190.0]
2025-05-07 18:42:48,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 31/100 (estimated time remaining: 4 hours, 56 minutes)
2025-05-07 18:47:03,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:47:06,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 598.33362 ± 337.652
2025-05-07 18:47:06,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [819.6102, 677.8465, 548.46497, 1127.0211, 627.8819, 153.43494, 145.6004, 140.23103, 1012.68524, 730.5603]
2025-05-07 18:47:06,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [154.0, 133.0, 114.0, 225.0, 133.0, 30.0, 28.0, 27.0, 195.0, 147.0]
2025-05-07 18:47:06,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 32/100 (estimated time remaining: 4 hours, 52 minutes, 53 seconds)
2025-05-07 18:51:26,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:51:29,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 611.46838 ± 213.122
2025-05-07 18:51:29,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [317.99493, 690.53455, 871.7011, 813.1737, 723.72797, 782.52655, 170.9189, 492.02917, 663.3806, 588.6963]
2025-05-07 18:51:29,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [61.0, 130.0, 167.0, 158.0, 156.0, 147.0, 33.0, 90.0, 140.0, 111.0]
2025-05-07 18:51:29,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 33/100 (estimated time remaining: 4 hours, 50 minutes, 2 seconds)
2025-05-07 18:55:51,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:55:55,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 725.19714 ± 171.039
2025-05-07 18:55:55,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [752.27094, 501.2035, 388.54, 620.2835, 789.551, 1004.1005, 877.11304, 716.855, 782.1172, 819.93726]
2025-05-07 18:55:55,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [138.0, 92.0, 80.0, 115.0, 148.0, 215.0, 164.0, 132.0, 154.0, 154.0]
2025-05-07 18:55:55,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1124 [INFO]: New best (725.20) for latency ExtremeSparseL4U32
2025-05-07 18:55:55,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 18:55:55,327 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 18:55:55,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 34/100 (estimated time remaining: 4 hours, 48 minutes, 52 seconds)
2025-05-07 19:00:10,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:00:12,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 438.32794 ± 249.039
2025-05-07 19:00:12,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [130.40968, 595.7381, 716.9029, 459.7408, 650.8992, 779.1528, 583.5713, 150.72792, 170.34518, 145.79135]
2025-05-07 19:00:12,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [25.0, 111.0, 136.0, 91.0, 126.0, 153.0, 109.0, 29.0, 33.0, 28.0]
2025-05-07 19:00:12,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 35/100 (estimated time remaining: 4 hours, 45 minutes, 42 seconds)
2025-05-07 19:04:29,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:04:32,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 621.94495 ± 125.410
2025-05-07 19:04:32,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [662.6705, 478.1778, 541.5306, 723.90643, 688.17523, 789.7268, 562.1692, 483.34232, 820.53613, 469.2142]
2025-05-07 19:04:32,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [137.0, 90.0, 101.0, 136.0, 130.0, 150.0, 105.0, 89.0, 154.0, 92.0]
2025-05-07 19:04:32,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 36/100 (estimated time remaining: 4 hours, 42 minutes, 25 seconds)
2025-05-07 19:08:54,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:08:57,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 697.86426 ± 194.039
2025-05-07 19:08:57,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [402.7743, 539.4664, 985.99835, 918.11017, 765.2535, 986.94574, 553.1707, 604.13324, 607.5774, 615.2128]
2025-05-07 19:08:57,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [77.0, 115.0, 198.0, 169.0, 146.0, 184.0, 103.0, 112.0, 110.0, 114.0]
2025-05-07 19:08:57,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 37/100 (estimated time remaining: 4 hours, 39 minutes, 36 seconds)
2025-05-07 19:13:17,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:13:20,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 683.07410 ± 141.720
2025-05-07 19:13:20,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [649.28107, 660.05396, 897.8886, 684.4036, 730.811, 617.5983, 802.7493, 340.51138, 799.5041, 647.9399]
2025-05-07 19:13:20,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [123.0, 126.0, 173.0, 130.0, 140.0, 117.0, 153.0, 62.0, 153.0, 122.0]
2025-05-07 19:13:20,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 38/100 (estimated time remaining: 4 hours, 35 minutes, 26 seconds)
2025-05-07 19:17:46,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:17:49,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 650.40350 ± 274.352
2025-05-07 19:17:49,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [855.696, 554.3885, 403.05875, 762.4958, 915.8338, 693.83026, 1115.5958, 100.16118, 460.64456, 642.3308]
2025-05-07 19:17:49,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [176.0, 116.0, 75.0, 143.0, 172.0, 152.0, 216.0, 21.0, 88.0, 119.0]
2025-05-07 19:17:49,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 39/100 (estimated time remaining: 4 hours, 31 minutes, 41 seconds)
2025-05-07 19:22:10,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:22:13,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 648.33984 ± 75.861
2025-05-07 19:22:13,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [761.9303, 677.132, 601.9921, 731.15063, 573.6977, 506.68994, 620.56726, 708.0492, 601.9837, 700.2058]
2025-05-07 19:22:13,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [144.0, 128.0, 112.0, 137.0, 108.0, 93.0, 124.0, 133.0, 114.0, 130.0]
2025-05-07 19:22:13,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 40/100 (estimated time remaining: 4 hours, 28 minutes, 37 seconds)
2025-05-07 19:26:34,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:26:38,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 767.40417 ± 264.038
2025-05-07 19:26:38,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [738.83234, 1021.61694, 703.2464, 904.0094, 150.1801, 597.02655, 585.3883, 933.9238, 947.1947, 1092.623]
2025-05-07 19:26:38,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [157.0, 213.0, 134.0, 175.0, 29.0, 112.0, 111.0, 182.0, 210.0, 217.0]
2025-05-07 19:26:38,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1124 [INFO]: New best (767.40) for latency ExtremeSparseL4U32
2025-05-07 19:26:38,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 19:26:38,511 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 19:26:38,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 41/100 (estimated time remaining: 4 hours, 25 minutes, 12 seconds)
2025-05-07 19:30:51,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:30:55,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 827.47443 ± 388.222
2025-05-07 19:30:55,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [573.32196, 696.1673, 1314.0543, 552.4044, 192.94199, 535.02277, 1329.0742, 1196.2317, 618.2799, 1267.2461]
2025-05-07 19:30:55,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [121.0, 149.0, 272.0, 104.0, 37.0, 105.0, 273.0, 245.0, 134.0, 251.0]
2025-05-07 19:30:55,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1124 [INFO]: New best (827.47) for latency ExtremeSparseL4U32
2025-05-07 19:30:55,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 19:30:55,242 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 19:30:55,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 42/100 (estimated time remaining: 4 hours, 19 minutes, 9 seconds)
2025-05-07 19:35:10,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:35:14,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 873.40100 ± 570.568
2025-05-07 19:35:14,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [2299.4854, 797.24493, 874.94574, 822.97906, 125.47054, 695.3947, 995.18115, 735.17676, 182.17128, 1205.9596]
2025-05-07 19:35:14,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [453.0, 153.0, 180.0, 166.0, 24.0, 125.0, 191.0, 149.0, 35.0, 233.0]
2025-05-07 19:35:14,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1124 [INFO]: New best (873.40) for latency ExtremeSparseL4U32
2025-05-07 19:35:14,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 19:35:14,391 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 19:35:14,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 43/100 (estimated time remaining: 4 hours, 13 minutes, 56 seconds)
2025-05-07 19:39:20,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:39:23,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 662.05206 ± 163.814
2025-05-07 19:39:23,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [867.066, 653.60657, 870.2504, 742.5641, 607.9488, 824.53467, 368.04263, 584.7715, 673.2463, 428.4904]
2025-05-07 19:39:23,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [162.0, 126.0, 170.0, 153.0, 120.0, 155.0, 74.0, 111.0, 125.0, 79.0]
2025-05-07 19:39:23,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 44/100 (estimated time remaining: 4 hours, 5 minutes, 45 seconds)
2025-05-07 19:43:39,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:43:43,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 880.14856 ± 433.155
2025-05-07 19:43:43,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [586.25415, 565.32166, 1441.4619, 996.0451, 563.27734, 282.21445, 457.772, 1629.2164, 1262.0254, 1017.8961]
2025-05-07 19:43:43,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [112.0, 102.0, 277.0, 184.0, 104.0, 53.0, 89.0, 316.0, 235.0, 206.0]
2025-05-07 19:43:43,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1124 [INFO]: New best (880.15) for latency ExtremeSparseL4U32
2025-05-07 19:43:43,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 19:43:43,207 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 19:43:43,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 45/100 (estimated time remaining: 4 hours, 46 seconds)
2025-05-07 19:47:51,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:47:55,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 813.42236 ± 436.017
2025-05-07 19:47:55,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [979.4475, 1438.9274, 535.908, 223.87006, 409.41916, 1314.8267, 748.2802, 1221.7826, 1081.1224, 180.6393]
2025-05-07 19:47:55,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [185.0, 278.0, 102.0, 43.0, 75.0, 253.0, 139.0, 228.0, 205.0, 35.0]
2025-05-07 19:47:55,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 54 minutes, 3 seconds)
2025-05-07 19:52:10,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:52:14,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 956.45886 ± 460.619
2025-05-07 19:52:14,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [425.0258, 1522.7308, 1547.8494, 578.5314, 652.0525, 797.71, 1758.8046, 897.40656, 938.5157, 445.96213]
2025-05-07 19:52:14,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [86.0, 298.0, 297.0, 112.0, 120.0, 150.0, 355.0, 179.0, 179.0, 85.0]
2025-05-07 19:52:14,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1124 [INFO]: New best (956.46) for latency ExtremeSparseL4U32
2025-05-07 19:52:14,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 19:52:14,834 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 19:52:14,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 50 minutes, 19 seconds)
2025-05-07 19:56:29,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:56:34,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 1067.62048 ± 313.768
2025-05-07 19:56:34,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [891.80255, 855.65796, 1522.8478, 1629.1516, 1123.3451, 869.1579, 1208.4858, 655.83014, 1215.9222, 704.00397]
2025-05-07 19:56:34,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [169.0, 167.0, 285.0, 331.0, 222.0, 163.0, 220.0, 136.0, 229.0, 142.0]
2025-05-07 19:56:34,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1124 [INFO]: New best (1067.62) for latency ExtremeSparseL4U32
2025-05-07 19:56:34,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 19:56:34,779 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 19:56:34,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 46 minutes, 12 seconds)
2025-05-07 20:00:47,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 20:00:50,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 665.74268 ± 421.247
2025-05-07 20:00:50,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [583.0964, 893.4504, 606.8651, 1515.3176, 1090.0729, 458.69058, 946.9857, 219.1985, 146.14558, 197.60393]
2025-05-07 20:00:50,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [111.0, 175.0, 117.0, 291.0, 202.0, 86.0, 182.0, 42.0, 28.0, 38.0]
2025-05-07 20:00:50,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 43 minutes, 10 seconds)
2025-05-07 20:05:05,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 20:05:09,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 965.60486 ± 619.712
2025-05-07 20:05:09,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [1885.2648, 1680.0485, 545.3652, 140.6114, 299.54517, 1648.6443, 1042.3799, 1015.6669, 167.0261, 1231.4962]
2025-05-07 20:05:09,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [365.0, 317.0, 103.0, 27.0, 59.0, 317.0, 194.0, 191.0, 32.0, 238.0]
2025-05-07 20:05:09,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 38 minutes, 41 seconds)
2025-05-07 20:09:18,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 20:09:24,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 1188.97266 ± 521.310
2025-05-07 20:09:24,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [759.5481, 1886.2427, 858.7277, 1584.8516, 1197.9796, 668.8785, 637.4793, 2258.708, 972.85583, 1064.4552]
2025-05-07 20:09:24,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [158.0, 379.0, 166.0, 307.0, 228.0, 139.0, 120.0, 434.0, 195.0, 208.0]
2025-05-07 20:09:24,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1124 [INFO]: New best (1188.97) for latency ExtremeSparseL4U32
2025-05-07 20:09:24,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 20:09:24,411 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 20:09:24,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 34 minutes, 52 seconds)
2025-05-07 20:13:42,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 20:13:46,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 947.36835 ± 446.047
2025-05-07 20:13:46,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [692.59625, 976.1213, 909.28937, 905.6625, 2044.0315, 626.9964, 1204.428, 224.4056, 829.7869, 1060.3657]
2025-05-07 20:13:46,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [130.0, 186.0, 175.0, 177.0, 388.0, 116.0, 237.0, 43.0, 156.0, 203.0]
2025-05-07 20:13:46,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 30 minutes, 59 seconds)
2025-05-07 20:17:49,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 20:17:52,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 636.74597 ± 480.438
2025-05-07 20:17:52,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [145.38657, 170.20831, 199.76672, 175.81578, 1482.6416, 440.9981, 687.74115, 679.64996, 1006.3456, 1378.9065]
2025-05-07 20:17:52,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [28.0, 33.0, 38.0, 34.0, 287.0, 82.0, 134.0, 130.0, 196.0, 262.0]
2025-05-07 20:17:52,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 53/100 (estimated time remaining: 3 hours, 24 minutes, 23 seconds)
2025-05-07 20:22:14,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 20:22:18,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 969.72449 ± 426.563
2025-05-07 20:22:18,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [1096.441, 438.75677, 864.39813, 1175.9865, 714.86633, 865.85754, 897.37885, 630.23065, 2096.1975, 917.1322]
2025-05-07 20:22:18,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [209.0, 96.0, 175.0, 247.0, 136.0, 164.0, 172.0, 133.0, 407.0, 172.0]
2025-05-07 20:22:18,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 54/100 (estimated time remaining: 3 hours, 21 minutes, 45 seconds)
2025-05-07 20:26:19,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 20:26:23,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 976.97186 ± 442.627
2025-05-07 20:26:23,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [1062.2933, 414.86743, 805.3038, 1384.8492, 1284.8077, 125.594765, 1606.6063, 1236.5315, 1199.3655, 649.49927]
2025-05-07 20:26:23,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [204.0, 77.0, 153.0, 262.0, 247.0, 24.0, 308.0, 235.0, 231.0, 123.0]
2025-05-07 20:26:23,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 55/100 (estimated time remaining: 3 hours, 15 minutes, 22 seconds)
2025-05-07 20:30:35,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 20:30:38,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 685.61078 ± 419.686
2025-05-07 20:30:38,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [618.1197, 1665.0272, 435.20526, 548.0889, 989.3574, 220.9361, 840.78345, 906.3304, 492.08997, 140.16927]
2025-05-07 20:30:38,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [119.0, 317.0, 83.0, 106.0, 201.0, 43.0, 162.0, 176.0, 90.0, 27.0]
2025-05-07 20:30:38,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 56/100 (estimated time remaining: 3 hours, 11 minutes, 4 seconds)
2025-05-07 20:34:51,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 20:34:55,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 687.83856 ± 292.781
2025-05-07 20:34:55,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [1272.3544, 514.6983, 886.18524, 423.3767, 736.99615, 693.5367, 149.5649, 513.46783, 794.80884, 893.3965]
2025-05-07 20:34:55,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [242.0, 99.0, 171.0, 89.0, 136.0, 137.0, 29.0, 100.0, 153.0, 174.0]
2025-05-07 20:34:55,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 57/100 (estimated time remaining: 3 hours, 6 minutes, 1 second)
2025-05-07 20:39:11,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 20:39:17,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 1216.59839 ± 497.509
2025-05-07 20:39:17,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [880.3294, 1018.9973, 1118.4995, 973.4713, 1186.5632, 796.4445, 2502.4888, 985.88116, 943.2206, 1760.0883]
2025-05-07 20:39:17,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [184.0, 197.0, 214.0, 184.0, 228.0, 163.0, 498.0, 193.0, 184.0, 351.0]
2025-05-07 20:39:17,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1124 [INFO]: New best (1216.60) for latency ExtremeSparseL4U32
2025-05-07 20:39:17,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 20:39:17,313 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 20:39:17,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 58/100 (estimated time remaining: 3 hours, 4 minutes, 12 seconds)
2025-05-07 20:43:40,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 20:43:44,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 868.63611 ± 506.466
2025-05-07 20:43:44,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [1293.0792, 978.6605, 1216.2318, 160.9481, 1815.0912, 1232.9261, 628.4689, 628.3352, 139.80669, 592.8139]
2025-05-07 20:43:44,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [269.0, 210.0, 243.0, 31.0, 349.0, 233.0, 139.0, 120.0, 27.0, 115.0]
2025-05-07 20:43:44,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 59 minutes, 58 seconds)
2025-05-07 20:47:57,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 20:48:03,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 1154.16382 ± 496.209
2025-05-07 20:48:03,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [1241.6392, 1304.8899, 923.64105, 155.5688, 1686.4229, 1110.1466, 888.9687, 1677.6785, 1877.192, 675.4915]
2025-05-07 20:48:03,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [256.0, 258.0, 181.0, 30.0, 336.0, 210.0, 172.0, 331.0, 372.0, 131.0]
2025-05-07 20:48:03,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 57 minutes, 37 seconds)
2025-05-07 20:52:19,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 20:52:28,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 1779.26135 ± 886.450
2025-05-07 20:52:28,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [704.86847, 475.30167, 2229.859, 1476.6094, 1556.0641, 2334.9019, 1870.5471, 2630.8005, 999.7472, 3513.915]
2025-05-07 20:52:28,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [134.0, 94.0, 441.0, 292.0, 303.0, 454.0, 357.0, 525.0, 202.0, 679.0]
2025-05-07 20:52:28,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1124 [INFO]: New best (1779.26) for latency ExtremeSparseL4U32
2025-05-07 20:52:28,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 20:52:28,324 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 20:52:28,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 54 minutes, 40 seconds)
2025-05-07 20:56:53,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 20:56:58,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 1014.23224 ± 953.976
2025-05-07 20:56:58,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [2323.678, 1175.6927, 1256.6255, 174.84518, 140.89154, 166.95969, 161.10828, 2773.3535, 180.849, 1788.3195]
2025-05-07 20:56:58,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [451.0, 232.0, 267.0, 34.0, 27.0, 32.0, 31.0, 537.0, 35.0, 341.0]
2025-05-07 20:56:58,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 52 minutes, 1 second)
2025-05-07 21:01:13,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:01:18,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 1083.35876 ± 721.632
2025-05-07 21:01:18,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [1134.3221, 1588.6279, 344.98398, 600.6428, 2983.6323, 1073.4124, 864.21045, 407.509, 939.05115, 897.19495]
2025-05-07 21:01:18,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [220.0, 300.0, 64.0, 133.0, 575.0, 205.0, 165.0, 77.0, 181.0, 167.0]
2025-05-07 21:01:18,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 47 minutes, 20 seconds)
2025-05-07 21:05:39,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:05:45,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 1222.02087 ± 616.076
2025-05-07 21:05:45,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [677.6326, 1477.1067, 688.35443, 1093.8789, 1337.3394, 1049.542, 1141.946, 2833.3196, 1379.0931, 541.997]
2025-05-07 21:05:45,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [147.0, 278.0, 133.0, 215.0, 259.0, 209.0, 233.0, 541.0, 270.0, 106.0]
2025-05-07 21:05:45,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 42 minutes, 58 seconds)
2025-05-07 21:10:06,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:10:14,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 1534.00916 ± 1010.870
2025-05-07 21:10:14,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [1787.4299, 1309.9814, 1227.3262, 155.04619, 1123.1354, 1682.8036, 342.44217, 2317.8086, 1436.4912, 3957.6282]
2025-05-07 21:10:14,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [349.0, 249.0, 243.0, 30.0, 246.0, 327.0, 64.0, 440.0, 281.0, 774.0]
2025-05-07 21:10:14,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 39 minutes, 43 seconds)
2025-05-07 21:14:38,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:14:44,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 1263.14124 ± 843.057
2025-05-07 21:14:44,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [151.31224, 986.05176, 2665.499, 1097.8169, 1785.3864, 155.25316, 2039.0642, 1810.2701, 1753.8307, 186.92677]
2025-05-07 21:14:44,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [29.0, 190.0, 515.0, 216.0, 341.0, 30.0, 378.0, 355.0, 353.0, 36.0]
2025-05-07 21:14:44,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 35 minutes, 54 seconds)
2025-05-07 21:18:55,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:19:01,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 1191.90723 ± 700.079
2025-05-07 21:19:01,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [855.16, 201.11246, 2123.4983, 1495.027, 897.47687, 1600.4275, 2124.3662, 146.69458, 1797.3734, 677.9365]
2025-05-07 21:19:01,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [168.0, 39.0, 409.0, 288.0, 193.0, 334.0, 431.0, 28.0, 343.0, 129.0]
2025-05-07 21:19:01,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 29 minutes, 58 seconds)
2025-05-07 21:24:00,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:24:06,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 1244.54688 ± 572.495
2025-05-07 21:24:06,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [1991.2648, 623.8759, 666.1575, 1950.4193, 1334.1909, 1545.2839, 710.9989, 693.25, 866.01465, 2064.013]
2025-05-07 21:24:06,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [377.0, 129.0, 125.0, 373.0, 255.0, 293.0, 140.0, 138.0, 166.0, 392.0]
2025-05-07 21:24:06,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 30 minutes, 29 seconds)
2025-05-07 21:28:09,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:28:16,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 1384.53540 ± 751.060
2025-05-07 21:28:16,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [1433.0642, 3050.7866, 714.3039, 1821.5898, 862.9657, 1735.693, 176.02928, 891.3608, 1732.484, 1427.0773]
2025-05-07 21:28:16,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [267.0, 574.0, 135.0, 338.0, 163.0, 325.0, 36.0, 174.0, 324.0, 293.0]
2025-05-07 21:28:16,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 69/100 (estimated time remaining: 2 hours, 24 minutes, 1 second)
2025-05-07 21:32:26,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:32:31,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 1207.93726 ± 782.989
2025-05-07 21:32:31,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [150.04247, 150.63544, 1714.4043, 2259.8152, 544.8394, 756.94885, 2288.348, 813.17035, 1501.6991, 1899.4688]
2025-05-07 21:32:31,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [29.0, 29.0, 319.0, 439.0, 103.0, 147.0, 444.0, 159.0, 282.0, 358.0]
2025-05-07 21:32:31,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 70/100 (estimated time remaining: 2 hours, 18 minutes, 8 seconds)
2025-05-07 21:36:42,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:36:46,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 845.13574 ± 768.692
2025-05-07 21:36:46,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [585.4333, 182.62672, 154.45158, 156.48036, 205.378, 851.1214, 1406.8408, 1470.0853, 2669.9673, 768.97266]
2025-05-07 21:36:46,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [111.0, 35.0, 30.0, 30.0, 40.0, 162.0, 261.0, 275.0, 510.0, 141.0]
2025-05-07 21:36:46,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 71/100 (estimated time remaining: 2 hours, 12 minutes, 9 seconds)
2025-05-07 21:40:51,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:40:57,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 1196.28796 ± 782.196
2025-05-07 21:40:57,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [1622.3428, 1776.9114, 1154.6316, 181.02657, 198.55157, 1738.2567, 2570.5212, 1642.2318, 943.5902, 134.81674]
2025-05-07 21:40:57,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [311.0, 346.0, 222.0, 35.0, 38.0, 335.0, 504.0, 319.0, 173.0, 26.0]
2025-05-07 21:40:57,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 72/100 (estimated time remaining: 2 hours, 7 minutes, 11 seconds)
2025-05-07 21:45:10,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:45:13,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 832.60498 ± 589.752
2025-05-07 21:45:13,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [727.90717, 1095.7952, 574.909, 145.81941, 194.85063, 164.19347, 1376.3601, 1536.1298, 1910.3015, 599.7836]
2025-05-07 21:45:13,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [155.0, 215.0, 109.0, 28.0, 39.0, 32.0, 273.0, 296.0, 365.0, 115.0]
2025-05-07 21:45:13,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 58 minutes, 16 seconds)
2025-05-07 21:49:24,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:49:29,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 1123.05701 ± 939.061
2025-05-07 21:49:29,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [1020.46967, 1693.4895, 287.84192, 141.17026, 784.8798, 3567.3596, 1557.8966, 510.91226, 835.9022, 830.6483]
2025-05-07 21:49:29,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [196.0, 319.0, 59.0, 27.0, 151.0, 691.0, 288.0, 94.0, 162.0, 159.0]
2025-05-07 21:49:30,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 54 minutes, 39 seconds)
2025-05-07 21:53:38,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:53:44,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 1302.56006 ± 1126.113
2025-05-07 21:53:44,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [165.38239, 2034.3346, 656.92664, 2375.4932, 1256.6764, 1341.1993, 3868.192, 1007.2153, 180.48311, 139.69731]
2025-05-07 21:53:44,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [32.0, 382.0, 132.0, 464.0, 241.0, 260.0, 749.0, 197.0, 35.0, 27.0]
2025-05-07 21:53:44,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 50 minutes, 19 seconds)
2025-05-07 21:57:49,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:57:54,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 1127.40479 ± 620.500
2025-05-07 21:57:54,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [1576.6772, 884.7026, 1148.2881, 860.8835, 1016.0579, 1241.5764, 1258.6742, 130.35025, 2596.8408, 559.997]
2025-05-07 21:57:54,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [299.0, 183.0, 237.0, 165.0, 196.0, 229.0, 232.0, 25.0, 489.0, 106.0]
2025-05-07 21:57:54,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 45 minutes, 42 seconds)
2025-05-07 22:02:05,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:02:10,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 1275.53430 ± 519.504
2025-05-07 22:02:10,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [734.38666, 2444.363, 1551.4883, 1721.596, 1442.2439, 1183.874, 786.6548, 1264.4099, 637.3459, 988.9803]
2025-05-07 22:02:10,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [139.0, 469.0, 292.0, 321.0, 267.0, 220.0, 147.0, 240.0, 116.0, 184.0]
2025-05-07 22:02:10,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 41 minutes, 52 seconds)
2025-05-07 22:06:26,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:06:34,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 1688.01929 ± 1347.362
2025-05-07 22:06:34,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [5063.578, 1136.8896, 1375.6539, 850.7561, 2381.1018, 550.30304, 1778.0433, 855.786, 2681.4985, 206.58249]
2025-05-07 22:06:34,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [1000.0, 225.0, 275.0, 162.0, 473.0, 101.0, 346.0, 167.0, 530.0, 40.0]
2025-05-07 22:06:34,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 38 minutes, 13 seconds)
2025-05-07 22:10:43,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:10:49,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 1225.90698 ± 976.087
2025-05-07 22:10:49,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [942.7033, 1176.6755, 1596.5082, 1824.5925, 3558.6204, 197.66245, 196.22348, 377.9574, 1772.3951, 615.7305]
2025-05-07 22:10:49,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [190.0, 231.0, 306.0, 341.0, 696.0, 38.0, 38.0, 73.0, 337.0, 125.0]
2025-05-07 22:10:49,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 33 minutes, 49 seconds)
2025-05-07 22:14:56,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:15:07,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 2053.74683 ± 1759.393
2025-05-07 22:15:07,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [4866.6167, 2805.9978, 1841.2382, 156.03925, 2727.5046, 179.85928, 192.072, 2065.8027, 600.95636, 5101.382]
2025-05-07 22:15:07,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [947.0, 529.0, 355.0, 30.0, 534.0, 35.0, 37.0, 399.0, 116.0, 1000.0]
2025-05-07 22:15:07,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1124 [INFO]: New best (2053.75) for latency ExtremeSparseL4U32
2025-05-07 22:15:07,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 22:15:07,445 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 22:15:07,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 29 minutes, 48 seconds)
2025-05-07 22:19:24,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:19:32,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 1714.78650 ± 953.631
2025-05-07 22:19:32,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [2249.7847, 1186.6494, 1293.9005, 3856.151, 1571.107, 1217.4448, 2793.0483, 1626.2365, 970.84424, 382.69946]
2025-05-07 22:19:32,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [425.0, 229.0, 245.0, 737.0, 302.0, 245.0, 538.0, 316.0, 187.0, 72.0]
2025-05-07 22:19:32,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 26 minutes, 32 seconds)
2025-05-07 22:23:51,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:23:58,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 1404.34729 ± 1062.179
2025-05-07 22:23:58,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [551.05597, 2039.8627, 2579.5898, 727.10114, 1874.3462, 197.73718, 254.74307, 1109.0981, 1047.7821, 3662.156]
2025-05-07 22:23:58,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [102.0, 399.0, 484.0, 139.0, 358.0, 38.0, 50.0, 217.0, 193.0, 713.0]
2025-05-07 22:23:58,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 22 minutes, 49 seconds)
2025-05-07 22:28:02,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:28:17,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 2949.17188 ± 1933.620
2025-05-07 22:28:17,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [5178.433, 3354.0532, 980.9114, 5144.068, 3082.241, 162.05766, 366.88013, 5116.6323, 1571.6298, 4534.8125]
2025-05-07 22:28:17,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [1000.0, 646.0, 188.0, 1000.0, 602.0, 31.0, 72.0, 1000.0, 293.0, 865.0]
2025-05-07 22:28:17,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1124 [INFO]: New best (2949.17) for latency ExtremeSparseL4U32
2025-05-07 22:28:17,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 22:28:17,756 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 22:28:17,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 18 minutes, 10 seconds)
2025-05-07 22:32:28,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:32:35,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 1274.04309 ± 1162.591
2025-05-07 22:32:35,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [130.62927, 771.6618, 1815.5549, 3476.0042, 606.7241, 174.52322, 1212.8685, 3266.563, 1084.449, 201.45183]
2025-05-07 22:32:35,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [25.0, 142.0, 340.0, 683.0, 121.0, 34.0, 233.0, 633.0, 228.0, 40.0]
2025-05-07 22:32:35,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 13 minutes, 58 seconds)
2025-05-07 22:36:43,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:36:53,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 1990.94080 ± 1757.093
2025-05-07 22:36:53,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [1815.2772, 171.74486, 182.44794, 171.292, 3052.0906, 349.4792, 4729.0166, 2533.3823, 5044.281, 1860.3965]
2025-05-07 22:36:53,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [354.0, 33.0, 35.0, 33.0, 604.0, 66.0, 928.0, 500.0, 1000.0, 365.0]
2025-05-07 22:36:53,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 9 minutes, 38 seconds)
2025-05-07 22:41:04,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:41:12,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 1615.71021 ± 949.523
2025-05-07 22:41:12,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [1575.1782, 1410.7195, 1945.7057, 452.56824, 749.9816, 955.1906, 1895.858, 1562.6182, 1497.6805, 4111.6]
2025-05-07 22:41:12,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [302.0, 270.0, 373.0, 88.0, 144.0, 177.0, 357.0, 292.0, 280.0, 783.0]
2025-05-07 22:41:12,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 86/100 (estimated time remaining: 1 hour, 4 minutes, 57 seconds)
2025-05-07 22:45:27,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:45:32,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 1131.90857 ± 1027.601
2025-05-07 22:45:32,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [3635.0425, 178.16435, 371.70767, 1171.1245, 1812.9132, 522.0543, 181.57301, 624.87756, 1982.3883, 839.24023]
2025-05-07 22:45:32,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [719.0, 34.0, 71.0, 220.0, 337.0, 96.0, 35.0, 138.0, 380.0, 160.0]
2025-05-07 22:45:32,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 87/100 (estimated time remaining: 1 hour, 24 seconds)
2025-05-07 22:49:36,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:49:45,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 1821.16663 ± 1484.778
2025-05-07 22:49:45,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [567.7315, 154.9794, 1174.683, 1191.6157, 2279.9934, 3499.8252, 5117.9146, 898.51843, 2684.4653, 641.94006]
2025-05-07 22:49:45,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [113.0, 30.0, 225.0, 223.0, 450.0, 677.0, 1000.0, 186.0, 531.0, 119.0]
2025-05-07 22:49:45,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 88/100 (estimated time remaining: 55 minutes, 47 seconds)
2025-05-07 22:54:03,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:54:06,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 631.08435 ± 486.643
2025-05-07 22:54:06,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [923.30206, 435.27652, 181.72101, 1099.0594, 541.03485, 1783.195, 377.4168, 628.1751, 165.18869, 176.47443]
2025-05-07 22:54:06,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [174.0, 80.0, 35.0, 209.0, 102.0, 348.0, 76.0, 118.0, 32.0, 34.0]
2025-05-07 22:54:06,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 89/100 (estimated time remaining: 51 minutes, 40 seconds)
2025-05-07 22:58:20,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:58:25,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 854.50214 ± 846.584
2025-05-07 22:58:25,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [901.1133, 613.536, 609.0447, 1629.7466, 771.3693, 150.52856, 239.45096, 3066.406, 190.98167, 372.84494]
2025-05-07 22:58:25,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [172.0, 121.0, 119.0, 316.0, 155.0, 29.0, 46.0, 603.0, 37.0, 70.0]
2025-05-07 22:58:25,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 90/100 (estimated time remaining: 47 minutes, 21 seconds)
2025-05-07 23:02:40,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:02:49,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 1813.45044 ± 1618.972
2025-05-07 23:02:49,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [2562.6108, 1477.4326, 151.5187, 146.11798, 3490.1902, 2928.3264, 193.13387, 186.24161, 1921.2402, 5077.691]
2025-05-07 23:02:49,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [504.0, 284.0, 29.0, 28.0, 699.0, 566.0, 37.0, 36.0, 381.0, 1000.0]
2025-05-07 23:02:49,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 91/100 (estimated time remaining: 43 minutes, 14 seconds)
2025-05-07 23:07:09,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:07:14,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 1063.10046 ± 974.245
2025-05-07 23:07:14,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [688.1204, 3598.3843, 1528.9408, 160.40181, 1040.5822, 987.22864, 527.438, 1593.5624, 203.99446, 302.3513]
2025-05-07 23:07:14,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [131.0, 713.0, 300.0, 31.0, 197.0, 190.0, 99.0, 310.0, 40.0, 59.0]
2025-05-07 23:07:14,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 92/100 (estimated time remaining: 39 minutes, 3 seconds)
2025-05-07 23:11:12,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:11:24,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 2264.50635 ± 1812.318
2025-05-07 23:11:24,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [757.23517, 900.05383, 342.63815, 920.3721, 5121.1025, 4384.8657, 5131.6597, 2637.127, 1356.3481, 1093.6622]
2025-05-07 23:11:24,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [162.0, 169.0, 64.0, 175.0, 1000.0, 846.0, 1000.0, 513.0, 266.0, 208.0]
2025-05-07 23:11:24,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 93/100 (estimated time remaining: 34 minutes, 38 seconds)
2025-05-07 23:15:33,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:15:37,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 947.33282 ± 676.773
2025-05-07 23:15:37,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [1159.756, 273.0487, 146.24197, 451.54553, 2197.4834, 792.1214, 1797.9863, 497.68994, 1646.8811, 510.5729]
2025-05-07 23:15:37,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [222.0, 51.0, 28.0, 87.0, 424.0, 150.0, 345.0, 92.0, 314.0, 97.0]
2025-05-07 23:15:37,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 94/100 (estimated time remaining: 30 minutes, 6 seconds)
2025-05-07 23:19:45,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:19:51,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 1339.91174 ± 785.750
2025-05-07 23:19:51,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [2639.593, 1734.2715, 129.99178, 437.22845, 1134.0996, 941.2245, 1221.3416, 2044.6376, 769.9634, 2346.7651]
2025-05-07 23:19:51,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [523.0, 334.0, 25.0, 79.0, 219.0, 180.0, 222.0, 397.0, 169.0, 463.0]
2025-05-07 23:19:51,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 95/100 (estimated time remaining: 25 minutes, 43 seconds)
2025-05-07 23:24:05,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:24:16,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 2167.08374 ± 888.267
2025-05-07 23:24:16,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [639.56744, 1896.5638, 1991.0033, 2362.2893, 1149.0522, 3772.9836, 2978.6619, 2045.4418, 1710.4309, 3124.8435]
2025-05-07 23:24:16,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [129.0, 350.0, 391.0, 438.0, 233.0, 734.0, 584.0, 413.0, 318.0, 628.0]
2025-05-07 23:24:16,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 96/100 (estimated time remaining: 21 minutes, 26 seconds)
2025-05-07 23:28:23,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:28:31,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 1528.96313 ± 1260.607
2025-05-07 23:28:31,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [1147.4912, 1771.687, 871.8359, 684.73096, 190.75491, 3712.967, 3902.312, 908.98016, 182.0861, 1916.7874]
2025-05-07 23:28:31,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [223.0, 352.0, 162.0, 133.0, 37.0, 727.0, 771.0, 182.0, 35.0, 378.0]
2025-05-07 23:28:31,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 97/100 (estimated time remaining: 17 minutes, 1 second)
2025-05-07 23:32:42,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:32:52,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 1894.34509 ± 1277.586
2025-05-07 23:32:52,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [1375.1381, 450.86224, 398.5199, 1447.9023, 1087.202, 3451.4197, 2980.145, 2522.1357, 4324.496, 905.6315]
2025-05-07 23:32:52,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [279.0, 87.0, 79.0, 273.0, 210.0, 667.0, 578.0, 504.0, 864.0, 175.0]
2025-05-07 23:32:52,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 98/100 (estimated time remaining: 12 minutes, 52 seconds)
2025-05-07 23:37:01,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:37:06,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 1141.34155 ± 635.491
2025-05-07 23:37:06,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [968.31946, 140.94516, 694.569, 288.57736, 1576.7112, 1163.9272, 1349.9058, 2412.9565, 1617.7601, 1199.7434]
2025-05-07 23:37:06,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [186.0, 27.0, 129.0, 61.0, 302.0, 214.0, 256.0, 470.0, 313.0, 234.0]
2025-05-07 23:37:06,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 99/100 (estimated time remaining: 8 minutes, 35 seconds)
2025-05-07 23:41:09,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:41:13,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 994.75421 ± 760.679
2025-05-07 23:41:13,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [1458.7815, 224.91768, 380.6833, 2389.9634, 903.5242, 2254.0762, 1040.8649, 579.00037, 165.2326, 550.4985]
2025-05-07 23:41:13,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [276.0, 46.0, 71.0, 467.0, 179.0, 426.0, 200.0, 127.0, 32.0, 106.0]
2025-05-07 23:41:13,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 100/100 (estimated time remaining: 4 minutes, 16 seconds)
2025-05-07 23:45:39,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:45:44,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 1130.62878 ± 692.415
2025-05-07 23:45:44,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [1026.2902, 2204.1865, 479.82376, 800.892, 1786.9208, 1667.6602, 176.14722, 150.43367, 1842.0062, 1171.9274]
2025-05-07 23:45:44,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [185.0, 432.0, 90.0, 153.0, 338.0, 316.0, 34.0, 29.0, 341.0, 215.0]
2025-05-07 23:45:44,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1149 [DEBUG]: Training session finished
