2025-08-07 02:07:04,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc0-humanoid/ExtremeSparseL4U32-bpql-mem32
2025-08-07 02:07:04,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc0-humanoid/ExtremeSparseL4U32-bpql-mem32
2025-08-07 02:07:04,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x145f344c8550>}
2025-08-07 02:07:04,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1111 [DEBUG]: using device: cuda
2025-08-07 02:07:04,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1133 [INFO]: Creating new trainer
2025-08-07 02:07:04,258 baseline-bpql-noiseperc0-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=920, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-08-07 02:07:04,258 baseline-bpql-noiseperc0-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 02:07:06,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1194 [DEBUG]: Starting training session...
2025-08-07 02:07:06,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 1/100
2025-08-07 02:09:00,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:09:02,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 446.72455 ± 32.797
2025-08-07 02:09:02,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [401.89175, 441.33084, 516.894, 441.32767, 468.34076, 445.67084, 455.34055, 473.55182, 414.0093, 408.88812]
2025-08-07 02:09:02,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 88.0, 99.0, 84.0, 88.0, 86.0, 96.0, 100.0, 87.0, 79.0]
2025-08-07 02:09:02,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (446.72) for latency ExtremeSparseL4U32
2025-08-07 02:09:02,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 11 minutes, 10 seconds)
2025-08-07 02:11:05,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:11:07,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 449.32431 ± 30.147
2025-08-07 02:11:07,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [439.4283, 509.6065, 439.93848, 429.57425, 471.1703, 401.27408, 426.06464, 462.01337, 482.2807, 431.89233]
2025-08-07 02:11:07,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 97.0, 84.0, 81.0, 89.0, 78.0, 86.0, 86.0, 94.0, 81.0]
2025-08-07 02:11:07,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (449.32) for latency ExtremeSparseL4U32
2025-08-07 02:11:07,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 16 minutes, 42 seconds)
2025-08-07 02:13:10,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:13:12,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 461.73755 ± 44.388
2025-08-07 02:13:12,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [407.4908, 403.05954, 434.76788, 449.6367, 485.56955, 548.7768, 487.84006, 511.70352, 428.2872, 460.24356]
2025-08-07 02:13:12,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 77.0, 83.0, 85.0, 92.0, 105.0, 106.0, 102.0, 87.0, 88.0]
2025-08-07 02:13:12,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (461.74) for latency ExtremeSparseL4U32
2025-08-07 02:13:12,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 17 minutes, 14 seconds)
2025-08-07 02:15:16,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:15:18,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 546.80756 ± 103.446
2025-08-07 02:15:18,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [577.0008, 496.60474, 553.3061, 471.8397, 519.0924, 837.08655, 475.7749, 529.766, 459.91364, 547.6916]
2025-08-07 02:15:18,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 91.0, 104.0, 86.0, 104.0, 166.0, 87.0, 101.0, 94.0, 101.0]
2025-08-07 02:15:18,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (546.81) for latency ExtremeSparseL4U32
2025-08-07 02:15:19,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 16 minutes, 51 seconds)
2025-08-07 02:17:22,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:17:24,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 467.82852 ± 42.129
2025-08-07 02:17:24,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [536.1348, 501.72607, 491.19833, 419.37988, 430.90015, 422.65158, 491.0647, 435.22232, 430.67892, 519.32886]
2025-08-07 02:17:24,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 99.0, 102.0, 88.0, 81.0, 80.0, 90.0, 84.0, 91.0, 100.0]
2025-08-07 02:17:24,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 15 minutes, 36 seconds)
2025-08-07 02:19:28,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:19:30,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 546.08392 ± 63.614
2025-08-07 02:19:30,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [519.73914, 589.114, 534.0266, 628.4325, 637.0015, 504.53015, 475.42648, 621.9672, 464.53946, 486.06232]
2025-08-07 02:19:30,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 111.0, 98.0, 116.0, 131.0, 102.0, 87.0, 118.0, 95.0, 89.0]
2025-08-07 02:19:30,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 16 minutes, 50 seconds)
2025-08-07 02:21:34,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:21:36,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 509.80396 ± 112.424
2025-08-07 02:21:36,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [514.76355, 555.2929, 474.17184, 510.68088, 257.3531, 516.7651, 751.90497, 488.00195, 507.0021, 522.1029]
2025-08-07 02:21:36,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 101.0, 87.0, 94.0, 50.0, 93.0, 141.0, 91.0, 91.0, 113.0]
2025-08-07 02:21:36,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 14 minutes, 48 seconds)
2025-08-07 02:23:40,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:23:42,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 515.61853 ± 77.510
2025-08-07 02:23:42,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [410.7515, 592.48157, 624.76935, 610.39844, 475.78882, 571.85657, 528.59357, 407.92123, 489.13135, 444.49283]
2025-08-07 02:23:42,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 124.0, 119.0, 112.0, 88.0, 106.0, 100.0, 88.0, 91.0, 82.0]
2025-08-07 02:23:42,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 13 minutes, 5 seconds)
2025-08-07 02:25:47,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:25:49,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 549.30176 ± 75.822
2025-08-07 02:25:49,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [482.07605, 571.2175, 672.9402, 465.91812, 469.8947, 527.51733, 673.0915, 616.56274, 503.17398, 510.62567]
2025-08-07 02:25:49,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 103.0, 125.0, 95.0, 83.0, 96.0, 133.0, 130.0, 97.0, 94.0]
2025-08-07 02:25:49,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (549.30) for latency ExtremeSparseL4U32
2025-08-07 02:25:49,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 11 minutes, 12 seconds)
2025-08-07 02:27:53,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:27:56,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 573.18005 ± 67.823
2025-08-07 02:27:56,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [554.5686, 638.4241, 514.43945, 655.5334, 523.6371, 590.5799, 488.60135, 526.0855, 534.3571, 705.57495]
2025-08-07 02:27:56,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [115.0, 121.0, 106.0, 131.0, 107.0, 125.0, 103.0, 109.0, 114.0, 143.0]
2025-08-07 02:27:56,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (573.18) for latency ExtremeSparseL4U32
2025-08-07 02:27:56,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 3 hours, 9 minutes, 30 seconds)
2025-08-07 02:30:01,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:30:04,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 596.53339 ± 128.503
2025-08-07 02:30:04,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [566.85425, 440.48923, 583.47095, 474.05203, 510.50247, 608.855, 723.5162, 898.62823, 511.63474, 647.33105]
2025-08-07 02:30:04,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [114.0, 83.0, 107.0, 101.0, 92.0, 115.0, 136.0, 170.0, 95.0, 122.0]
2025-08-07 02:30:04,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (596.53) for latency ExtremeSparseL4U32
2025-08-07 02:30:04,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 3 hours, 7 minutes, 51 seconds)
2025-08-07 02:32:07,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:32:10,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 580.75574 ± 74.302
2025-08-07 02:32:10,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [593.70294, 436.763, 631.79956, 555.84454, 675.2504, 574.7171, 479.3952, 556.5386, 675.93396, 627.6117]
2025-08-07 02:32:10,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 82.0, 116.0, 112.0, 127.0, 105.0, 87.0, 117.0, 124.0, 111.0]
2025-08-07 02:32:10,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 3 hours, 5 minutes, 57 seconds)
2025-08-07 02:34:14,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:34:17,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 609.48212 ± 87.854
2025-08-07 02:34:17,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [729.79175, 644.5956, 743.19073, 507.93848, 634.86066, 634.45996, 538.0006, 483.31332, 517.99603, 660.6739]
2025-08-07 02:34:17,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [149.0, 121.0, 157.0, 96.0, 115.0, 119.0, 104.0, 89.0, 99.0, 123.0]
2025-08-07 02:34:17,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (609.48) for latency ExtremeSparseL4U32
2025-08-07 02:34:17,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 3 hours, 4 minutes, 3 seconds)
2025-08-07 02:36:21,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:36:24,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 629.08051 ± 53.778
2025-08-07 02:36:24,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [593.2597, 652.19525, 629.4194, 677.14844, 650.07153, 734.48267, 596.8388, 585.31805, 528.66956, 643.40204]
2025-08-07 02:36:24,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [107.0, 123.0, 121.0, 129.0, 124.0, 141.0, 116.0, 109.0, 99.0, 121.0]
2025-08-07 02:36:24,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (629.08) for latency ExtremeSparseL4U32
2025-08-07 02:36:24,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 3 hours, 1 minute, 59 seconds)
2025-08-07 02:38:28,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:38:30,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 619.66687 ± 49.868
2025-08-07 02:38:30,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [633.3754, 582.05084, 665.26086, 670.9619, 557.2464, 632.99207, 534.9079, 586.86505, 696.7862, 636.2226]
2025-08-07 02:38:30,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [121.0, 109.0, 125.0, 133.0, 119.0, 120.0, 102.0, 111.0, 145.0, 118.0]
2025-08-07 02:38:30,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 59 minutes, 47 seconds)
2025-08-07 02:40:36,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:40:38,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 646.91565 ± 77.070
2025-08-07 02:40:38,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [626.1632, 690.7824, 755.1636, 610.1984, 696.4905, 613.9575, 551.6759, 784.853, 585.5034, 554.3684]
2025-08-07 02:40:38,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [115.0, 132.0, 143.0, 111.0, 134.0, 111.0, 101.0, 150.0, 110.0, 100.0]
2025-08-07 02:40:38,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (646.92) for latency ExtremeSparseL4U32
2025-08-07 02:40:38,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 57 minutes, 43 seconds)
2025-08-07 02:42:43,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:42:46,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 614.34692 ± 56.334
2025-08-07 02:42:46,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [643.13544, 519.4905, 591.17786, 611.70575, 601.79395, 608.74194, 570.46375, 588.8271, 739.4222, 668.71106]
2025-08-07 02:42:46,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [120.0, 107.0, 109.0, 126.0, 116.0, 112.0, 105.0, 114.0, 147.0, 123.0]
2025-08-07 02:42:46,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 55 minutes, 57 seconds)
2025-08-07 02:44:49,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:44:51,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 643.42676 ± 57.680
2025-08-07 02:44:51,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [601.3958, 562.4485, 723.6982, 567.9397, 603.021, 676.3875, 627.42316, 727.8121, 695.17395, 648.9678]
2025-08-07 02:44:51,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 101.0, 137.0, 113.0, 111.0, 127.0, 125.0, 135.0, 127.0, 121.0]
2025-08-07 02:44:51,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 53 minutes, 26 seconds)
2025-08-07 02:46:56,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:46:58,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 634.47791 ± 67.218
2025-08-07 02:46:58,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [660.5776, 716.6987, 701.9184, 726.76825, 525.30347, 566.5252, 604.43256, 624.61896, 554.94183, 662.99445]
2025-08-07 02:46:58,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [133.0, 146.0, 140.0, 141.0, 104.0, 104.0, 111.0, 123.0, 102.0, 118.0]
2025-08-07 02:46:58,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 51 minutes, 16 seconds)
2025-08-07 02:49:02,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:49:05,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 694.63049 ± 123.685
2025-08-07 02:49:05,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [597.5398, 731.81964, 573.56464, 1000.3007, 687.2218, 710.3717, 728.95966, 703.7234, 698.4553, 514.3488]
2025-08-07 02:49:05,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [108.0, 144.0, 103.0, 197.0, 133.0, 130.0, 152.0, 134.0, 134.0, 94.0]
2025-08-07 02:49:05,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (694.63) for latency ExtremeSparseL4U32
2025-08-07 02:49:05,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 49 minutes, 9 seconds)
2025-08-07 02:51:08,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:51:10,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 666.57556 ± 102.552
2025-08-07 02:51:10,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [687.3894, 632.6718, 692.15717, 844.4156, 563.42145, 730.2852, 711.48425, 459.41803, 748.44507, 596.06683]
2025-08-07 02:51:10,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [127.0, 118.0, 132.0, 173.0, 110.0, 140.0, 132.0, 94.0, 147.0, 112.0]
2025-08-07 02:51:10,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 46 minutes, 22 seconds)
2025-08-07 02:53:17,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:53:19,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 633.29541 ± 56.952
2025-08-07 02:53:19,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [624.9448, 709.332, 636.1141, 513.7698, 555.3459, 679.59326, 636.7608, 695.2265, 648.2215, 633.64557]
2025-08-07 02:53:19,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [121.0, 133.0, 114.0, 95.0, 102.0, 138.0, 114.0, 132.0, 134.0, 114.0]
2025-08-07 02:53:19,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 44 minutes, 46 seconds)
2025-08-07 02:55:26,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:55:28,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 623.23035 ± 82.972
2025-08-07 02:55:28,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [579.78204, 662.698, 564.59546, 680.1825, 580.8359, 720.2596, 651.5649, 437.15616, 622.10614, 733.12366]
2025-08-07 02:55:28,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [120.0, 127.0, 112.0, 133.0, 117.0, 131.0, 119.0, 88.0, 127.0, 139.0]
2025-08-07 02:55:28,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 43 minutes, 30 seconds)
2025-08-07 02:57:35,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:57:38,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 675.77625 ± 47.725
2025-08-07 02:57:38,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [667.946, 661.9491, 637.039, 766.4344, 627.9095, 618.26184, 695.8055, 685.9624, 644.8763, 751.5782]
2025-08-07 02:57:38,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [121.0, 121.0, 133.0, 150.0, 132.0, 112.0, 143.0, 138.0, 117.0, 148.0]
2025-08-07 02:57:38,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 42 minutes, 7 seconds)
2025-08-07 02:59:43,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:59:46,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 691.69305 ± 106.484
2025-08-07 02:59:46,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [684.5533, 582.3798, 719.3671, 663.6867, 561.1476, 758.38416, 649.8427, 941.5832, 760.9891, 594.9971]
2025-08-07 02:59:46,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 109.0, 132.0, 125.0, 112.0, 160.0, 137.0, 191.0, 155.0, 123.0]
2025-08-07 02:59:46,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 40 minutes, 15 seconds)
2025-08-07 03:01:53,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:01:56,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 714.98645 ± 67.570
2025-08-07 03:01:56,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [784.3836, 762.7198, 640.088, 662.5113, 821.00165, 742.60046, 695.56256, 666.844, 770.5766, 603.5769]
2025-08-07 03:01:56,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [146.0, 144.0, 131.0, 122.0, 171.0, 138.0, 142.0, 134.0, 160.0, 113.0]
2025-08-07 03:01:56,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (714.99) for latency ExtremeSparseL4U32
2025-08-07 03:01:56,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 39 minutes, 15 seconds)
2025-08-07 03:04:01,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:04:04,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 731.53601 ± 130.094
2025-08-07 03:04:04,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [806.62177, 825.14185, 705.0385, 654.06, 556.1349, 779.376, 1035.8671, 682.8133, 607.6196, 662.6864]
2025-08-07 03:04:04,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [159.0, 153.0, 126.0, 117.0, 102.0, 145.0, 195.0, 128.0, 117.0, 122.0]
2025-08-07 03:04:04,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (731.54) for latency ExtremeSparseL4U32
2025-08-07 03:04:04,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 36 minutes, 55 seconds)
2025-08-07 03:06:11,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:06:14,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 711.92596 ± 70.876
2025-08-07 03:06:14,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [609.7858, 768.70154, 789.46436, 598.67194, 760.295, 719.9541, 772.38544, 758.3662, 726.24475, 615.39136]
2025-08-07 03:06:14,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 148.0, 160.0, 120.0, 143.0, 137.0, 148.0, 146.0, 138.0, 107.0]
2025-08-07 03:06:14,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 34 minutes, 58 seconds)
2025-08-07 03:08:20,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:08:22,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 736.35193 ± 108.015
2025-08-07 03:08:22,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [760.44806, 641.51526, 683.8324, 909.2734, 956.9387, 602.7321, 669.72974, 734.425, 726.45276, 678.1722]
2025-08-07 03:08:22,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [140.0, 116.0, 121.0, 188.0, 182.0, 109.0, 117.0, 137.0, 141.0, 138.0]
2025-08-07 03:08:22,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (736.35) for latency ExtremeSparseL4U32
2025-08-07 03:08:23,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 32 minutes, 31 seconds)
2025-08-07 03:10:28,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:10:31,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 732.08044 ± 103.906
2025-08-07 03:10:31,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [554.528, 631.95667, 727.3852, 776.50183, 797.6387, 852.0957, 776.3227, 604.2343, 895.3903, 704.7509]
2025-08-07 03:10:31,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [120.0, 130.0, 132.0, 143.0, 149.0, 159.0, 160.0, 122.0, 177.0, 129.0]
2025-08-07 03:10:31,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 30 minutes, 38 seconds)
2025-08-07 03:12:37,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:12:40,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 707.69739 ± 129.418
2025-08-07 03:12:40,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [631.1127, 642.02655, 1030.832, 646.3656, 879.811, 646.96735, 681.4816, 635.85114, 617.3086, 665.21686]
2025-08-07 03:12:40,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [136.0, 138.0, 195.0, 119.0, 164.0, 118.0, 121.0, 118.0, 130.0, 120.0]
2025-08-07 03:12:40,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 28 minutes, 10 seconds)
2025-08-07 03:14:46,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:14:49,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 688.47296 ± 79.763
2025-08-07 03:14:49,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [630.4679, 807.3088, 838.5905, 632.01416, 656.57825, 586.37744, 718.12915, 660.1168, 618.2039, 736.9425]
2025-08-07 03:14:49,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [134.0, 156.0, 160.0, 125.0, 140.0, 106.0, 128.0, 116.0, 122.0, 136.0]
2025-08-07 03:14:49,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 26 minutes, 9 seconds)
2025-08-07 03:16:56,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:16:58,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 770.82697 ± 90.632
2025-08-07 03:16:58,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [955.3971, 814.4778, 693.1514, 674.35254, 727.8436, 696.90424, 814.3461, 884.09973, 677.05927, 770.6382]
2025-08-07 03:16:58,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [185.0, 150.0, 123.0, 121.0, 152.0, 135.0, 147.0, 175.0, 122.0, 141.0]
2025-08-07 03:16:58,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (770.83) for latency ExtremeSparseL4U32
2025-08-07 03:16:58,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 23 minutes, 55 seconds)
2025-08-07 03:19:05,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:19:08,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 758.95056 ± 80.864
2025-08-07 03:19:08,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [666.8193, 698.9138, 784.13727, 768.83417, 976.9311, 764.9101, 724.90125, 747.9272, 758.2853, 697.84686]
2025-08-07 03:19:08,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [121.0, 135.0, 157.0, 154.0, 197.0, 140.0, 154.0, 139.0, 142.0, 124.0]
2025-08-07 03:19:08,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 21 minutes, 56 seconds)
2025-08-07 03:21:15,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:21:18,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 708.55426 ± 63.985
2025-08-07 03:21:18,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [678.4115, 762.6414, 798.67804, 683.393, 763.72797, 803.7453, 681.00446, 639.4299, 620.804, 653.7066]
2025-08-07 03:21:18,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 145.0, 151.0, 139.0, 140.0, 168.0, 138.0, 116.0, 111.0, 140.0]
2025-08-07 03:21:18,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 20 minutes, 7 seconds)
2025-08-07 03:23:24,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:23:27,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 670.44562 ± 79.752
2025-08-07 03:23:27,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [644.06036, 612.7412, 758.88135, 824.83075, 590.19006, 584.6557, 697.47205, 671.41583, 741.2692, 578.9402]
2025-08-07 03:23:27,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [118.0, 117.0, 138.0, 151.0, 104.0, 103.0, 137.0, 129.0, 143.0, 106.0]
2025-08-07 03:23:27,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 18 minutes)
2025-08-07 03:25:33,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:25:36,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 751.84509 ± 91.359
2025-08-07 03:25:36,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [709.58594, 675.5041, 787.2813, 653.0202, 855.5381, 676.22394, 743.03186, 698.2971, 755.0428, 964.92535]
2025-08-07 03:25:36,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [146.0, 138.0, 148.0, 128.0, 172.0, 130.0, 141.0, 139.0, 143.0, 182.0]
2025-08-07 03:25:36,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 15 minutes, 54 seconds)
2025-08-07 03:27:42,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:27:45,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 779.23889 ± 91.977
2025-08-07 03:27:45,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [921.71466, 777.8968, 754.13947, 913.3138, 631.16345, 729.43286, 743.67816, 777.52167, 672.0036, 871.52423]
2025-08-07 03:27:45,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [179.0, 162.0, 143.0, 182.0, 114.0, 148.0, 142.0, 142.0, 120.0, 171.0]
2025-08-07 03:27:45,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (779.24) for latency ExtremeSparseL4U32
2025-08-07 03:27:45,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 13 minutes, 37 seconds)
2025-08-07 03:29:52,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:29:54,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 700.51648 ± 97.042
2025-08-07 03:29:54,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [778.542, 604.79614, 713.5442, 815.95264, 798.7398, 479.0127, 746.5955, 674.0827, 652.9843, 740.91504]
2025-08-07 03:29:54,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [150.0, 116.0, 128.0, 149.0, 150.0, 86.0, 139.0, 130.0, 122.0, 133.0]
2025-08-07 03:29:54,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 11 minutes, 28 seconds)
2025-08-07 03:32:03,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:32:05,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 759.69580 ± 46.667
2025-08-07 03:32:05,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [768.57635, 867.54376, 782.0087, 757.78186, 780.743, 780.3176, 705.6823, 725.28265, 730.53064, 698.4912]
2025-08-07 03:32:05,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [147.0, 164.0, 142.0, 147.0, 142.0, 147.0, 136.0, 130.0, 148.0, 126.0]
2025-08-07 03:32:05,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 9 minutes, 29 seconds)
2025-08-07 03:34:11,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:34:14,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 717.66742 ± 47.820
2025-08-07 03:34:14,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [755.8604, 740.51935, 738.69617, 679.10864, 809.5981, 702.57324, 725.8029, 618.8084, 702.3474, 703.35986]
2025-08-07 03:34:14,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [149.0, 139.0, 137.0, 122.0, 149.0, 150.0, 138.0, 111.0, 129.0, 125.0]
2025-08-07 03:34:14,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 7 minutes, 10 seconds)
2025-08-07 03:36:21,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:36:24,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 719.50134 ± 44.299
2025-08-07 03:36:24,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [836.1034, 703.6467, 687.6909, 695.99286, 689.6646, 755.51416, 683.43896, 705.66675, 702.3503, 734.945]
2025-08-07 03:36:24,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [155.0, 129.0, 125.0, 131.0, 125.0, 143.0, 125.0, 147.0, 129.0, 139.0]
2025-08-07 03:36:24,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 5 minutes, 8 seconds)
2025-08-07 03:38:31,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:38:34,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 774.64716 ± 55.438
2025-08-07 03:38:34,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [762.2938, 787.4494, 800.6081, 857.35236, 649.86127, 739.3256, 804.0971, 833.10034, 779.84204, 732.54224]
2025-08-07 03:38:34,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [153.0, 157.0, 153.0, 162.0, 113.0, 139.0, 154.0, 155.0, 145.0, 137.0]
2025-08-07 03:38:34,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 3 minutes, 12 seconds)
2025-08-07 03:40:41,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:40:44,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 755.98743 ± 98.138
2025-08-07 03:40:44,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [716.2973, 780.4388, 655.5257, 733.2651, 739.56805, 785.7528, 1019.5238, 656.62415, 702.63995, 770.2383]
2025-08-07 03:40:44,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [152.0, 151.0, 138.0, 131.0, 153.0, 162.0, 192.0, 138.0, 130.0, 141.0]
2025-08-07 03:40:44,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 1 minute, 17 seconds)
2025-08-07 03:42:50,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:42:52,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 756.09998 ± 62.278
2025-08-07 03:42:52,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [865.2324, 766.12573, 821.7908, 724.33356, 673.65564, 740.35974, 837.4747, 678.7005, 721.19104, 732.1362]
2025-08-07 03:42:52,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [163.0, 142.0, 154.0, 131.0, 120.0, 135.0, 160.0, 144.0, 132.0, 153.0]
2025-08-07 03:42:53,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 58 minutes, 37 seconds)
2025-08-07 03:44:59,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:45:02,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 757.10608 ± 45.695
2025-08-07 03:45:02,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [676.3728, 778.2481, 815.5615, 727.2419, 754.9708, 827.5174, 740.0909, 763.42804, 788.3219, 699.30774]
2025-08-07 03:45:02,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [125.0, 145.0, 152.0, 129.0, 138.0, 154.0, 139.0, 141.0, 149.0, 135.0]
2025-08-07 03:45:02,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 56 minutes, 43 seconds)
2025-08-07 03:47:09,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:47:12,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 710.94452 ± 64.092
2025-08-07 03:47:12,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [749.5273, 780.46454, 596.81903, 728.1873, 672.2982, 771.7125, 601.1314, 766.5162, 700.39136, 742.3977]
2025-08-07 03:47:12,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [136.0, 147.0, 112.0, 131.0, 118.0, 143.0, 106.0, 159.0, 125.0, 135.0]
2025-08-07 03:47:12,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 54 minutes, 33 seconds)
2025-08-07 03:49:18,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:49:21,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 768.61792 ± 105.784
2025-08-07 03:49:21,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [699.1838, 740.0883, 883.73816, 894.4209, 795.1099, 511.81723, 800.8853, 801.6244, 843.55634, 715.75507]
2025-08-07 03:49:21,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [142.0, 136.0, 169.0, 177.0, 150.0, 91.0, 147.0, 172.0, 162.0, 134.0]
2025-08-07 03:49:21,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 52 minutes, 18 seconds)
2025-08-07 03:51:29,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:51:32,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 730.51251 ± 46.276
2025-08-07 03:51:32,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [700.8894, 741.3726, 638.18823, 730.7098, 774.4617, 717.3869, 750.7519, 693.00757, 819.41754, 738.9396]
2025-08-07 03:51:32,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [137.0, 135.0, 116.0, 148.0, 143.0, 140.0, 141.0, 126.0, 160.0, 134.0]
2025-08-07 03:51:32,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 50 minutes, 11 seconds)
2025-08-07 03:53:38,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:53:41,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 758.46027 ± 83.379
2025-08-07 03:53:41,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [706.7429, 718.76544, 741.1864, 747.568, 808.6907, 600.2313, 803.17004, 939.15283, 795.89575, 723.1993]
2025-08-07 03:53:41,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [148.0, 142.0, 146.0, 153.0, 155.0, 108.0, 155.0, 177.0, 147.0, 138.0]
2025-08-07 03:53:41,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 48 minutes, 7 seconds)
2025-08-07 03:55:49,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:55:52,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 812.92859 ± 79.775
2025-08-07 03:55:52,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [955.3118, 886.9816, 686.1698, 785.39685, 757.6612, 922.2916, 741.2429, 809.83417, 796.39795, 787.9974]
2025-08-07 03:55:52,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [195.0, 164.0, 123.0, 148.0, 139.0, 172.0, 143.0, 151.0, 156.0, 144.0]
2025-08-07 03:55:52,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (812.93) for latency ExtremeSparseL4U32
2025-08-07 03:55:52,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 46 minutes, 5 seconds)
2025-08-07 03:57:58,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:58:01,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 762.06427 ± 68.135
2025-08-07 03:58:01,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [859.2875, 752.50854, 756.2532, 818.3947, 731.787, 681.49817, 891.85754, 718.9013, 682.0393, 728.1152]
2025-08-07 03:58:01,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [158.0, 147.0, 153.0, 151.0, 132.0, 128.0, 177.0, 144.0, 124.0, 130.0]
2025-08-07 03:58:01,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 43 minutes, 50 seconds)
2025-08-07 04:00:08,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:00:11,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 712.08020 ± 85.845
2025-08-07 04:00:11,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [632.3625, 731.85486, 718.7312, 683.0782, 725.6212, 769.83435, 517.1014, 735.8713, 863.8534, 742.4933]
2025-08-07 04:00:11,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 135.0, 133.0, 123.0, 142.0, 151.0, 95.0, 131.0, 166.0, 133.0]
2025-08-07 04:00:11,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 41 minutes, 45 seconds)
2025-08-07 04:02:18,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:02:21,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 786.84973 ± 63.199
2025-08-07 04:02:21,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [801.1659, 667.6367, 708.5255, 768.00653, 823.92523, 772.3089, 904.14825, 842.96063, 773.74866, 806.0713]
2025-08-07 04:02:21,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [159.0, 126.0, 151.0, 142.0, 158.0, 162.0, 170.0, 159.0, 144.0, 166.0]
2025-08-07 04:02:21,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 39 minutes, 31 seconds)
2025-08-07 04:04:28,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:04:31,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 750.06085 ± 52.389
2025-08-07 04:04:31,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [821.9327, 783.73193, 799.1645, 794.7201, 764.941, 764.91296, 657.5483, 733.8229, 700.92975, 678.9042]
2025-08-07 04:04:31,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [162.0, 143.0, 155.0, 148.0, 147.0, 153.0, 119.0, 134.0, 138.0, 137.0]
2025-08-07 04:04:31,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 37 minutes, 30 seconds)
2025-08-07 04:06:37,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:06:40,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 789.12762 ± 111.724
2025-08-07 04:06:40,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [840.4703, 739.729, 790.81995, 770.68335, 739.1781, 702.73627, 846.8303, 658.2569, 722.6402, 1079.9321]
2025-08-07 04:06:40,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [153.0, 139.0, 148.0, 153.0, 142.0, 140.0, 155.0, 126.0, 142.0, 207.0]
2025-08-07 04:06:40,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 35 minutes, 7 seconds)
2025-08-07 04:08:47,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:08:50,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 735.66583 ± 61.831
2025-08-07 04:08:50,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [694.91656, 808.0792, 682.89905, 669.4065, 655.2515, 713.5028, 849.21204, 762.15955, 726.30414, 794.92676]
2025-08-07 04:08:50,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [139.0, 145.0, 133.0, 132.0, 123.0, 135.0, 164.0, 146.0, 138.0, 147.0]
2025-08-07 04:08:50,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 32 minutes, 59 seconds)
2025-08-07 04:10:58,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:11:01,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 776.31494 ± 66.978
2025-08-07 04:11:01,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [766.5746, 625.64655, 796.02637, 870.19495, 780.40405, 806.32275, 861.5107, 795.4003, 722.4342, 738.63513]
2025-08-07 04:11:01,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [139.0, 112.0, 159.0, 177.0, 139.0, 159.0, 179.0, 147.0, 132.0, 141.0]
2025-08-07 04:11:01,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 30 minutes, 57 seconds)
2025-08-07 04:13:08,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:13:11,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 742.49103 ± 99.451
2025-08-07 04:13:11,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [592.35284, 650.5809, 730.0132, 720.35114, 909.2857, 903.84125, 720.7537, 655.26575, 728.3151, 814.1507]
2025-08-07 04:13:11,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 117.0, 144.0, 133.0, 169.0, 167.0, 140.0, 122.0, 131.0, 160.0]
2025-08-07 04:13:11,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 28 minutes, 43 seconds)
2025-08-07 04:15:17,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:15:20,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 746.98578 ± 127.920
2025-08-07 04:15:20,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [793.6283, 877.74457, 806.46313, 770.67596, 910.0993, 755.2048, 744.5497, 435.11908, 756.7699, 619.60345]
2025-08-07 04:15:20,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [149.0, 167.0, 165.0, 144.0, 171.0, 148.0, 133.0, 84.0, 154.0, 130.0]
2025-08-07 04:15:20,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 26 minutes, 32 seconds)
2025-08-07 04:17:27,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:17:30,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 761.44910 ± 75.969
2025-08-07 04:17:30,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [750.1297, 809.90076, 688.8824, 939.9898, 678.5947, 827.5116, 702.031, 773.7153, 726.94434, 716.79095]
2025-08-07 04:17:30,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 156.0, 130.0, 184.0, 128.0, 163.0, 124.0, 161.0, 136.0, 137.0]
2025-08-07 04:17:30,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 24 minutes, 25 seconds)
2025-08-07 04:19:36,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:19:39,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 773.80865 ± 111.273
2025-08-07 04:19:39,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [782.26886, 679.86847, 587.3992, 803.6689, 996.7099, 845.9462, 794.1801, 742.5158, 647.6176, 857.9112]
2025-08-07 04:19:39,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [148.0, 124.0, 108.0, 157.0, 206.0, 158.0, 155.0, 145.0, 133.0, 171.0]
2025-08-07 04:19:40,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 22 minutes, 18 seconds)
2025-08-07 04:21:46,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:21:49,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 814.13245 ± 94.194
2025-08-07 04:21:49,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [819.76953, 661.9183, 778.3256, 678.37616, 820.15485, 1000.5737, 895.28705, 778.8732, 852.65576, 855.3904]
2025-08-07 04:21:49,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [152.0, 124.0, 142.0, 130.0, 154.0, 187.0, 185.0, 144.0, 172.0, 161.0]
2025-08-07 04:21:49,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (814.13) for latency ExtremeSparseL4U32
2025-08-07 04:21:49,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 20 minutes)
2025-08-07 04:23:57,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:24:00,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 791.73065 ± 110.420
2025-08-07 04:24:00,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [830.10156, 651.7284, 860.6239, 905.84357, 1034.5996, 706.9457, 690.5342, 758.8379, 743.3464, 734.7458]
2025-08-07 04:24:00,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [152.0, 119.0, 159.0, 173.0, 207.0, 130.0, 137.0, 136.0, 135.0, 152.0]
2025-08-07 04:24:00,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 17 minutes, 56 seconds)
2025-08-07 04:26:06,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:26:09,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 832.75281 ± 64.467
2025-08-07 04:26:09,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [943.61725, 720.3837, 796.0002, 856.51074, 844.4783, 852.31305, 812.7367, 909.579, 846.9237, 744.98517]
2025-08-07 04:26:09,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [174.0, 147.0, 152.0, 161.0, 163.0, 156.0, 157.0, 169.0, 175.0, 138.0]
2025-08-07 04:26:09,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (832.75) for latency ExtremeSparseL4U32
2025-08-07 04:26:09,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 15 minutes, 40 seconds)
2025-08-07 04:28:16,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:28:19,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 747.75793 ± 44.478
2025-08-07 04:28:19,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [712.8765, 754.8679, 699.9142, 763.0622, 850.3138, 758.488, 685.87067, 742.18164, 728.6034, 781.40155]
2025-08-07 04:28:19,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [134.0, 141.0, 143.0, 137.0, 164.0, 147.0, 126.0, 148.0, 147.0, 155.0]
2025-08-07 04:28:19,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 13 minutes, 36 seconds)
2025-08-07 04:30:26,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:30:28,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 735.59167 ± 70.870
2025-08-07 04:30:28,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [669.7061, 777.7379, 797.0194, 712.2346, 604.6958, 757.61346, 657.2139, 835.0213, 813.816, 730.85785]
2025-08-07 04:30:28,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 149.0, 147.0, 142.0, 124.0, 149.0, 122.0, 153.0, 168.0, 130.0]
2025-08-07 04:30:28,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 11 minutes, 22 seconds)
2025-08-07 04:32:36,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:32:39,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 834.48761 ± 84.085
2025-08-07 04:32:39,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [888.83606, 819.6486, 810.7013, 699.0142, 801.1089, 1018.5497, 926.43915, 797.3902, 798.8584, 784.3297]
2025-08-07 04:32:39,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [176.0, 154.0, 150.0, 124.0, 163.0, 194.0, 175.0, 159.0, 150.0, 146.0]
2025-08-07 04:32:39,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (834.49) for latency ExtremeSparseL4U32
2025-08-07 04:32:39,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 9 minutes, 16 seconds)
2025-08-07 04:34:46,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:34:49,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 790.74420 ± 69.181
2025-08-07 04:34:49,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [862.47034, 893.75696, 669.8694, 712.78784, 828.9809, 784.35986, 816.78424, 717.6347, 770.91876, 849.8783]
2025-08-07 04:34:49,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [163.0, 168.0, 136.0, 135.0, 155.0, 141.0, 148.0, 127.0, 145.0, 158.0]
2025-08-07 04:34:49,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 7 minutes, 1 second)
2025-08-07 04:36:55,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:36:58,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 805.27478 ± 105.968
2025-08-07 04:36:58,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [747.3181, 802.64844, 769.4778, 744.6262, 662.2391, 1066.7549, 798.2645, 722.85425, 862.20465, 876.35925]
2025-08-07 04:36:58,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [134.0, 149.0, 155.0, 154.0, 124.0, 207.0, 161.0, 139.0, 165.0, 166.0]
2025-08-07 04:36:58,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 4 minutes, 55 seconds)
2025-08-07 04:39:04,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:39:07,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 818.50189 ± 53.976
2025-08-07 04:39:07,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [878.4148, 786.14355, 814.0426, 808.27594, 793.28827, 766.5114, 896.1815, 788.0266, 910.50867, 743.6253]
2025-08-07 04:39:07,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [163.0, 149.0, 149.0, 153.0, 147.0, 144.0, 166.0, 146.0, 178.0, 137.0]
2025-08-07 04:39:07,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 2 minutes, 39 seconds)
2025-08-07 04:41:14,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:41:17,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 809.06683 ± 61.790
2025-08-07 04:41:17,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [723.58997, 786.5708, 895.65, 936.3501, 842.38, 785.7488, 778.4904, 764.9385, 763.1179, 813.83154]
2025-08-07 04:41:17,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 142.0, 164.0, 189.0, 155.0, 149.0, 146.0, 139.0, 137.0, 151.0]
2025-08-07 04:41:17,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 34 seconds)
2025-08-07 04:43:24,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:43:27,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 769.65051 ± 48.850
2025-08-07 04:43:27,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [709.6951, 796.3789, 750.56433, 710.51245, 781.2977, 870.8581, 793.77673, 707.2654, 797.0581, 779.0982]
2025-08-07 04:43:27,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [129.0, 149.0, 146.0, 135.0, 165.0, 158.0, 144.0, 131.0, 148.0, 147.0]
2025-08-07 04:43:27,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 58 minutes, 21 seconds)
2025-08-07 04:45:35,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:45:38,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 811.68787 ± 67.035
2025-08-07 04:45:38,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [720.64636, 923.299, 823.4367, 840.7168, 755.3287, 899.2945, 716.4087, 783.0871, 858.74805, 795.91296]
2025-08-07 04:45:38,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [127.0, 171.0, 160.0, 154.0, 149.0, 166.0, 142.0, 147.0, 161.0, 146.0]
2025-08-07 04:45:38,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 56 minutes, 15 seconds)
2025-08-07 04:47:45,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:47:48,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 796.74860 ± 73.973
2025-08-07 04:47:48,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [729.0772, 723.5459, 831.885, 713.88477, 848.0572, 726.1248, 932.4102, 852.54504, 865.8393, 744.1164]
2025-08-07 04:47:48,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [140.0, 140.0, 158.0, 125.0, 173.0, 135.0, 191.0, 160.0, 164.0, 154.0]
2025-08-07 04:47:48,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 54 minutes, 10 seconds)
2025-08-07 04:49:53,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:49:56,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 762.39764 ± 135.915
2025-08-07 04:49:56,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [811.9314, 697.45575, 1052.0054, 750.7157, 776.0941, 793.3268, 607.14636, 804.7045, 510.12857, 820.46826]
2025-08-07 04:49:56,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [169.0, 128.0, 201.0, 140.0, 144.0, 148.0, 113.0, 149.0, 99.0, 153.0]
2025-08-07 04:49:56,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 51 minutes, 55 seconds)
2025-08-07 04:52:05,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:52:08,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 756.19269 ± 43.630
2025-08-07 04:52:08,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [743.6498, 674.6069, 756.32227, 804.2769, 802.20514, 773.5901, 738.06793, 742.87335, 704.4849, 821.84985]
2025-08-07 04:52:08,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [145.0, 119.0, 135.0, 170.0, 158.0, 140.0, 148.0, 143.0, 139.0, 156.0]
2025-08-07 04:52:08,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 49 minutes, 51 seconds)
2025-08-07 04:54:13,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:54:16,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 793.24854 ± 93.184
2025-08-07 04:54:16,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [583.14905, 847.7073, 755.4088, 849.7559, 794.40704, 752.2163, 969.5346, 820.6853, 755.0622, 804.55914]
2025-08-07 04:54:16,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 161.0, 137.0, 155.0, 146.0, 159.0, 185.0, 156.0, 137.0, 153.0]
2025-08-07 04:54:16,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 47 minutes, 33 seconds)
2025-08-07 04:56:24,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:56:27,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 777.44788 ± 90.841
2025-08-07 04:56:27,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [805.134, 968.48175, 703.7302, 693.05707, 824.7715, 850.18494, 620.46136, 755.5739, 771.5512, 781.533]
2025-08-07 04:56:27,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [161.0, 185.0, 132.0, 121.0, 157.0, 178.0, 128.0, 142.0, 161.0, 145.0]
2025-08-07 04:56:27,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 45 minutes, 27 seconds)
2025-08-07 04:58:34,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:58:37,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 854.97430 ± 131.871
2025-08-07 04:58:37,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [639.4471, 779.64813, 903.0456, 1006.4652, 768.18243, 1031.7655, 768.3814, 776.12866, 1061.0397, 815.6399]
2025-08-07 04:58:37,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [124.0, 149.0, 189.0, 204.0, 152.0, 202.0, 149.0, 147.0, 204.0, 158.0]
2025-08-07 04:58:37,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (854.97) for latency ExtremeSparseL4U32
2025-08-07 04:58:37,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 43 minutes, 15 seconds)
2025-08-07 05:00:44,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:00:47,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 847.49298 ± 55.514
2025-08-07 05:00:47,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [821.2456, 886.04694, 731.4188, 821.86163, 884.2156, 858.799, 863.974, 803.2217, 852.8476, 951.2989]
2025-08-07 05:00:47,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [152.0, 177.0, 134.0, 154.0, 165.0, 168.0, 162.0, 155.0, 165.0, 180.0]
2025-08-07 05:00:47,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 41 minutes, 12 seconds)
2025-08-07 05:02:52,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:02:56,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 876.72705 ± 77.320
2025-08-07 05:02:56,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [790.51575, 932.4605, 798.91656, 789.79193, 837.4151, 942.1764, 926.4114, 793.839, 976.48914, 979.2553]
2025-08-07 05:02:56,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [151.0, 182.0, 157.0, 149.0, 158.0, 185.0, 176.0, 156.0, 186.0, 187.0]
2025-08-07 05:02:56,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (876.73) for latency ExtremeSparseL4U32
2025-08-07 05:02:56,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 38 minutes, 53 seconds)
2025-08-07 05:05:03,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:05:07,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 828.51044 ± 67.946
2025-08-07 05:05:07,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [765.52545, 811.12524, 703.38544, 912.7449, 801.2979, 909.60956, 830.5826, 789.82544, 830.9031, 930.1045]
2025-08-07 05:05:07,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [146.0, 166.0, 137.0, 169.0, 148.0, 173.0, 160.0, 153.0, 174.0, 183.0]
2025-08-07 05:05:07,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 36 minutes, 52 seconds)
2025-08-07 05:07:13,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:07:16,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 824.30145 ± 60.834
2025-08-07 05:07:16,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [777.1159, 853.3541, 690.18805, 893.8743, 866.07935, 769.9997, 817.60376, 895.81134, 859.4417, 819.5467]
2025-08-07 05:07:16,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [158.0, 168.0, 138.0, 168.0, 162.0, 152.0, 156.0, 168.0, 158.0, 147.0]
2025-08-07 05:07:16,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 34 minutes, 35 seconds)
2025-08-07 05:09:23,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:09:26,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 798.84753 ± 62.135
2025-08-07 05:09:26,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [773.2117, 780.9548, 657.6516, 764.50775, 784.00684, 824.06213, 864.8351, 819.2039, 901.28235, 818.76]
2025-08-07 05:09:26,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [139.0, 147.0, 129.0, 139.0, 145.0, 150.0, 179.0, 151.0, 167.0, 147.0]
2025-08-07 05:09:26,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 32 minutes, 25 seconds)
2025-08-07 05:11:31,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:11:34,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 833.42950 ± 48.734
2025-08-07 05:11:34,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [753.0112, 764.3283, 844.55804, 798.0497, 886.5009, 862.06464, 848.73846, 842.69183, 817.68896, 916.66235]
2025-08-07 05:11:34,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [139.0, 139.0, 163.0, 148.0, 169.0, 161.0, 162.0, 156.0, 153.0, 178.0]
2025-08-07 05:11:34,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 30 minutes, 11 seconds)
2025-08-07 05:13:41,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:13:44,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 760.07806 ± 61.112
2025-08-07 05:13:44,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [790.18286, 701.26196, 659.03876, 723.0732, 851.4971, 839.3236, 760.3043, 732.8455, 717.9737, 825.27997]
2025-08-07 05:13:44,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [152.0, 138.0, 127.0, 137.0, 162.0, 155.0, 143.0, 139.0, 127.0, 154.0]
2025-08-07 05:13:44,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 28 minutes, 5 seconds)
2025-08-07 05:15:50,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:15:53,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 765.06030 ± 97.392
2025-08-07 05:15:53,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [819.4778, 860.0254, 873.15283, 589.27264, 762.3422, 847.8407, 727.8397, 760.09845, 816.7847, 593.76843]
2025-08-07 05:15:53,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [150.0, 160.0, 163.0, 111.0, 151.0, 159.0, 138.0, 144.0, 156.0, 112.0]
2025-08-07 05:15:53,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 25 minutes, 51 seconds)
2025-08-07 05:17:58,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:18:01,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 867.75598 ± 99.950
2025-08-07 05:18:01,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [988.89233, 858.46405, 1060.2676, 742.31683, 899.5794, 906.2359, 769.2417, 895.66693, 737.13245, 819.76196]
2025-08-07 05:18:01,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [186.0, 157.0, 201.0, 141.0, 165.0, 177.0, 138.0, 161.0, 143.0, 154.0]
2025-08-07 05:18:01,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 23 minutes, 40 seconds)
2025-08-07 05:20:07,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:20:10,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 825.52069 ± 79.947
2025-08-07 05:20:10,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [910.0741, 671.4544, 823.95874, 800.4113, 841.2838, 841.91077, 982.8796, 796.8863, 747.3197, 839.0282]
2025-08-07 05:20:10,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [171.0, 126.0, 153.0, 149.0, 155.0, 157.0, 192.0, 143.0, 135.0, 155.0]
2025-08-07 05:20:10,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 21 minutes, 29 seconds)
2025-08-07 05:22:16,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:22:19,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 826.82684 ± 71.318
2025-08-07 05:22:19,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [881.1095, 935.4826, 761.7363, 852.2399, 891.37244, 763.3894, 718.925, 762.0855, 905.9085, 796.0192]
2025-08-07 05:22:19,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [163.0, 174.0, 143.0, 159.0, 167.0, 147.0, 135.0, 156.0, 170.0, 149.0]
2025-08-07 05:22:19,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 19 minutes, 21 seconds)
2025-08-07 05:24:25,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:24:29,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 817.12122 ± 60.827
2025-08-07 05:24:29,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [828.70355, 723.149, 858.4205, 814.94604, 848.2666, 787.0648, 701.7996, 912.5562, 840.4789, 855.8272]
2025-08-07 05:24:29,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [150.0, 126.0, 164.0, 150.0, 151.0, 145.0, 135.0, 175.0, 159.0, 162.0]
2025-08-07 05:24:29,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 17 minutes, 11 seconds)
2025-08-07 05:26:34,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:26:37,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 736.42346 ± 82.383
2025-08-07 05:26:37,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [786.7174, 731.55896, 745.4445, 795.192, 851.01654, 671.1101, 731.60925, 535.1136, 720.26904, 796.2025]
2025-08-07 05:26:37,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [151.0, 144.0, 143.0, 150.0, 160.0, 126.0, 143.0, 108.0, 143.0, 159.0]
2025-08-07 05:26:37,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 15 minutes, 1 second)
2025-08-07 05:28:44,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:28:47,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 811.04456 ± 74.463
2025-08-07 05:28:47,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [825.536, 772.58984, 809.32635, 697.55817, 794.3657, 993.0693, 864.2619, 793.90826, 814.5729, 745.258]
2025-08-07 05:28:47,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [156.0, 145.0, 149.0, 138.0, 149.0, 185.0, 176.0, 143.0, 153.0, 134.0]
2025-08-07 05:28:47,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 12 minutes, 55 seconds)
2025-08-07 05:30:52,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:30:55,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 768.04175 ± 74.620
2025-08-07 05:30:55,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [726.3935, 798.5514, 803.1206, 764.42365, 663.6946, 948.3168, 781.7689, 749.50024, 679.4716, 765.1764]
2025-08-07 05:30:55,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [144.0, 144.0, 162.0, 153.0, 120.0, 174.0, 149.0, 150.0, 126.0, 153.0]
2025-08-07 05:30:55,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 10 minutes, 44 seconds)
2025-08-07 05:33:00,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:33:03,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 823.08801 ± 100.113
2025-08-07 05:33:03,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [730.4314, 841.2648, 879.8227, 902.2278, 804.6384, 859.42175, 792.76086, 936.6411, 904.8555, 578.8158]
2025-08-07 05:33:03,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [135.0, 156.0, 166.0, 171.0, 144.0, 159.0, 144.0, 175.0, 181.0, 110.0]
2025-08-07 05:33:03,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 35 seconds)
2025-08-07 05:35:09,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:35:12,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 876.88641 ± 69.036
2025-08-07 05:35:12,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [841.79315, 809.62354, 1042.9519, 897.1707, 953.4234, 819.6642, 844.81934, 885.4893, 852.6636, 821.2655]
2025-08-07 05:35:12,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [159.0, 152.0, 192.0, 164.0, 181.0, 151.0, 155.0, 175.0, 160.0, 154.0]
2025-08-07 05:35:12,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (876.89) for latency ExtremeSparseL4U32
2025-08-07 05:35:12,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 26 seconds)
2025-08-07 05:37:19,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:37:22,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 779.00476 ± 114.187
2025-08-07 05:37:22,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [484.73, 892.86005, 737.90485, 780.8107, 896.10834, 733.4214, 847.08435, 779.853, 871.38525, 765.889]
2025-08-07 05:37:22,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 175.0, 155.0, 154.0, 163.0, 147.0, 160.0, 141.0, 168.0, 145.0]
2025-08-07 05:37:22,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 18 seconds)
2025-08-07 05:39:28,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:39:31,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 855.69580 ± 38.817
2025-08-07 05:39:31,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [853.97125, 822.2865, 883.74554, 803.3444, 837.5982, 935.56665, 895.40845, 815.55334, 838.5592, 870.9249]
2025-08-07 05:39:31,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [156.0, 149.0, 161.0, 150.0, 164.0, 187.0, 176.0, 154.0, 158.0, 159.0]
2025-08-07 05:39:31,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 8 seconds)
2025-08-07 05:41:37,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:41:40,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 825.00537 ± 60.793
2025-08-07 05:41:40,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [821.7242, 829.0935, 836.2317, 968.0374, 745.4349, 831.57275, 874.063, 790.7311, 747.4237, 805.7412]
2025-08-07 05:41:40,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [154.0, 152.0, 154.0, 188.0, 149.0, 151.0, 162.0, 143.0, 138.0, 147.0]
2025-08-07 05:41:41,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1251 [DEBUG]: Training session finished
