2025-08-07 06:49:47,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc0-humanoid/ExtremeClogL1U23-bpql-mem24
2025-08-07 06:49:47,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc0-humanoid/ExtremeClogL1U23-bpql-mem24
2025-08-07 06:49:47,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x146206f586d0>}
2025-08-07 06:49:47,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1111 [DEBUG]: using device: cuda
2025-08-07 06:49:47,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1133 [INFO]: Creating new trainer
2025-08-07 06:49:47,756 baseline-bpql-noiseperc0-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=784, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-08-07 06:49:47,757 baseline-bpql-noiseperc0-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 06:49:49,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1194 [DEBUG]: Starting training session...
2025-08-07 06:49:49,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 1/100
2025-08-07 06:51:40,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:51:42,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 383.10266 ± 70.475
2025-08-07 06:51:42,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [377.19482, 419.22375, 305.91943, 539.2594, 313.38974, 415.50778, 339.83173, 391.53888, 433.30255, 295.8586]
2025-08-07 06:51:42,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 82.0, 58.0, 105.0, 60.0, 80.0, 64.0, 74.0, 85.0, 56.0]
2025-08-07 06:51:42,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (383.10) for latency ExtremeClogL1U23
2025-08-07 06:51:42,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 5 minutes, 51 seconds)
2025-08-07 06:53:41,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:53:43,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 442.00058 ± 31.793
2025-08-07 06:53:43,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [413.04077, 456.3764, 467.51996, 412.59305, 437.4796, 395.2448, 408.77765, 478.56946, 456.5665, 493.83783]
2025-08-07 06:53:43,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 87.0, 90.0, 77.0, 83.0, 76.0, 77.0, 91.0, 87.0, 95.0]
2025-08-07 06:53:43,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (442.00) for latency ExtremeClogL1U23
2025-08-07 06:53:43,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 10 minutes, 44 seconds)
2025-08-07 06:55:42,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:55:43,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 436.44891 ± 98.736
2025-08-07 06:55:43,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [459.21783, 330.85278, 466.03574, 561.87537, 459.90662, 540.72284, 313.0815, 283.57852, 564.8489, 384.36893]
2025-08-07 06:55:43,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 64.0, 100.0, 108.0, 85.0, 110.0, 64.0, 59.0, 118.0, 72.0]
2025-08-07 06:55:43,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 10 minutes, 40 seconds)
2025-08-07 06:57:43,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:57:45,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 436.54175 ± 58.350
2025-08-07 06:57:45,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [407.40787, 423.261, 402.52005, 343.16556, 545.7454, 483.67343, 510.85693, 395.3369, 454.8747, 398.5753]
2025-08-07 06:57:45,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 77.0, 79.0, 67.0, 104.0, 91.0, 97.0, 74.0, 84.0, 75.0]
2025-08-07 06:57:45,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 10 minutes, 13 seconds)
2025-08-07 06:59:44,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:59:46,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 455.89468 ± 61.580
2025-08-07 06:59:46,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [353.12665, 466.26187, 416.4122, 493.8602, 545.6789, 431.07407, 440.57672, 534.5798, 504.0947, 373.28195]
2025-08-07 06:59:46,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [67.0, 88.0, 78.0, 94.0, 101.0, 94.0, 82.0, 104.0, 95.0, 69.0]
2025-08-07 06:59:46,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (455.89) for latency ExtremeClogL1U23
2025-08-07 06:59:46,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 8 minutes, 53 seconds)
2025-08-07 07:01:46,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:01:48,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 484.27164 ± 87.689
2025-08-07 07:01:48,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [675.23694, 421.5008, 435.5635, 435.8713, 418.20947, 403.04608, 419.92145, 484.7646, 559.19257, 589.4097]
2025-08-07 07:01:48,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [128.0, 80.0, 86.0, 94.0, 88.0, 75.0, 81.0, 101.0, 112.0, 119.0]
2025-08-07 07:01:48,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (484.27) for latency ExtremeClogL1U23
2025-08-07 07:01:48,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 9 minutes, 52 seconds)
2025-08-07 07:03:48,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:03:49,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 468.90665 ± 56.011
2025-08-07 07:03:49,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [460.7417, 578.8716, 442.47363, 430.2942, 399.49954, 487.99158, 407.7346, 558.2889, 460.1396, 463.03128]
2025-08-07 07:03:49,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [96.0, 110.0, 87.0, 87.0, 87.0, 92.0, 89.0, 109.0, 92.0, 85.0]
2025-08-07 07:03:49,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 8 minutes, 6 seconds)
2025-08-07 07:05:48,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:05:50,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 571.69690 ± 117.537
2025-08-07 07:05:50,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [821.1067, 655.7532, 487.57404, 626.3275, 446.59647, 570.36725, 464.40204, 542.3923, 673.61035, 428.83923]
2025-08-07 07:05:50,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [167.0, 126.0, 93.0, 134.0, 83.0, 106.0, 88.0, 106.0, 128.0, 81.0]
2025-08-07 07:05:50,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (571.70) for latency ExtremeClogL1U23
2025-08-07 07:05:50,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 6 minutes, 14 seconds)
2025-08-07 07:07:51,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:07:53,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 518.98181 ± 87.328
2025-08-07 07:07:53,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [543.2605, 365.28244, 521.7051, 600.6354, 512.53235, 492.82602, 670.4409, 532.44775, 569.4674, 381.2207]
2025-08-07 07:07:53,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [108.0, 73.0, 98.0, 124.0, 98.0, 94.0, 136.0, 107.0, 107.0, 77.0]
2025-08-07 07:07:53,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 4 minutes, 28 seconds)
2025-08-07 07:09:52,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:09:54,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 532.71649 ± 111.005
2025-08-07 07:09:54,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [496.39737, 469.09613, 449.53757, 771.6543, 376.45197, 622.8212, 580.9175, 483.22665, 629.26935, 447.79263]
2025-08-07 07:09:54,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 90.0, 92.0, 148.0, 77.0, 119.0, 112.0, 102.0, 121.0, 88.0]
2025-08-07 07:09:54,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 3 hours, 2 minutes, 34 seconds)
2025-08-07 07:11:54,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:11:56,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 543.87695 ± 61.210
2025-08-07 07:11:56,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [571.17175, 491.12344, 621.57336, 577.5739, 462.66513, 639.4345, 491.70227, 580.8192, 542.1987, 460.50708]
2025-08-07 07:11:56,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [111.0, 94.0, 117.0, 106.0, 87.0, 122.0, 94.0, 111.0, 99.0, 86.0]
2025-08-07 07:11:56,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 3 hours, 30 seconds)
2025-08-07 07:13:56,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:13:58,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 573.96582 ± 83.762
2025-08-07 07:13:58,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [515.6555, 510.18134, 607.83417, 750.3127, 528.30865, 480.8464, 561.3552, 697.8421, 509.3776, 577.9448]
2025-08-07 07:13:58,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 105.0, 114.0, 148.0, 102.0, 88.0, 108.0, 134.0, 98.0, 109.0]
2025-08-07 07:13:58,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (573.97) for latency ExtremeClogL1U23
2025-08-07 07:13:58,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 58 minutes, 35 seconds)
2025-08-07 07:15:59,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:16:01,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 619.64771 ± 69.376
2025-08-07 07:16:01,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [602.2017, 596.7146, 481.16312, 659.92065, 576.30927, 649.6652, 754.27765, 565.5095, 662.8976, 647.81805]
2025-08-07 07:16:01,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [114.0, 120.0, 97.0, 132.0, 115.0, 131.0, 143.0, 109.0, 133.0, 128.0]
2025-08-07 07:16:01,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (619.65) for latency ExtremeClogL1U23
2025-08-07 07:16:01,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 57 minutes, 5 seconds)
2025-08-07 07:18:01,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:18:03,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 603.51697 ± 69.780
2025-08-07 07:18:03,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [616.3583, 441.90674, 646.19183, 676.18695, 661.571, 667.5687, 586.44916, 610.34204, 517.28467, 611.3107]
2025-08-07 07:18:03,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 83.0, 122.0, 132.0, 126.0, 127.0, 110.0, 113.0, 105.0, 116.0]
2025-08-07 07:18:03,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 55 minutes, 1 second)
2025-08-07 07:20:05,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:20:07,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 603.44165 ± 114.888
2025-08-07 07:20:07,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [478.7947, 623.2089, 546.9283, 569.78455, 483.76773, 792.9051, 505.95227, 700.87897, 797.75946, 534.4365]
2025-08-07 07:20:07,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 133.0, 102.0, 105.0, 91.0, 152.0, 105.0, 147.0, 160.0, 100.0]
2025-08-07 07:20:07,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 53 minutes, 30 seconds)
2025-08-07 07:22:07,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:22:08,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 554.87207 ± 96.383
2025-08-07 07:22:08,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [809.3601, 505.23627, 514.77716, 478.50653, 482.55283, 600.19604, 580.9081, 537.538, 581.1035, 458.5418]
2025-08-07 07:22:08,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [168.0, 95.0, 106.0, 95.0, 90.0, 110.0, 109.0, 115.0, 108.0, 98.0]
2025-08-07 07:22:08,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 51 minutes, 25 seconds)
2025-08-07 07:24:08,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:24:10,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 589.21082 ± 147.522
2025-08-07 07:24:10,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [526.38245, 860.9857, 505.575, 576.08795, 503.90588, 418.1867, 462.26227, 536.3026, 862.3927, 640.02716]
2025-08-07 07:24:10,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 164.0, 103.0, 112.0, 109.0, 81.0, 93.0, 110.0, 173.0, 122.0]
2025-08-07 07:24:10,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 49 minutes, 20 seconds)
2025-08-07 07:26:11,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:26:13,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 604.44208 ± 102.616
2025-08-07 07:26:13,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [620.258, 571.99146, 461.57117, 731.41364, 484.29788, 616.0107, 565.0502, 594.1453, 571.3187, 828.3635]
2025-08-07 07:26:13,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [118.0, 108.0, 86.0, 151.0, 105.0, 113.0, 109.0, 129.0, 106.0, 157.0]
2025-08-07 07:26:13,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 47 minutes, 12 seconds)
2025-08-07 07:28:13,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:28:15,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 588.68591 ± 101.334
2025-08-07 07:28:15,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [583.6884, 557.4527, 490.8483, 610.73175, 791.85895, 573.3846, 573.1495, 597.103, 708.3334, 400.308]
2025-08-07 07:28:15,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [120.0, 108.0, 93.0, 116.0, 148.0, 107.0, 108.0, 111.0, 135.0, 84.0]
2025-08-07 07:28:15,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 45 minutes, 11 seconds)
2025-08-07 07:30:16,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:30:18,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 592.61218 ± 164.222
2025-08-07 07:30:18,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [748.8062, 445.70477, 507.56174, 533.5381, 637.80115, 441.28674, 431.10052, 486.0205, 949.6624, 744.63916]
2025-08-07 07:30:18,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [139.0, 94.0, 105.0, 101.0, 122.0, 85.0, 92.0, 104.0, 183.0, 144.0]
2025-08-07 07:30:18,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 43 minutes, 2 seconds)
2025-08-07 07:32:18,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:32:20,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 638.05646 ± 140.798
2025-08-07 07:32:20,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [634.04474, 790.29913, 574.62695, 636.95966, 535.4263, 597.60315, 994.5572, 499.77066, 555.2088, 562.0672]
2025-08-07 07:32:20,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [117.0, 152.0, 106.0, 119.0, 100.0, 116.0, 193.0, 107.0, 117.0, 104.0]
2025-08-07 07:32:20,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (638.06) for latency ExtremeClogL1U23
2025-08-07 07:32:20,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 41 minutes, 6 seconds)
2025-08-07 07:34:21,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:34:23,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 610.75482 ± 65.902
2025-08-07 07:34:23,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [586.51666, 543.1717, 557.1271, 682.0514, 650.29175, 704.1893, 613.8029, 548.8924, 520.7898, 700.7154]
2025-08-07 07:34:23,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [125.0, 112.0, 103.0, 146.0, 122.0, 153.0, 115.0, 107.0, 100.0, 132.0]
2025-08-07 07:34:23,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 39 minutes, 17 seconds)
2025-08-07 07:36:24,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:36:26,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 644.56250 ± 181.339
2025-08-07 07:36:26,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [403.75015, 731.0267, 434.11557, 760.33716, 571.49603, 698.684, 988.17596, 410.5254, 646.22003, 801.2942]
2025-08-07 07:36:26,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 143.0, 92.0, 157.0, 110.0, 130.0, 184.0, 87.0, 120.0, 170.0]
2025-08-07 07:36:26,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (644.56) for latency ExtremeClogL1U23
2025-08-07 07:36:26,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 37 minutes, 21 seconds)
2025-08-07 07:38:26,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:38:28,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 622.60010 ± 65.058
2025-08-07 07:38:28,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [690.23035, 525.54535, 663.5022, 755.6246, 595.81226, 575.52136, 590.53644, 556.6422, 648.02795, 624.5584]
2025-08-07 07:38:28,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [145.0, 97.0, 125.0, 145.0, 114.0, 122.0, 128.0, 102.0, 122.0, 133.0]
2025-08-07 07:38:28,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 35 minutes, 11 seconds)
2025-08-07 07:40:29,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:40:31,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 601.21887 ± 115.006
2025-08-07 07:40:31,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [639.90326, 550.9161, 524.9102, 577.01086, 814.3302, 815.06573, 467.17654, 575.7696, 522.4105, 524.6952]
2025-08-07 07:40:31,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 106.0, 110.0, 124.0, 170.0, 175.0, 90.0, 111.0, 110.0, 111.0]
2025-08-07 07:40:31,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 33 minutes, 18 seconds)
2025-08-07 07:42:32,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:42:34,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 600.22302 ± 199.178
2025-08-07 07:42:34,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [477.38864, 486.5159, 1107.7765, 580.5107, 756.395, 667.94855, 595.1755, 464.282, 493.8346, 372.40323]
2025-08-07 07:42:34,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [101.0, 98.0, 212.0, 124.0, 148.0, 124.0, 116.0, 96.0, 109.0, 77.0]
2025-08-07 07:42:34,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 31 minutes, 17 seconds)
2025-08-07 07:44:34,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:44:36,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 633.82758 ± 129.386
2025-08-07 07:44:36,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [708.71936, 698.7835, 451.52084, 525.3178, 696.1331, 685.86334, 784.5298, 824.86383, 491.94354, 470.6012]
2025-08-07 07:44:36,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [143.0, 141.0, 91.0, 111.0, 132.0, 128.0, 150.0, 156.0, 101.0, 103.0]
2025-08-07 07:44:36,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 29 minutes, 13 seconds)
2025-08-07 07:46:37,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:46:39,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 611.91479 ± 153.807
2025-08-07 07:46:39,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [519.97015, 824.02264, 448.93674, 598.6328, 702.9171, 538.1392, 445.26974, 550.5367, 551.05994, 939.6625]
2025-08-07 07:46:39,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 158.0, 92.0, 110.0, 151.0, 116.0, 95.0, 105.0, 119.0, 202.0]
2025-08-07 07:46:39,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 27 minutes, 14 seconds)
2025-08-07 07:48:41,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:48:43,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 687.39148 ± 143.884
2025-08-07 07:48:43,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [612.9935, 951.34955, 979.6968, 577.9238, 614.69165, 620.9247, 616.03204, 699.4436, 646.9379, 553.92126]
2025-08-07 07:48:43,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 180.0, 204.0, 111.0, 133.0, 115.0, 115.0, 132.0, 128.0, 104.0]
2025-08-07 07:48:43,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (687.39) for latency ExtremeClogL1U23
2025-08-07 07:48:43,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 25 minutes, 37 seconds)
2025-08-07 07:50:45,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:50:47,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 669.50378 ± 138.395
2025-08-07 07:50:47,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [778.8039, 997.04175, 543.0874, 653.4718, 767.16705, 667.36395, 539.98, 518.1402, 593.8582, 636.1239]
2025-08-07 07:50:47,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [146.0, 186.0, 103.0, 140.0, 140.0, 122.0, 119.0, 108.0, 118.0, 132.0]
2025-08-07 07:50:47,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 23 minutes, 42 seconds)
2025-08-07 07:52:46,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:52:49,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 664.95923 ± 91.686
2025-08-07 07:52:49,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [602.36957, 738.86554, 863.0912, 588.87506, 716.0567, 581.6767, 542.211, 672.3092, 622.72974, 721.4078]
2025-08-07 07:52:49,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [117.0, 145.0, 182.0, 109.0, 136.0, 124.0, 104.0, 125.0, 127.0, 149.0]
2025-08-07 07:52:49,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 21 minutes, 26 seconds)
2025-08-07 07:54:49,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:54:51,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 749.24023 ± 102.854
2025-08-07 07:54:51,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [646.11676, 813.11444, 663.02155, 629.2842, 859.8903, 927.90906, 861.3704, 648.6031, 700.9123, 742.1808]
2025-08-07 07:54:51,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [133.0, 152.0, 124.0, 117.0, 175.0, 172.0, 167.0, 120.0, 130.0, 142.0]
2025-08-07 07:54:51,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (749.24) for latency ExtremeClogL1U23
2025-08-07 07:54:51,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 19 minutes, 26 seconds)
2025-08-07 07:56:50,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:56:53,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 762.79266 ± 105.596
2025-08-07 07:56:53,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [721.8027, 676.55963, 935.65393, 937.9324, 682.12555, 832.7431, 789.9549, 644.8907, 637.2695, 768.99426]
2025-08-07 07:56:53,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [158.0, 125.0, 177.0, 181.0, 140.0, 161.0, 164.0, 123.0, 130.0, 154.0]
2025-08-07 07:56:53,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (762.79) for latency ExtremeClogL1U23
2025-08-07 07:56:53,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 16 minutes, 57 seconds)
2025-08-07 07:58:52,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:58:55,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 674.43097 ± 96.772
2025-08-07 07:58:55,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [731.71387, 613.23944, 864.9448, 600.71674, 776.66364, 725.682, 514.4806, 633.12964, 676.546, 607.1922]
2025-08-07 07:58:55,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [138.0, 127.0, 160.0, 113.0, 138.0, 136.0, 97.0, 122.0, 125.0, 116.0]
2025-08-07 07:58:55,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 14 minutes, 32 seconds)
2025-08-07 08:00:54,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:00:57,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 786.01416 ± 144.989
2025-08-07 08:00:57,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [730.00287, 654.27075, 1066.6628, 737.42255, 965.9568, 599.6839, 856.3242, 633.9826, 728.90894, 886.92584]
2025-08-07 08:00:57,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [133.0, 134.0, 195.0, 133.0, 181.0, 111.0, 163.0, 119.0, 136.0, 177.0]
2025-08-07 08:00:57,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (786.01) for latency ExtremeClogL1U23
2025-08-07 08:00:57,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 12 minutes, 4 seconds)
2025-08-07 08:02:55,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:02:58,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 693.01904 ± 160.818
2025-08-07 08:02:58,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [469.66498, 753.21893, 574.78314, 686.33826, 756.23553, 776.73315, 550.9135, 627.7611, 649.4343, 1085.1077]
2025-08-07 08:02:58,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 152.0, 114.0, 126.0, 137.0, 150.0, 110.0, 133.0, 137.0, 205.0]
2025-08-07 08:02:58,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 9 minutes, 57 seconds)
2025-08-07 08:04:57,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:04:59,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 655.92566 ± 87.758
2025-08-07 08:04:59,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [489.78824, 627.5025, 806.1378, 593.0593, 572.916, 715.0579, 659.0185, 634.1843, 732.8868, 728.7051]
2025-08-07 08:04:59,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 132.0, 169.0, 115.0, 116.0, 144.0, 129.0, 123.0, 140.0, 132.0]
2025-08-07 08:04:59,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 7 minutes, 38 seconds)
2025-08-07 08:06:57,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:07:00,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 767.91211 ± 116.892
2025-08-07 08:07:00,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [664.95056, 643.621, 1007.7133, 742.6591, 794.3562, 650.49963, 654.56476, 857.01526, 764.81396, 898.92725]
2025-08-07 08:07:00,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [131.0, 135.0, 193.0, 151.0, 170.0, 120.0, 130.0, 179.0, 153.0, 190.0]
2025-08-07 08:07:00,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 5 minutes, 30 seconds)
2025-08-07 08:08:59,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:09:01,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 746.49329 ± 103.588
2025-08-07 08:09:01,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [829.7852, 581.7501, 730.7507, 902.142, 728.1924, 717.1258, 727.357, 924.1433, 687.319, 636.367]
2025-08-07 08:09:01,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [157.0, 117.0, 137.0, 175.0, 136.0, 136.0, 141.0, 178.0, 125.0, 122.0]
2025-08-07 08:09:02,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 3 minutes, 23 seconds)
2025-08-07 08:11:02,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:11:05,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 826.79559 ± 161.149
2025-08-07 08:11:05,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [814.93115, 834.1872, 841.568, 801.19257, 747.573, 663.7927, 816.3081, 892.69, 610.9207, 1244.7921]
2025-08-07 08:11:05,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [174.0, 157.0, 174.0, 148.0, 149.0, 137.0, 158.0, 172.0, 122.0, 235.0]
2025-08-07 08:11:05,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (826.80) for latency ExtremeClogL1U23
2025-08-07 08:11:05,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 1 minute, 38 seconds)
2025-08-07 08:13:02,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:13:05,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 784.92139 ± 107.166
2025-08-07 08:13:05,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [827.5708, 561.9644, 688.38074, 759.0573, 864.66296, 774.58484, 793.6067, 805.57153, 773.86127, 999.95325]
2025-08-07 08:13:05,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [154.0, 120.0, 141.0, 139.0, 163.0, 149.0, 168.0, 156.0, 143.0, 190.0]
2025-08-07 08:13:05,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 59 minutes, 23 seconds)
2025-08-07 08:15:05,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:15:08,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 749.35175 ± 128.132
2025-08-07 08:15:08,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [646.43414, 612.777, 838.36835, 807.77496, 603.2174, 910.9765, 800.1569, 792.3095, 551.61536, 929.8875]
2025-08-07 08:15:08,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 111.0, 161.0, 156.0, 114.0, 176.0, 153.0, 168.0, 113.0, 169.0]
2025-08-07 08:15:08,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 57 minutes, 36 seconds)
2025-08-07 08:17:05,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:17:08,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 757.16052 ± 135.397
2025-08-07 08:17:08,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [762.18866, 965.5742, 594.70685, 704.27686, 835.68506, 954.8675, 744.277, 631.5219, 547.39636, 831.1115]
2025-08-07 08:17:08,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [140.0, 180.0, 121.0, 141.0, 162.0, 202.0, 150.0, 128.0, 110.0, 156.0]
2025-08-07 08:17:08,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 55 minutes, 28 seconds)
2025-08-07 08:19:07,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:19:09,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 755.63513 ± 170.195
2025-08-07 08:19:09,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [776.9672, 747.96967, 467.32626, 911.2787, 678.8203, 681.94934, 1025.2727, 1005.308, 676.7112, 584.74774]
2025-08-07 08:19:09,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [146.0, 158.0, 102.0, 168.0, 139.0, 136.0, 198.0, 185.0, 125.0, 121.0]
2025-08-07 08:19:09,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 53 minutes, 29 seconds)
2025-08-07 08:21:08,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:21:11,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 801.16736 ± 166.460
2025-08-07 08:21:11,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [791.78766, 1050.2653, 611.157, 674.9852, 840.21674, 650.08856, 1133.5162, 862.53156, 739.185, 657.94006]
2025-08-07 08:21:11,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [158.0, 197.0, 115.0, 127.0, 153.0, 122.0, 197.0, 171.0, 135.0, 118.0]
2025-08-07 08:21:11,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 51 minutes, 5 seconds)
2025-08-07 08:23:10,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:23:12,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 792.34070 ± 133.821
2025-08-07 08:23:12,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [916.13617, 941.2941, 777.7525, 645.28046, 722.80054, 771.91705, 648.65753, 1069.2375, 766.3914, 663.93945]
2025-08-07 08:23:12,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [164.0, 179.0, 144.0, 124.0, 156.0, 153.0, 136.0, 213.0, 146.0, 122.0]
2025-08-07 08:23:12,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 49 minutes, 20 seconds)
2025-08-07 08:25:12,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:25:15,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 814.37079 ± 173.438
2025-08-07 08:25:15,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [903.66473, 1180.4877, 712.65247, 676.7878, 915.69794, 869.0476, 966.959, 635.5901, 665.7505, 617.0701]
2025-08-07 08:25:15,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [171.0, 218.0, 137.0, 140.0, 165.0, 171.0, 173.0, 119.0, 135.0, 129.0]
2025-08-07 08:25:15,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 47 minutes, 16 seconds)
2025-08-07 08:27:14,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:27:17,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 806.32788 ± 136.179
2025-08-07 08:27:17,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [639.1852, 747.7403, 1124.1089, 957.155, 845.7847, 653.5054, 795.8805, 752.9008, 760.95026, 786.06805]
2025-08-07 08:27:17,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [131.0, 133.0, 237.0, 182.0, 153.0, 142.0, 162.0, 138.0, 146.0, 155.0]
2025-08-07 08:27:17,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 45 minutes, 33 seconds)
2025-08-07 08:29:15,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:29:18,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 885.70251 ± 212.905
2025-08-07 08:29:18,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [510.6913, 615.4399, 1134.8828, 1214.3885, 896.7177, 973.22906, 697.0283, 907.65674, 862.859, 1044.1313]
2025-08-07 08:29:18,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 128.0, 193.0, 239.0, 184.0, 179.0, 138.0, 186.0, 162.0, 187.0]
2025-08-07 08:29:18,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (885.70) for latency ExtremeClogL1U23
2025-08-07 08:29:18,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 43 minutes, 23 seconds)
2025-08-07 08:31:20,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:31:22,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 808.96509 ± 162.912
2025-08-07 08:31:22,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [837.74554, 727.52734, 698.21466, 700.1948, 1221.6694, 641.0647, 713.0864, 925.3885, 891.6317, 733.1275]
2025-08-07 08:31:22,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [155.0, 130.0, 126.0, 149.0, 239.0, 135.0, 134.0, 187.0, 171.0, 154.0]
2025-08-07 08:31:22,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 41 minutes, 53 seconds)
2025-08-07 08:33:20,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:33:23,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 849.65466 ± 127.871
2025-08-07 08:33:23,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [932.96375, 677.4669, 1029.567, 670.2101, 994.9128, 759.4735, 803.48456, 750.36993, 989.7831, 888.31525]
2025-08-07 08:33:23,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [186.0, 129.0, 206.0, 140.0, 190.0, 156.0, 149.0, 152.0, 189.0, 160.0]
2025-08-07 08:33:23,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 39 minutes, 44 seconds)
2025-08-07 08:35:22,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:35:25,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 883.12177 ± 209.340
2025-08-07 08:35:25,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [982.2873, 693.81995, 924.72156, 650.35626, 619.4868, 1164.7242, 1259.1648, 1013.17896, 767.24084, 756.2371]
2025-08-07 08:35:25,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [176.0, 150.0, 192.0, 127.0, 113.0, 234.0, 239.0, 201.0, 161.0, 164.0]
2025-08-07 08:35:25,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 37 minutes, 42 seconds)
2025-08-07 08:37:23,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:37:26,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 939.15442 ± 197.420
2025-08-07 08:37:26,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1033.6168, 750.29315, 584.51776, 988.15234, 856.8558, 1340.0388, 777.3408, 1068.6677, 1002.2851, 989.77545]
2025-08-07 08:37:26,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [216.0, 157.0, 118.0, 178.0, 174.0, 255.0, 137.0, 213.0, 180.0, 190.0]
2025-08-07 08:37:26,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (939.15) for latency ExtremeClogL1U23
2025-08-07 08:37:27,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 35 minutes, 32 seconds)
2025-08-07 08:39:26,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:39:28,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 768.61047 ± 137.592
2025-08-07 08:39:28,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [676.94025, 809.5099, 731.08307, 646.1317, 730.3978, 967.4146, 808.14844, 959.98114, 494.77002, 861.7277]
2025-08-07 08:39:28,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [119.0, 168.0, 152.0, 121.0, 158.0, 183.0, 158.0, 180.0, 93.0, 162.0]
2025-08-07 08:39:28,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 33 minutes, 39 seconds)
2025-08-07 08:41:27,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:41:31,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 976.40540 ± 240.872
2025-08-07 08:41:31,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [753.94324, 1266.8485, 1054.8765, 768.8512, 1242.623, 859.3573, 1419.3171, 818.8556, 894.1476, 685.2334]
2025-08-07 08:41:31,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [140.0, 247.0, 208.0, 140.0, 224.0, 168.0, 260.0, 144.0, 181.0, 142.0]
2025-08-07 08:41:31,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (976.41) for latency ExtremeClogL1U23
2025-08-07 08:41:31,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 31 minutes, 15 seconds)
2025-08-07 08:43:29,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:43:32,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 827.86603 ± 154.885
2025-08-07 08:43:32,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [584.7309, 1179.5426, 776.87695, 811.6433, 818.5123, 963.52075, 841.9354, 893.7311, 687.2211, 720.9462]
2025-08-07 08:43:32,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [108.0, 236.0, 157.0, 166.0, 144.0, 180.0, 149.0, 192.0, 145.0, 137.0]
2025-08-07 08:43:32,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 29 minutes, 17 seconds)
2025-08-07 08:45:31,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:45:34,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 955.19788 ± 207.078
2025-08-07 08:45:34,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [663.85504, 1131.6941, 624.7277, 1187.7836, 908.4621, 901.7735, 1092.7034, 740.9987, 1193.3523, 1106.6282]
2025-08-07 08:45:34,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [127.0, 212.0, 125.0, 225.0, 168.0, 172.0, 200.0, 154.0, 209.0, 201.0]
2025-08-07 08:45:34,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 27 minutes, 13 seconds)
2025-08-07 08:47:34,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:47:37,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 913.25488 ± 137.862
2025-08-07 08:47:37,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [868.12915, 766.031, 951.60724, 650.55554, 1065.1378, 888.37067, 853.0666, 1042.7938, 1140.6884, 906.16925]
2025-08-07 08:47:37,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [185.0, 162.0, 171.0, 137.0, 207.0, 160.0, 152.0, 213.0, 222.0, 188.0]
2025-08-07 08:47:37,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 25 minutes, 27 seconds)
2025-08-07 08:49:37,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:49:40,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 955.84485 ± 166.547
2025-08-07 08:49:40,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [910.2814, 1186.9598, 960.0234, 1011.2009, 625.5154, 806.071, 991.58154, 1070.8998, 1183.9625, 811.9523]
2025-08-07 08:49:40,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [180.0, 220.0, 195.0, 203.0, 130.0, 137.0, 195.0, 208.0, 234.0, 147.0]
2025-08-07 08:49:40,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 23 minutes, 34 seconds)
2025-08-07 08:51:38,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:51:41,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1049.74731 ± 252.787
2025-08-07 08:51:41,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1018.70123, 820.26, 1410.6006, 810.9999, 854.62305, 776.5137, 1448.5736, 1390.5059, 1010.1819, 956.51373]
2025-08-07 08:51:41,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [194.0, 172.0, 281.0, 158.0, 184.0, 142.0, 283.0, 282.0, 183.0, 181.0]
2025-08-07 08:51:41,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (1049.75) for latency ExtremeClogL1U23
2025-08-07 08:51:41,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 21 minutes, 26 seconds)
2025-08-07 08:53:40,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:53:43,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 937.08105 ± 201.039
2025-08-07 08:53:43,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1098.2799, 665.70105, 972.8867, 995.52765, 715.8984, 706.47595, 1363.4857, 871.1622, 936.7424, 1044.6503]
2025-08-07 08:53:43,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [204.0, 125.0, 183.0, 190.0, 145.0, 136.0, 265.0, 167.0, 195.0, 199.0]
2025-08-07 08:53:43,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 19 minutes, 27 seconds)
2025-08-07 08:55:44,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:55:47,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 904.06702 ± 148.620
2025-08-07 08:55:47,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [863.9458, 669.9579, 728.29803, 1054.4089, 1074.4412, 972.3549, 920.5023, 962.15234, 1089.0568, 705.5521]
2025-08-07 08:55:47,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [166.0, 137.0, 139.0, 197.0, 202.0, 197.0, 166.0, 184.0, 222.0, 135.0]
2025-08-07 08:55:47,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 17 minutes, 41 seconds)
2025-08-07 08:57:46,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:57:49,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 978.58234 ± 257.431
2025-08-07 08:57:49,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1169.8663, 1258.3958, 1006.6856, 815.215, 995.55566, 495.08536, 832.3763, 740.74786, 1430.717, 1041.1775]
2025-08-07 08:57:49,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [229.0, 223.0, 185.0, 151.0, 180.0, 97.0, 161.0, 143.0, 262.0, 186.0]
2025-08-07 08:57:49,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 15 minutes, 28 seconds)
2025-08-07 08:59:48,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:59:51,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 878.45197 ± 251.058
2025-08-07 08:59:51,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [861.5492, 1418.6647, 686.65607, 683.74304, 1186.7854, 700.13074, 691.78156, 889.00446, 612.2542, 1053.95]
2025-08-07 08:59:51,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [172.0, 276.0, 133.0, 135.0, 214.0, 143.0, 139.0, 152.0, 126.0, 191.0]
2025-08-07 08:59:51,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 13 minutes, 21 seconds)
2025-08-07 09:01:50,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:01:54,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1097.19067 ± 261.031
2025-08-07 09:01:54,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1679.8429, 841.2693, 896.3609, 928.7193, 1008.5485, 1079.8273, 872.6227, 1227.0543, 992.30054, 1445.3601]
2025-08-07 09:01:54,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [303.0, 163.0, 161.0, 164.0, 199.0, 207.0, 151.0, 231.0, 187.0, 268.0]
2025-08-07 09:01:54,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (1097.19) for latency ExtremeClogL1U23
2025-08-07 09:01:54,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 11 minutes, 27 seconds)
2025-08-07 09:03:53,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:03:56,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 893.89435 ± 210.629
2025-08-07 09:03:56,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [620.9974, 758.8693, 1330.1315, 727.61035, 1127.909, 924.664, 1059.2476, 817.91003, 887.77246, 683.83167]
2025-08-07 09:03:56,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [136.0, 162.0, 238.0, 161.0, 230.0, 187.0, 195.0, 173.0, 160.0, 142.0]
2025-08-07 09:03:56,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 9 minutes, 30 seconds)
2025-08-07 09:05:57,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:06:00,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1150.64172 ± 473.135
2025-08-07 09:06:00,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1022.9934, 785.9244, 697.39996, 2353.4973, 1260.571, 1110.952, 724.6625, 1295.7025, 1458.8069, 795.9076]
2025-08-07 09:06:00,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [211.0, 146.0, 128.0, 447.0, 220.0, 200.0, 148.0, 251.0, 288.0, 156.0]
2025-08-07 09:06:00,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (1150.64) for latency ExtremeClogL1U23
2025-08-07 09:06:00,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 7 minutes, 25 seconds)
2025-08-07 09:07:58,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:08:02,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1097.88770 ± 229.787
2025-08-07 09:08:02,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1530.5376, 1081.199, 1314.9587, 1213.0503, 995.4015, 725.5685, 1028.886, 1268.2916, 1026.714, 794.26996]
2025-08-07 09:08:02,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [286.0, 196.0, 268.0, 229.0, 189.0, 152.0, 202.0, 251.0, 198.0, 166.0]
2025-08-07 09:08:02,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 5 minutes, 24 seconds)
2025-08-07 09:10:01,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:10:05,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1086.45435 ± 321.457
2025-08-07 09:10:05,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [729.69305, 1385.4954, 1339.9808, 750.0309, 1337.1991, 748.7759, 1651.7961, 1044.6079, 1158.8619, 718.1019]
2025-08-07 09:10:05,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [140.0, 240.0, 255.0, 153.0, 231.0, 159.0, 308.0, 202.0, 223.0, 141.0]
2025-08-07 09:10:05,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 3 minutes, 23 seconds)
2025-08-07 09:12:04,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:12:09,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1294.46509 ± 439.305
2025-08-07 09:12:09,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [778.4024, 1008.4834, 1820.1346, 1083.7018, 924.92706, 1288.5734, 1177.8175, 2274.79, 1020.23676, 1567.5853]
2025-08-07 09:12:09,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [150.0, 183.0, 347.0, 196.0, 189.0, 234.0, 211.0, 429.0, 172.0, 291.0]
2025-08-07 09:12:09,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (1294.47) for latency ExtremeClogL1U23
2025-08-07 09:12:09,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 1 minute, 27 seconds)
2025-08-07 09:14:07,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:14:10,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1045.80884 ± 385.077
2025-08-07 09:14:10,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1041.858, 1130.5033, 918.8706, 743.26184, 1071.0508, 2145.4038, 841.0941, 909.64716, 867.585, 788.8131]
2025-08-07 09:14:10,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [190.0, 206.0, 171.0, 141.0, 191.0, 395.0, 173.0, 175.0, 157.0, 141.0]
2025-08-07 09:14:10,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 59 minutes, 21 seconds)
2025-08-07 09:16:10,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:16:14,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1115.91980 ± 243.947
2025-08-07 09:16:14,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1070.2859, 876.23865, 1048.3151, 983.5088, 1114.8959, 1276.8993, 845.84845, 920.51666, 1332.8613, 1689.8278]
2025-08-07 09:16:14,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [198.0, 164.0, 192.0, 178.0, 206.0, 246.0, 151.0, 166.0, 243.0, 301.0]
2025-08-07 09:16:14,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 57 minutes, 14 seconds)
2025-08-07 09:18:13,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:18:16,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 945.31689 ± 310.168
2025-08-07 09:18:16,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1308.7915, 1497.8724, 980.7483, 619.4366, 1106.285, 722.19214, 656.55725, 697.63556, 1245.1714, 618.4799]
2025-08-07 09:18:16,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [260.0, 287.0, 202.0, 129.0, 189.0, 153.0, 133.0, 151.0, 264.0, 133.0]
2025-08-07 09:18:16,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 55 minutes, 15 seconds)
2025-08-07 09:20:15,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:20:19,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1184.42627 ± 334.121
2025-08-07 09:20:19,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1512.6381, 610.69635, 678.9855, 1290.9082, 1059.5851, 1226.9716, 1491.1324, 1216.9412, 1042.7533, 1713.6514]
2025-08-07 09:20:19,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [273.0, 123.0, 143.0, 261.0, 199.0, 221.0, 267.0, 225.0, 218.0, 330.0]
2025-08-07 09:20:19,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 53 minutes, 15 seconds)
2025-08-07 09:22:18,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:22:21,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 990.19592 ± 214.947
2025-08-07 09:22:21,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [921.39465, 1130.3854, 896.06464, 698.3959, 1232.7065, 643.58356, 905.68854, 938.26514, 1248.0231, 1287.4513]
2025-08-07 09:22:21,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [170.0, 236.0, 196.0, 149.0, 240.0, 129.0, 176.0, 180.0, 226.0, 227.0]
2025-08-07 09:22:21,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 51 minutes, 1 second)
2025-08-07 09:24:21,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:24:24,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1085.10535 ± 197.798
2025-08-07 09:24:24,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1294.4967, 1154.4269, 1042.5835, 960.2263, 819.0923, 1018.04816, 1485.9807, 819.54285, 1041.8267, 1214.8295]
2025-08-07 09:24:24,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [251.0, 217.0, 180.0, 203.0, 172.0, 184.0, 281.0, 151.0, 185.0, 245.0]
2025-08-07 09:24:24,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 49 minutes, 5 seconds)
2025-08-07 09:26:21,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:26:25,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1047.03577 ± 210.601
2025-08-07 09:26:25,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1189.3458, 1281.6965, 1305.5547, 980.71857, 1184.9769, 1161.8197, 911.5819, 583.0277, 879.3055, 992.3303]
2025-08-07 09:26:25,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [206.0, 224.0, 250.0, 170.0, 227.0, 245.0, 186.0, 128.0, 180.0, 195.0]
2025-08-07 09:26:25,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 46 minutes, 50 seconds)
2025-08-07 09:28:25,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:28:29,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1121.81689 ± 257.854
2025-08-07 09:28:29,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1621.7917, 946.53485, 1370.022, 1460.5005, 1127.9097, 846.22406, 1041.4835, 923.42267, 839.98834, 1040.2916]
2025-08-07 09:28:29,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [312.0, 184.0, 240.0, 274.0, 212.0, 166.0, 197.0, 170.0, 161.0, 218.0]
2025-08-07 09:28:29,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 44 minutes, 55 seconds)
2025-08-07 09:30:29,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:30:33,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1377.72925 ± 671.721
2025-08-07 09:30:33,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1620.6393, 977.50336, 1257.7445, 992.0788, 1755.8033, 1055.65, 811.67303, 929.16223, 3200.1309, 1176.9071]
2025-08-07 09:30:33,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [301.0, 203.0, 263.0, 184.0, 352.0, 190.0, 146.0, 183.0, 565.0, 214.0]
2025-08-07 09:30:33,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (1377.73) for latency ExtremeClogL1U23
2025-08-07 09:30:33,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 42 minutes, 58 seconds)
2025-08-07 09:32:34,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:32:39,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1508.64062 ± 320.270
2025-08-07 09:32:39,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1934.7262, 1364.7806, 1438.2736, 1655.0057, 796.8552, 1956.3384, 1642.9003, 1622.8835, 1325.9238, 1348.719]
2025-08-07 09:32:39,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [332.0, 279.0, 297.0, 312.0, 144.0, 360.0, 295.0, 281.0, 233.0, 258.0]
2025-08-07 09:32:39,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (1508.64) for latency ExtremeClogL1U23
2025-08-07 09:32:39,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 41 minutes, 10 seconds)
2025-08-07 09:34:38,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:34:41,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1238.41321 ± 340.516
2025-08-07 09:34:42,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1036.6583, 1315.4597, 823.8544, 1035.8584, 1219.4084, 1223.1952, 790.15356, 1620.2457, 1974.1163, 1345.1819]
2025-08-07 09:34:42,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [197.0, 234.0, 148.0, 192.0, 222.0, 224.0, 154.0, 274.0, 382.0, 246.0]
2025-08-07 09:34:42,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 39 minutes, 6 seconds)
2025-08-07 09:36:43,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:36:46,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1119.39465 ± 204.218
2025-08-07 09:36:46,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [992.1381, 1321.9614, 1187.8881, 788.18365, 935.6238, 1108.344, 985.99994, 1423.5803, 1423.6821, 1026.5461]
2025-08-07 09:36:46,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [188.0, 249.0, 242.0, 169.0, 172.0, 206.0, 193.0, 256.0, 245.0, 192.0]
2025-08-07 09:36:47,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 37 minutes, 18 seconds)
2025-08-07 09:38:46,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:38:50,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1069.75562 ± 425.486
2025-08-07 09:38:50,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1117.1426, 1137.9957, 2256.046, 805.4975, 1207.9662, 882.79285, 693.95166, 804.2712, 905.38306, 886.50995]
2025-08-07 09:38:50,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [211.0, 229.0, 468.0, 174.0, 219.0, 154.0, 140.0, 140.0, 163.0, 159.0]
2025-08-07 09:38:50,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 35 minutes, 11 seconds)
2025-08-07 09:40:50,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:40:54,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1197.94507 ± 443.216
2025-08-07 09:40:54,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [571.01776, 1415.1401, 748.7926, 1449.0139, 1377.2665, 1046.7101, 1169.6743, 776.1603, 1215.1135, 2210.563]
2025-08-07 09:40:54,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 267.0, 148.0, 265.0, 279.0, 220.0, 218.0, 158.0, 231.0, 433.0]
2025-08-07 09:40:54,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 33 minutes, 7 seconds)
2025-08-07 09:42:52,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:42:56,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1309.61377 ± 534.181
2025-08-07 09:42:56,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1347.855, 1526.4769, 948.89075, 808.172, 2415.4785, 1673.0056, 1151.8821, 460.44824, 986.9889, 1776.9392]
2025-08-07 09:42:56,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [267.0, 274.0, 190.0, 182.0, 425.0, 318.0, 200.0, 88.0, 191.0, 334.0]
2025-08-07 09:42:56,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 30 minutes, 53 seconds)
2025-08-07 09:44:56,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:45:01,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1304.01685 ± 184.652
2025-08-07 09:45:01,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1143.88, 1402.3451, 1373.2606, 1445.5665, 1613.4204, 1009.1175, 1449.2716, 1067.3226, 1363.9244, 1172.0607]
2025-08-07 09:45:01,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [204.0, 270.0, 262.0, 255.0, 301.0, 211.0, 274.0, 216.0, 234.0, 242.0]
2025-08-07 09:45:01,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 28 minutes, 53 seconds)
2025-08-07 09:47:02,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:47:07,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1324.46948 ± 414.308
2025-08-07 09:47:07,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1034.8053, 1341.7456, 1023.53534, 635.761, 1066.3367, 1292.1876, 1563.3857, 1992.7983, 2026.452, 1267.6869]
2025-08-07 09:47:07,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [189.0, 251.0, 180.0, 134.0, 207.0, 230.0, 300.0, 388.0, 425.0, 243.0]
2025-08-07 09:47:07,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 26 minutes, 52 seconds)
2025-08-07 09:49:07,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:49:13,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1890.62146 ± 562.439
2025-08-07 09:49:13,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [2159.5452, 1473.2863, 2665.1138, 1261.2273, 2532.172, 1684.4543, 2737.0227, 1159.0884, 1798.412, 1435.8921]
2025-08-07 09:49:13,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [405.0, 303.0, 493.0, 240.0, 495.0, 307.0, 502.0, 196.0, 337.0, 249.0]
2025-08-07 09:49:13,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (1890.62) for latency ExtremeClogL1U23
2025-08-07 09:49:13,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 24 minutes, 55 seconds)
2025-08-07 09:51:11,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:51:16,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1356.60864 ± 388.336
2025-08-07 09:51:16,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1228.0325, 1394.684, 1572.9594, 1074.2157, 1316.9547, 1316.2484, 2415.9116, 1172.8978, 1048.4073, 1025.7753]
2025-08-07 09:51:16,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [227.0, 280.0, 303.0, 208.0, 289.0, 264.0, 448.0, 245.0, 204.0, 206.0]
2025-08-07 09:51:16,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 22 minutes, 47 seconds)
2025-08-07 09:53:17,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:53:22,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1586.34412 ± 584.090
2025-08-07 09:53:22,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [2153.7266, 822.26654, 1001.3049, 1089.5884, 1703.8995, 1514.4835, 1364.4565, 1199.1638, 2542.5283, 2472.022]
2025-08-07 09:53:22,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [419.0, 153.0, 202.0, 231.0, 340.0, 290.0, 253.0, 244.0, 498.0, 442.0]
2025-08-07 09:53:22,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 20 minutes, 51 seconds)
2025-08-07 09:55:22,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:55:26,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1251.50830 ± 257.242
2025-08-07 09:55:26,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1889.755, 984.0701, 903.6941, 1188.8787, 1337.8959, 1279.415, 1192.1937, 1074.9684, 1364.0166, 1300.1957]
2025-08-07 09:55:26,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [347.0, 188.0, 173.0, 221.0, 246.0, 232.0, 232.0, 200.0, 271.0, 246.0]
2025-08-07 09:55:26,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 18 minutes, 46 seconds)
2025-08-07 09:57:28,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:57:32,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1368.14929 ± 449.545
2025-08-07 09:57:32,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [991.3802, 1158.6703, 1363.5704, 2615.2188, 1045.7825, 1071.3881, 1118.2823, 1374.1042, 1516.4438, 1426.6533]
2025-08-07 09:57:32,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [197.0, 243.0, 268.0, 503.0, 204.0, 201.0, 215.0, 246.0, 294.0, 256.0]
2025-08-07 09:57:32,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 16 minutes, 40 seconds)
2025-08-07 09:59:33,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:59:38,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1498.33569 ± 403.652
2025-08-07 09:59:38,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1353.184, 1200.5746, 1287.6874, 1851.3278, 922.8721, 1596.1931, 1871.3512, 1449.2681, 1106.9028, 2343.9949]
2025-08-07 09:59:38,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [260.0, 221.0, 262.0, 336.0, 177.0, 308.0, 364.0, 279.0, 223.0, 430.0]
2025-08-07 09:59:38,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 14 minutes, 35 seconds)
2025-08-07 10:01:37,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:01:41,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1291.00122 ± 315.031
2025-08-07 10:01:41,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1626.998, 860.6437, 1516.1244, 881.4494, 884.96576, 1431.082, 1749.8816, 1274.4343, 1536.3164, 1148.1165]
2025-08-07 10:01:41,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [302.0, 188.0, 320.0, 168.0, 204.0, 295.0, 317.0, 248.0, 292.0, 250.0]
2025-08-07 10:01:41,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 12 minutes, 30 seconds)
2025-08-07 10:03:41,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:03:45,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1288.72363 ± 269.005
2025-08-07 10:03:45,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1259.6338, 1213.7507, 1119.558, 1180.6044, 1296.5023, 1127.7899, 1841.2262, 996.2132, 1092.5114, 1759.4468]
2025-08-07 10:03:45,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [225.0, 230.0, 207.0, 204.0, 242.0, 212.0, 371.0, 194.0, 202.0, 321.0]
2025-08-07 10:03:45,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 10 minutes, 22 seconds)
2025-08-07 10:05:47,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:05:53,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1759.79358 ± 667.813
2025-08-07 10:05:53,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [3494.3323, 1557.9402, 1685.8177, 1256.9253, 2009.7738, 1497.3882, 930.33014, 1299.6696, 2107.5732, 1758.1841]
2025-08-07 10:05:53,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [612.0, 304.0, 302.0, 244.0, 415.0, 281.0, 189.0, 274.0, 370.0, 342.0]
2025-08-07 10:05:53,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 21 seconds)
2025-08-07 10:07:51,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:07:58,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2052.81519 ± 1231.467
2025-08-07 10:07:58,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1601.9661, 1296.2609, 4679.1367, 1428.6831, 1329.1613, 3864.0342, 924.30286, 2787.4539, 1742.8297, 874.3219]
2025-08-07 10:07:58,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [318.0, 240.0, 909.0, 282.0, 259.0, 698.0, 202.0, 508.0, 348.0, 177.0]
2025-08-07 10:07:58,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (2052.82) for latency ExtremeClogL1U23
2025-08-07 10:07:58,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 15 seconds)
2025-08-07 10:09:58,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:10:03,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1593.68140 ± 531.312
2025-08-07 10:10:03,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1004.7337, 1837.1279, 1263.419, 2315.012, 1215.1094, 2682.5906, 1114.1104, 1811.2711, 1518.5468, 1174.8936]
2025-08-07 10:10:03,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [176.0, 364.0, 223.0, 441.0, 238.0, 519.0, 218.0, 301.0, 291.0, 213.0]
2025-08-07 10:10:03,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 9 seconds)
2025-08-07 10:12:03,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:12:11,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2462.63672 ± 1341.264
2025-08-07 10:12:11,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1395.1882, 2673.86, 4435.3125, 1573.6703, 2225.5134, 1548.1252, 2386.9797, 5418.487, 948.08514, 2021.1437]
2025-08-07 10:12:11,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [266.0, 495.0, 794.0, 305.0, 396.0, 273.0, 429.0, 1000.0, 186.0, 398.0]
2025-08-07 10:12:11,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (2462.64) for latency ExtremeClogL1U23
2025-08-07 10:12:11,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 5 seconds)
2025-08-07 10:14:10,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:14:16,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1727.44507 ± 414.010
2025-08-07 10:14:16,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [2024.4224, 1380.6812, 1372.4904, 957.55133, 2483.2947, 1982.9915, 1726.5225, 1497.1029, 2023.141, 1826.2535]
2025-08-07 10:14:16,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [346.0, 251.0, 276.0, 197.0, 491.0, 401.0, 301.0, 304.0, 357.0, 341.0]
2025-08-07 10:14:16,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1251 [DEBUG]: Training session finished
