2025-09-16 10:46:25,298 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.000-delay_3
2025-09-16 10:46:25,298 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.000-delay_3
2025-09-16 10:46:25,298 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'3': <latency_env.delayed_mdp.ConstantDelay object at 0x147088b24690>}
2025-09-16 10:46:25,298 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 10:46:25,302 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 10:46:25,321 baseline-bpql-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=427, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 10:46:25,321 baseline-bpql-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 10:46:26,934 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 10:46:26,935 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 10:48:06,646 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 10:48:07,384 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 304.26474 ± 51.623
2025-09-16 10:48:07,384 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [232.0611, 297.82642, 281.61203, 383.8036, 409.2903, 270.7882, 286.55942, 269.47137, 284.6924, 326.54257]
2025-09-16 10:48:07,384 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [50.0, 62.0, 60.0, 78.0, 84.0, 59.0, 63.0, 57.0, 62.0, 71.0]
2025-09-16 10:48:07,384 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (304.26) for latency 3
2025-09-16 10:48:07,390 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 45 minutes, 45 seconds)
2025-09-16 10:49:56,163 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 10:49:56,811 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 277.64911 ± 35.429
2025-09-16 10:49:56,811 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [256.4294, 265.37762, 240.53056, 272.25235, 276.14722, 229.63364, 305.39713, 282.77225, 363.64835, 284.30286]
2025-09-16 10:49:56,811 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [52.0, 55.0, 48.0, 56.0, 58.0, 46.0, 63.0, 58.0, 77.0, 60.0]
2025-09-16 10:49:56,816 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 51 minutes, 24 seconds)
2025-09-16 10:51:44,961 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 10:51:45,953 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 427.91315 ± 96.026
2025-09-16 10:51:45,953 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [439.96332, 396.41513, 680.14404, 349.25494, 454.08624, 328.50754, 457.1799, 336.01413, 393.9844, 443.58163]
2025-09-16 10:51:45,953 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 84.0, 137.0, 74.0, 84.0, 66.0, 101.0, 65.0, 75.0, 83.0]
2025-09-16 10:51:45,953 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (427.91) for latency 3
2025-09-16 10:51:45,958 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 51 minutes, 55 seconds)
2025-09-16 10:53:34,865 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 10:53:35,855 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 423.75812 ± 98.945
2025-09-16 10:53:35,856 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [388.1597, 563.95776, 367.38324, 365.45718, 570.84894, 376.02982, 568.34644, 396.3178, 278.58667, 362.49332]
2025-09-16 10:53:35,856 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 107.0, 74.0, 71.0, 114.0, 73.0, 117.0, 83.0, 54.0, 74.0]
2025-09-16 10:53:35,861 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 51 minutes, 34 seconds)
2025-09-16 10:55:24,444 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 10:55:25,369 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 399.29236 ± 90.985
2025-09-16 10:55:25,369 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [501.50537, 373.8296, 393.19806, 316.67834, 321.5366, 313.6589, 383.60275, 440.6589, 335.76752, 612.4876]
2025-09-16 10:55:25,369 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [96.0, 75.0, 84.0, 64.0, 66.0, 68.0, 81.0, 90.0, 67.0, 124.0]
2025-09-16 10:55:25,379 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 50 minutes, 30 seconds)
2025-09-16 10:57:13,992 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 10:57:14,809 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 371.62881 ± 42.094
2025-09-16 10:57:14,809 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [328.87613, 326.95544, 433.34326, 435.75156, 403.4322, 338.3657, 320.69186, 403.94388, 357.4221, 367.50616]
2025-09-16 10:57:14,809 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [63.0, 61.0, 81.0, 82.0, 74.0, 65.0, 60.0, 76.0, 68.0, 68.0]
2025-09-16 10:57:14,814 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 51 minutes, 31 seconds)
2025-09-16 10:59:03,293 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 10:59:04,507 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 526.00354 ± 124.032
2025-09-16 10:59:04,507 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [657.662, 531.52484, 387.9102, 700.8788, 386.4245, 659.8415, 417.00403, 411.72595, 660.70496, 446.35858]
2025-09-16 10:59:04,507 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 103.0, 73.0, 134.0, 85.0, 127.0, 80.0, 78.0, 126.0, 83.0]
2025-09-16 10:59:04,507 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (526.00) for latency 3
2025-09-16 10:59:04,511 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 49 minutes, 47 seconds)
2025-09-16 11:00:53,448 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:00:54,522 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 478.22369 ± 65.070
2025-09-16 11:00:54,522 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [412.2286, 544.2612, 553.5547, 468.1612, 489.64566, 483.8112, 347.44537, 461.91266, 446.4994, 574.717]
2025-09-16 11:00:54,522 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 108.0, 106.0, 87.0, 100.0, 92.0, 66.0, 90.0, 84.0, 109.0]
2025-09-16 11:00:54,528 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 48 minutes, 13 seconds)
2025-09-16 11:02:43,774 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:02:44,940 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 463.27350 ± 92.955
2025-09-16 11:02:44,940 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [408.23093, 384.8731, 373.1462, 444.2073, 395.0563, 678.3132, 591.8975, 442.58954, 477.23795, 437.18295]
2025-09-16 11:02:44,940 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 87.0, 82.0, 98.0, 86.0, 147.0, 116.0, 99.0, 91.0, 85.0]
2025-09-16 11:02:44,943 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 46 minutes, 33 seconds)
2025-09-16 11:04:34,616 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:04:35,878 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 528.31403 ± 78.508
2025-09-16 11:04:35,879 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [546.3352, 491.12378, 640.876, 580.63416, 454.7008, 593.41626, 600.0089, 390.46823, 556.8033, 428.77325]
2025-09-16 11:04:35,879 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [105.0, 93.0, 141.0, 112.0, 87.0, 113.0, 115.0, 85.0, 104.0, 83.0]
2025-09-16 11:04:35,879 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (528.31) for latency 3
2025-09-16 11:04:35,888 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 45 minutes, 9 seconds)
2025-09-16 11:06:24,294 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:06:25,551 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 523.62726 ± 147.178
2025-09-16 11:06:25,552 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [451.6383, 504.69125, 332.70447, 493.0262, 469.54575, 695.45856, 409.6354, 500.49005, 498.95264, 880.13007]
2025-09-16 11:06:25,552 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 110.0, 71.0, 92.0, 102.0, 134.0, 89.0, 96.0, 94.0, 188.0]
2025-09-16 11:06:25,557 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 43 minutes, 23 seconds)
2025-09-16 11:08:14,787 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:08:15,908 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 476.04608 ± 75.829
2025-09-16 11:08:15,908 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [523.1174, 464.21036, 368.49893, 579.25574, 481.81723, 500.1091, 573.7559, 490.27115, 327.83972, 451.5856]
2025-09-16 11:08:15,908 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [102.0, 92.0, 70.0, 110.0, 110.0, 95.0, 111.0, 95.0, 73.0, 88.0]
2025-09-16 11:08:15,913 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 41 minutes, 44 seconds)
2025-09-16 11:10:05,053 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:10:06,397 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 568.39526 ± 97.244
2025-09-16 11:10:06,397 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [475.62045, 520.9856, 612.08813, 523.61993, 614.32135, 545.8735, 688.4284, 754.9597, 545.2835, 402.7718]
2025-09-16 11:10:06,397 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 101.0, 117.0, 97.0, 123.0, 102.0, 132.0, 146.0, 106.0, 89.0]
2025-09-16 11:10:06,397 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (568.40) for latency 3
2025-09-16 11:10:06,402 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 40 minutes, 2 seconds)
2025-09-16 11:11:56,181 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:11:57,631 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 596.12317 ± 81.768
2025-09-16 11:11:57,631 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [504.9271, 599.6624, 613.5168, 566.69116, 564.1103, 800.7584, 617.6917, 556.6147, 494.727, 642.5319]
2025-09-16 11:11:57,631 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [101.0, 112.0, 120.0, 112.0, 109.0, 157.0, 136.0, 103.0, 93.0, 126.0]
2025-09-16 11:11:57,631 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (596.12) for latency 3
2025-09-16 11:11:57,637 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 38 minutes, 26 seconds)
2025-09-16 11:13:46,569 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:13:47,905 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 587.81250 ± 168.055
2025-09-16 11:13:47,905 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [571.4742, 555.87067, 596.4273, 469.05286, 397.1743, 515.9906, 1020.18665, 486.32053, 742.72217, 522.90594]
2025-09-16 11:13:47,905 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [120.0, 105.0, 116.0, 89.0, 75.0, 96.0, 198.0, 91.0, 141.0, 101.0]
2025-09-16 11:13:47,909 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 36 minutes, 24 seconds)
2025-09-16 11:15:36,922 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:15:38,435 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 634.45563 ± 206.483
2025-09-16 11:15:38,435 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1154.5691, 716.6298, 474.73294, 705.8189, 539.8815, 571.6739, 487.6803, 576.846, 372.4195, 744.3038]
2025-09-16 11:15:38,435 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [245.0, 137.0, 90.0, 134.0, 105.0, 115.0, 95.0, 115.0, 76.0, 144.0]
2025-09-16 11:15:38,435 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (634.46) for latency 3
2025-09-16 11:15:38,439 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 34 minutes, 48 seconds)
2025-09-16 11:17:28,428 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:17:30,159 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 711.13800 ± 243.382
2025-09-16 11:17:30,160 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [828.20557, 549.20624, 743.2371, 638.77875, 543.33, 671.80194, 575.3589, 535.14716, 1390.3901, 635.92395]
2025-09-16 11:17:30,160 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [162.0, 104.0, 142.0, 124.0, 120.0, 131.0, 113.0, 101.0, 295.0, 120.0]
2025-09-16 11:17:30,160 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (711.14) for latency 3
2025-09-16 11:17:30,166 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 33 minutes, 20 seconds)
2025-09-16 11:19:20,473 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:19:22,096 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 656.15710 ± 156.842
2025-09-16 11:19:22,097 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1027.3137, 533.01404, 590.806, 473.51474, 843.9109, 597.9698, 597.69727, 545.1688, 687.8148, 664.3616]
2025-09-16 11:19:22,097 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [223.0, 105.0, 115.0, 109.0, 166.0, 110.0, 129.0, 102.0, 148.0, 147.0]
2025-09-16 11:19:22,103 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 31 minutes, 53 seconds)
2025-09-16 11:21:10,326 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:21:12,167 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 729.30505 ± 135.796
2025-09-16 11:21:12,167 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [446.79184, 922.3148, 843.594, 669.18774, 612.67035, 736.72675, 862.87134, 844.6076, 662.5392, 691.7472]
2025-09-16 11:21:12,167 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 176.0, 168.0, 148.0, 136.0, 140.0, 187.0, 182.0, 141.0, 152.0]
2025-09-16 11:21:12,167 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (729.31) for latency 3
2025-09-16 11:21:12,177 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 29 minutes, 43 seconds)
2025-09-16 11:23:01,857 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:23:03,446 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 654.62689 ± 204.868
2025-09-16 11:23:03,446 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [500.18497, 984.0991, 475.07635, 613.2069, 536.80615, 979.98737, 474.07465, 573.1226, 916.1881, 493.52313]
2025-09-16 11:23:03,446 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 195.0, 88.0, 126.0, 109.0, 191.0, 96.0, 106.0, 185.0, 106.0]
2025-09-16 11:23:03,452 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 28 minutes, 8 seconds)
2025-09-16 11:24:53,679 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:24:55,275 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 673.25244 ± 126.690
2025-09-16 11:24:55,275 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [703.1976, 798.1846, 934.9451, 562.14966, 780.5283, 620.2382, 585.5104, 685.6682, 514.3523, 547.75024]
2025-09-16 11:24:55,275 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [138.0, 159.0, 185.0, 103.0, 153.0, 131.0, 111.0, 135.0, 96.0, 107.0]
2025-09-16 11:24:55,283 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 26 minutes, 38 seconds)
2025-09-16 11:26:44,917 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:26:46,766 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 758.04449 ± 252.130
2025-09-16 11:26:46,767 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [494.4454, 461.4833, 627.92944, 618.5142, 664.9304, 629.45703, 1269.014, 755.7104, 1093.5502, 965.41003]
2025-09-16 11:26:46,767 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 89.0, 116.0, 118.0, 138.0, 124.0, 246.0, 160.0, 226.0, 183.0]
2025-09-16 11:26:46,767 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (758.04) for latency 3
2025-09-16 11:26:46,773 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 24 minutes, 43 seconds)
2025-09-16 11:28:36,549 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:28:38,650 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 844.47382 ± 247.356
2025-09-16 11:28:38,650 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [705.39154, 739.5276, 685.5277, 918.00226, 557.6421, 745.76587, 1464.1779, 708.4967, 838.5611, 1081.6455]
2025-09-16 11:28:38,650 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [154.0, 166.0, 133.0, 175.0, 121.0, 149.0, 290.0, 138.0, 182.0, 215.0]
2025-09-16 11:28:38,650 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (844.47) for latency 3
2025-09-16 11:28:38,655 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 22 minutes, 50 seconds)
2025-09-16 11:30:27,178 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:30:28,764 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 701.97327 ± 100.792
2025-09-16 11:30:28,764 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [761.2892, 719.57983, 785.84357, 916.8781, 559.5549, 602.64307, 638.24634, 713.3655, 602.5424, 719.7897]
2025-09-16 11:30:28,764 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [150.0, 138.0, 153.0, 180.0, 109.0, 118.0, 123.0, 136.0, 119.0, 136.0]
2025-09-16 11:30:28,770 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 21 minutes)
2025-09-16 11:32:17,961 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:32:19,391 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 627.58093 ± 85.653
2025-09-16 11:32:19,392 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [562.4283, 689.389, 622.071, 615.5221, 687.36774, 642.3785, 692.303, 515.4461, 773.69446, 475.2097]
2025-09-16 11:32:19,392 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 132.0, 119.0, 117.0, 131.0, 123.0, 134.0, 101.0, 148.0, 94.0]
2025-09-16 11:32:19,399 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 18 minutes, 59 seconds)
2025-09-16 11:34:10,846 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:34:13,211 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 968.33966 ± 317.234
2025-09-16 11:34:13,211 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [902.5417, 574.83374, 1043.2361, 1719.45, 770.7636, 759.7275, 1155.8038, 925.6347, 639.6971, 1191.7081]
2025-09-16 11:34:13,211 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [171.0, 111.0, 203.0, 345.0, 150.0, 150.0, 236.0, 193.0, 118.0, 246.0]
2025-09-16 11:34:13,211 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (968.34) for latency 3
2025-09-16 11:34:13,217 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 17 minutes, 37 seconds)
2025-09-16 11:36:01,076 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:36:02,812 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 721.71130 ± 109.607
2025-09-16 11:36:02,812 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [600.9557, 677.1488, 932.2591, 756.1631, 819.83435, 753.61334, 517.8607, 742.496, 758.7062, 658.07574]
2025-09-16 11:36:02,812 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [114.0, 129.0, 187.0, 160.0, 170.0, 145.0, 99.0, 140.0, 149.0, 136.0]
2025-09-16 11:36:02,817 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 15 minutes, 18 seconds)
2025-09-16 11:37:52,733 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:37:54,261 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 676.88007 ± 132.080
2025-09-16 11:37:54,261 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [710.1934, 1053.9994, 658.65625, 667.3653, 560.5538, 593.41797, 627.9084, 602.3281, 662.7236, 631.6544]
2025-09-16 11:37:54,261 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [136.0, 209.0, 125.0, 129.0, 106.0, 111.0, 118.0, 113.0, 124.0, 124.0]
2025-09-16 11:37:54,270 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 13 minutes, 20 seconds)
2025-09-16 11:39:43,729 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:39:45,662 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 792.03448 ± 219.520
2025-09-16 11:39:45,662 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [834.65826, 1135.3934, 703.4686, 1143.0785, 581.19696, 963.54755, 834.39594, 485.67508, 587.6802, 651.2503]
2025-09-16 11:39:45,662 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [170.0, 237.0, 140.0, 222.0, 128.0, 190.0, 162.0, 110.0, 127.0, 142.0]
2025-09-16 11:39:45,667 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 11 minutes, 47 seconds)
2025-09-16 11:41:35,464 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:41:38,243 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1135.87854 ± 275.318
2025-09-16 11:41:38,243 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [778.5453, 1606.137, 1015.33185, 1393.7977, 1046.8213, 1164.4766, 902.7318, 1220.9333, 756.6046, 1473.4059]
2025-09-16 11:41:38,243 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [148.0, 317.0, 198.0, 295.0, 202.0, 226.0, 173.0, 236.0, 164.0, 297.0]
2025-09-16 11:41:38,243 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1135.88) for latency 3
2025-09-16 11:41:38,250 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 10 minutes, 23 seconds)
2025-09-16 11:43:29,925 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:43:31,987 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 861.11719 ± 162.865
2025-09-16 11:43:31,987 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [870.52905, 1022.0283, 621.7729, 888.3811, 929.8008, 770.536, 685.2042, 670.9892, 1009.2348, 1142.695]
2025-09-16 11:43:31,987 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [181.0, 207.0, 119.0, 170.0, 176.0, 150.0, 130.0, 128.0, 192.0, 230.0]
2025-09-16 11:43:31,992 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 8 minutes, 31 seconds)
2025-09-16 11:45:20,815 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:45:23,887 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1232.62317 ± 372.407
2025-09-16 11:45:23,887 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1354.0083, 897.70325, 1375.2178, 1012.89124, 805.60187, 1858.4023, 755.01904, 1855.3145, 1214.3524, 1197.7211]
2025-09-16 11:45:23,887 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [280.0, 179.0, 273.0, 208.0, 167.0, 369.0, 153.0, 368.0, 248.0, 239.0]
2025-09-16 11:45:23,887 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1232.62) for latency 3
2025-09-16 11:45:23,895 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 7 minutes, 10 seconds)
2025-09-16 11:47:13,882 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:47:16,276 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1006.07947 ± 283.264
2025-09-16 11:47:16,276 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [705.3633, 1374.0615, 645.577, 1419.0962, 681.4442, 1183.9819, 843.3001, 869.9278, 1026.1472, 1311.8955]
2025-09-16 11:47:16,276 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [133.0, 276.0, 124.0, 280.0, 131.0, 228.0, 170.0, 179.0, 210.0, 267.0]
2025-09-16 11:47:16,282 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 5 minutes, 30 seconds)
2025-09-16 11:49:05,958 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:49:09,039 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1248.81360 ± 456.194
2025-09-16 11:49:09,039 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1266.6921, 2240.865, 788.1933, 1673.0552, 711.124, 1097.6962, 1124.4794, 1279.0042, 730.4216, 1576.6044]
2025-09-16 11:49:09,039 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [263.0, 455.0, 153.0, 329.0, 134.0, 213.0, 214.0, 254.0, 141.0, 318.0]
2025-09-16 11:49:09,040 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1248.81) for latency 3
2025-09-16 11:49:09,046 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 3 minutes, 56 seconds)
2025-09-16 11:50:59,569 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:51:01,930 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 997.60760 ± 151.752
2025-09-16 11:51:01,930 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [959.3438, 848.05176, 1087.5388, 1195.4325, 1187.034, 684.96875, 873.5908, 1062.6036, 1002.5737, 1074.9382]
2025-09-16 11:51:01,931 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [195.0, 164.0, 228.0, 232.0, 227.0, 131.0, 166.0, 207.0, 195.0, 201.0]
2025-09-16 11:51:01,936 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 2 minutes, 7 seconds)
2025-09-16 11:52:51,759 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:52:56,014 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1659.04419 ± 763.478
2025-09-16 11:52:56,014 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [2320.3193, 1196.2096, 3246.317, 2581.2222, 1050.3168, 1726.1576, 965.78314, 943.681, 1556.9861, 1003.4476]
2025-09-16 11:52:56,014 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [468.0, 246.0, 656.0, 524.0, 231.0, 369.0, 193.0, 194.0, 333.0, 207.0]
2025-09-16 11:52:56,014 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1659.04) for latency 3
2025-09-16 11:52:56,020 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 19 seconds)
2025-09-16 11:54:46,811 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:54:49,840 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1230.11694 ± 425.134
2025-09-16 11:54:49,840 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1268.9816, 1455.2551, 1126.9071, 744.26025, 771.5065, 2052.176, 888.82184, 1893.058, 1078.306, 1021.89795]
2025-09-16 11:54:49,840 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [249.0, 292.0, 222.0, 145.0, 159.0, 426.0, 170.0, 373.0, 212.0, 201.0]
2025-09-16 11:54:49,845 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 58 minutes, 50 seconds)
2025-09-16 11:56:39,419 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:56:42,086 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1024.91284 ± 392.375
2025-09-16 11:56:42,086 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1581.2755, 819.92773, 368.04428, 938.6122, 1096.7438, 685.0819, 568.9483, 1440.2689, 1338.0037, 1412.2216]
2025-09-16 11:56:42,086 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [318.0, 185.0, 79.0, 197.0, 234.0, 139.0, 123.0, 302.0, 286.0, 300.0]
2025-09-16 11:56:42,094 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 56 minutes, 56 seconds)
2025-09-16 11:58:34,741 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:58:37,389 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 991.14630 ± 306.370
2025-09-16 11:58:37,389 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1708.8492, 1125.2814, 873.31903, 1004.82996, 807.53094, 666.56104, 1190.9442, 636.7567, 1152.7566, 744.6341]
2025-09-16 11:58:37,389 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [361.0, 238.0, 187.0, 211.0, 178.0, 150.0, 252.0, 142.0, 254.0, 171.0]
2025-09-16 11:58:37,397 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 55 minutes, 33 seconds)
2025-09-16 12:00:26,773 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:00:29,326 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1100.74292 ± 255.815
2025-09-16 12:00:29,326 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1753.3683, 852.8003, 891.3218, 1145.047, 1033.6382, 1248.7838, 1125.1646, 1007.53284, 810.8534, 1138.9191]
2025-09-16 12:00:29,326 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [351.0, 162.0, 164.0, 220.0, 196.0, 235.0, 218.0, 190.0, 157.0, 214.0]
2025-09-16 12:00:29,333 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 53 minutes, 28 seconds)
2025-09-16 12:02:19,281 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:02:23,307 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1587.11523 ± 476.641
2025-09-16 12:02:23,307 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1763.9513, 1291.0029, 2041.89, 1341.2809, 2196.5596, 1106.3125, 777.96155, 1204.9115, 2140.1294, 2007.1528]
2025-09-16 12:02:23,307 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [347.0, 266.0, 408.0, 262.0, 427.0, 223.0, 169.0, 234.0, 462.0, 402.0]
2025-09-16 12:02:23,313 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 51 minutes, 34 seconds)
2025-09-16 12:04:13,641 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:04:17,506 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1562.34546 ± 462.102
2025-09-16 12:04:17,507 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [2301.4067, 2025.2012, 1568.982, 1604.7662, 954.2396, 1093.8289, 1472.9103, 1123.0149, 1229.6643, 2249.442]
2025-09-16 12:04:17,507 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [446.0, 398.0, 319.0, 333.0, 186.0, 209.0, 287.0, 218.0, 233.0, 455.0]
2025-09-16 12:04:17,513 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 49 minutes, 44 seconds)
2025-09-16 12:06:07,458 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:06:11,050 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1476.00952 ± 600.852
2025-09-16 12:06:11,050 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1469.3981, 1172.9786, 1779.0526, 1602.8099, 1048.9678, 1390.2645, 1209.0648, 3028.3596, 1442.5835, 616.6155]
2025-09-16 12:06:11,050 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [292.0, 225.0, 370.0, 314.0, 205.0, 265.0, 235.0, 608.0, 284.0, 122.0]
2025-09-16 12:06:11,058 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 48 minutes, 6 seconds)
2025-09-16 12:08:09,435 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:08:12,620 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1326.97876 ± 517.100
2025-09-16 12:08:12,620 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [745.8063, 1356.299, 956.8291, 1195.4038, 972.9675, 1990.8922, 1327.4448, 2511.7598, 867.7846, 1344.6016]
2025-09-16 12:08:12,620 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [139.0, 266.0, 183.0, 227.0, 188.0, 389.0, 253.0, 500.0, 169.0, 265.0]
2025-09-16 12:08:12,630 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 47 minutes, 22 seconds)
2025-09-16 12:09:55,979 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:10:01,780 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2140.99170 ± 1419.129
2025-09-16 12:10:01,780 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [2012.7031, 4694.1724, 1096.643, 822.41766, 3275.3733, 1204.9628, 4609.4824, 1356.4365, 1248.7496, 1088.9746]
2025-09-16 12:10:01,781 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [402.0, 1000.0, 243.0, 160.0, 675.0, 259.0, 1000.0, 269.0, 273.0, 213.0]
2025-09-16 12:10:01,781 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (2140.99) for latency 3
2025-09-16 12:10:01,786 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 44 minutes, 56 seconds)
2025-09-16 12:12:00,330 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:12:10,149 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3583.78711 ± 1441.575
2025-09-16 12:12:10,150 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1620.1566, 4110.4653, 4755.7305, 4822.857, 4717.8604, 3376.8154, 4803.424, 1419.6014, 4805.728, 1405.2339]
2025-09-16 12:12:10,150 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [318.0, 807.0, 1000.0, 1000.0, 937.0, 684.0, 1000.0, 293.0, 1000.0, 273.0]
2025-09-16 12:12:10,150 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (3583.79) for latency 3
2025-09-16 12:12:10,160 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 45 minutes, 37 seconds)
2025-09-16 12:13:54,750 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:14:00,072 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2098.21826 ± 919.185
2025-09-16 12:14:00,072 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [2832.6008, 1226.9858, 3873.3472, 1602.9578, 973.55505, 1159.1384, 3304.6592, 1890.9316, 2289.2104, 1828.795]
2025-09-16 12:14:00,072 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [567.0, 241.0, 772.0, 333.0, 191.0, 222.0, 689.0, 373.0, 441.0, 361.0]
2025-09-16 12:14:00,078 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 42 minutes, 55 seconds)
2025-09-16 12:15:54,790 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:16:01,496 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2469.35986 ± 1466.999
2025-09-16 12:16:01,496 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1121.3484, 4783.8247, 439.88956, 1788.2837, 2589.3604, 2954.3726, 3518.6624, 4827.5024, 1709.4117, 960.9424]
2025-09-16 12:16:01,496 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [226.0, 1000.0, 97.0, 376.0, 560.0, 629.0, 709.0, 1000.0, 346.0, 188.0]
2025-09-16 12:16:01,503 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 42 minutes, 20 seconds)
2025-09-16 12:17:54,787 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:18:00,962 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2410.11450 ± 1320.043
2025-09-16 12:18:00,962 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [4781.3945, 2076.427, 4988.12, 2583.3591, 1664.3107, 1772.663, 1083.6471, 2187.8682, 972.1599, 1991.1948]
2025-09-16 12:18:00,962 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 416.0, 1000.0, 515.0, 332.0, 353.0, 212.0, 434.0, 196.0, 403.0]
2025-09-16 12:18:00,972 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 40 minutes, 1 second)
2025-09-16 12:19:47,837 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:19:53,378 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2160.17725 ± 1156.022
2025-09-16 12:19:53,378 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1926.562, 943.77295, 4203.624, 3165.6177, 1840.3993, 1687.5393, 1280.481, 3991.6658, 713.82965, 1848.2819]
2025-09-16 12:19:53,378 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [383.0, 178.0, 865.0, 628.0, 389.0, 325.0, 250.0, 813.0, 156.0, 355.0]
2025-09-16 12:19:53,386 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 38 minutes, 35 seconds)
2025-09-16 12:21:44,311 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:21:50,132 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2305.30615 ± 1201.976
2025-09-16 12:21:50,133 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [4954.0396, 1383.4874, 2681.6602, 2658.081, 2582.0652, 3355.3904, 706.714, 1258.3805, 1089.2449, 2384.0]
2025-09-16 12:21:50,133 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 273.0, 526.0, 506.0, 509.0, 657.0, 136.0, 256.0, 211.0, 466.0]
2025-09-16 12:21:50,139 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 34 minutes, 43 seconds)
2025-09-16 12:23:45,291 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:23:54,633 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3407.90039 ± 1723.826
2025-09-16 12:23:54,633 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1671.812, 1356.8374, 4911.893, 4253.814, 4721.0547, 5008.817, 4969.942, 898.66205, 1360.7401, 4925.431]
2025-09-16 12:23:54,633 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [351.0, 285.0, 1000.0, 867.0, 952.0, 1000.0, 1000.0, 188.0, 290.0, 1000.0]
2025-09-16 12:23:54,640 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 35 minutes, 7 seconds)
2025-09-16 12:25:41,706 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:25:47,588 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2222.01929 ± 1190.641
2025-09-16 12:25:47,588 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1461.907, 3875.0767, 2142.5732, 1228.0203, 1532.2842, 2798.0945, 2243.1135, 1021.424, 4766.6436, 1151.0571]
2025-09-16 12:25:47,588 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [289.0, 811.0, 417.0, 241.0, 305.0, 563.0, 477.0, 197.0, 1000.0, 222.0]
2025-09-16 12:25:47,596 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 31 minutes, 49 seconds)
2025-09-16 12:27:41,989 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:27:53,588 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4218.21289 ± 1024.493
2025-09-16 12:27:53,588 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [4744.9077, 1373.9662, 4947.9995, 3641.913, 4684.2764, 4726.213, 4790.461, 4396.0474, 4846.1396, 4030.207]
2025-09-16 12:27:53,588 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 277.0, 1000.0, 746.0, 1000.0, 1000.0, 1000.0, 884.0, 1000.0, 836.0]
2025-09-16 12:27:53,588 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (4218.21) for latency 3
2025-09-16 12:27:53,597 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 30 minutes, 52 seconds)
2025-09-16 12:29:42,176 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:29:50,513 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3156.65015 ± 1629.161
2025-09-16 12:29:50,514 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [4993.977, 4967.739, 3269.0896, 729.5177, 2061.9233, 5017.748, 2704.2192, 2095.889, 886.08417, 4840.316]
2025-09-16 12:29:50,514 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 655.0, 143.0, 427.0, 1000.0, 533.0, 402.0, 175.0, 1000.0]
2025-09-16 12:29:50,521 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 29 minutes, 34 seconds)
2025-09-16 12:31:46,549 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:31:56,122 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3403.12256 ± 1487.130
2025-09-16 12:31:56,123 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [4753.1763, 4697.343, 1717.5497, 4840.3174, 1650.8003, 4701.0713, 1936.6123, 4865.554, 3615.443, 1253.3584]
2025-09-16 12:31:56,123 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 352.0, 1000.0, 332.0, 1000.0, 399.0, 1000.0, 754.0, 271.0]
2025-09-16 12:31:56,130 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 28 minutes, 52 seconds)
2025-09-16 12:33:47,639 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:33:56,408 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3257.66748 ± 1414.790
2025-09-16 12:33:56,408 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [3477.6243, 2619.5054, 1552.5043, 2491.5322, 1728.5886, 4782.9517, 1338.7885, 4870.9175, 4844.9473, 4869.317]
2025-09-16 12:33:56,408 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [704.0, 513.0, 301.0, 500.0, 345.0, 1000.0, 266.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:33:56,415 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 26 minutes, 15 seconds)
2025-09-16 12:35:53,037 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:36:00,548 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2904.33887 ± 1811.266
2025-09-16 12:36:00,549 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5059.5103, 5121.9634, 694.6702, 4992.0684, 982.0622, 1015.6779, 1984.6777, 3039.2942, 4782.2197, 1371.245]
2025-09-16 12:36:00,549 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 138.0, 1000.0, 186.0, 195.0, 382.0, 607.0, 972.0, 274.0]
2025-09-16 12:36:00,555 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 25 minutes, 48 seconds)
2025-09-16 12:37:52,425 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:37:58,212 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2266.52734 ± 1000.180
2025-09-16 12:37:58,212 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [4397.4897, 1366.077, 3169.617, 2767.2688, 1605.1876, 1538.7156, 1895.039, 1243.8118, 1497.4291, 3184.6365]
2025-09-16 12:37:58,212 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [856.0, 269.0, 652.0, 547.0, 311.0, 305.0, 376.0, 241.0, 293.0, 647.0]
2025-09-16 12:37:58,219 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 22 minutes, 37 seconds)
2025-09-16 12:39:47,187 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:39:57,095 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3843.41650 ± 1415.492
2025-09-16 12:39:57,095 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [2666.156, 5089.061, 5090.5054, 2355.6335, 5106.2256, 4384.1343, 2082.9863, 5019.2305, 1529.3812, 5110.851]
2025-09-16 12:39:57,095 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [519.0, 1000.0, 1000.0, 461.0, 1000.0, 851.0, 400.0, 1000.0, 300.0, 1000.0]
2025-09-16 12:39:57,105 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 20 minutes, 52 seconds)
2025-09-16 12:41:48,055 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:42:00,641 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4439.92578 ± 598.661
2025-09-16 12:42:00,641 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [4177.7144, 4460.0273, 4658.4116, 4891.178, 4890.889, 2740.2908, 4623.735, 4674.548, 4667.8965, 4614.5645]
2025-09-16 12:42:00,641 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [899.0, 948.0, 1000.0, 1000.0, 1000.0, 555.0, 1000.0, 945.0, 1000.0, 1000.0]
2025-09-16 12:42:00,641 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (4439.93) for latency 3
2025-09-16 12:42:00,653 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 18 minutes, 35 seconds)
2025-09-16 12:43:50,871 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:44:03,921 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4536.87305 ± 861.602
2025-09-16 12:44:03,921 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [4910.464, 4675.618, 4691.072, 4987.0166, 4901.186, 1969.8759, 4898.3164, 4796.116, 4698.6484, 4840.4175]
2025-09-16 12:44:03,921 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 398.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:44:03,921 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (4536.87) for latency 3
2025-09-16 12:44:03,927 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 16 minutes, 57 seconds)
2025-09-16 12:46:01,095 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:46:14,037 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4597.50928 ± 803.446
2025-09-16 12:46:14,037 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5039.76, 4698.9233, 4989.8726, 4817.7026, 4669.0024, 5087.135, 4904.373, 4820.476, 2222.2563, 4725.5933]
2025-09-16 12:46:14,037 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 462.0, 1000.0]
2025-09-16 12:46:14,037 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (4597.51) for latency 3
2025-09-16 12:46:14,045 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 15 minutes, 39 seconds)
2025-09-16 12:48:08,512 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:48:14,207 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2193.75342 ± 1391.698
2025-09-16 12:48:14,208 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1203.1838, 1980.2798, 1463.3724, 5118.9546, 1483.0176, 4702.9653, 1416.9078, 1590.6976, 994.3121, 1983.8438]
2025-09-16 12:48:14,208 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [232.0, 382.0, 290.0, 1000.0, 293.0, 1000.0, 277.0, 305.0, 197.0, 389.0]
2025-09-16 12:48:14,215 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 13 minutes, 55 seconds)
2025-09-16 12:49:59,619 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:50:10,671 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3850.18677 ± 1305.947
2025-09-16 12:50:10,671 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1147.3359, 4768.9434, 4667.274, 4720.083, 4623.134, 4692.247, 2042.0231, 4629.491, 4649.2754, 2562.061]
2025-09-16 12:50:10,671 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [224.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 440.0, 1000.0, 1000.0, 513.0]
2025-09-16 12:50:10,681 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 11 minutes, 35 seconds)
2025-09-16 12:52:00,470 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:52:09,589 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3221.32104 ± 1109.681
2025-09-16 12:52:09,590 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1440.0536, 1861.1184, 4744.314, 4945.128, 2697.2075, 3778.077, 3590.5815, 2480.9412, 2744.8218, 3930.9683]
2025-09-16 12:52:09,590 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [308.0, 363.0, 987.0, 1000.0, 579.0, 764.0, 735.0, 535.0, 583.0, 846.0]
2025-09-16 12:52:09,598 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 9 minutes)
2025-09-16 12:54:03,243 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:54:14,559 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4045.32764 ± 1033.650
2025-09-16 12:54:14,559 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [4577.728, 4809.7417, 4829.7783, 2580.979, 1700.7607, 3709.3796, 4863.178, 4044.5007, 4692.597, 4644.632]
2025-09-16 12:54:14,559 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 503.0, 329.0, 721.0, 1000.0, 890.0, 1000.0, 906.0]
2025-09-16 12:54:14,574 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 7 minutes, 10 seconds)
2025-09-16 12:56:15,018 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:56:26,190 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4183.44141 ± 1445.363
2025-09-16 12:56:26,191 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1942.0573, 5091.095, 5113.531, 4703.0034, 5127.464, 4960.126, 962.9968, 5090.9756, 3678.1511, 5165.0107]
2025-09-16 12:56:26,191 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [382.0, 1000.0, 1000.0, 912.0, 1000.0, 1000.0, 182.0, 1000.0, 716.0, 996.0]
2025-09-16 12:56:26,201 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 5 minutes, 17 seconds)
2025-09-16 12:58:12,772 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:58:25,148 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4594.79346 ± 879.622
2025-09-16 12:58:25,149 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [2152.7012, 5115.154, 5104.265, 4059.501, 4624.0884, 5178.3804, 4587.2764, 5120.36, 4950.5356, 5055.6704]
2025-09-16 12:58:25,149 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [424.0, 1000.0, 1000.0, 792.0, 915.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:58:25,159 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 3 minutes, 7 seconds)
2025-09-16 13:00:17,781 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:00:31,680 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4856.67871 ± 45.933
2025-09-16 13:00:31,680 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [4777.331, 4864.2124, 4913.2593, 4855.707, 4827.0425, 4935.5186, 4811.0723, 4828.987, 4896.7085, 4856.9487]
2025-09-16 13:00:31,680 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:00:31,680 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (4856.68) for latency 3
2025-09-16 13:00:31,688 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 2 minutes, 6 seconds)
2025-09-16 13:02:25,533 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:02:39,153 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5021.95898 ± 87.035
2025-09-16 13:02:39,153 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5059.7485, 5054.0337, 5016.1562, 5055.4624, 5092.7847, 5079.234, 5059.134, 5036.2344, 4992.6265, 4774.1772]
2025-09-16 13:02:39,153 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 949.0]
2025-09-16 13:02:39,153 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (5021.96) for latency 3
2025-09-16 13:02:39,162 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 51 seconds)
2025-09-16 13:04:31,710 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:04:45,275 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4929.98438 ± 37.807
2025-09-16 13:04:45,275 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [4947.0366, 4929.6343, 4953.9097, 4989.7886, 4945.858, 4885.9224, 4961.272, 4940.626, 4877.928, 4867.869]
2025-09-16 13:04:45,275 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:04:45,282 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 58 minutes, 51 seconds)
2025-09-16 13:06:37,940 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:06:51,810 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4818.48584 ± 101.904
2025-09-16 13:06:51,810 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [4761.782, 4735.6216, 4733.1377, 4968.739, 4754.92, 4761.6685, 4747.791, 4803.7227, 5036.4424, 4881.035]
2025-09-16 13:06:51,810 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:06:51,821 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 56 minutes, 18 seconds)
2025-09-16 13:08:44,405 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:08:58,006 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5027.25684 ± 41.799
2025-09-16 13:08:58,006 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [4988.9165, 4963.1167, 5036.545, 5094.834, 5030.2686, 5070.4844, 5054.8364, 5059.5107, 4974.3994, 4999.6523]
2025-09-16 13:08:58,006 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:08:58,006 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (5027.26) for latency 3
2025-09-16 13:08:58,016 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 54 minutes, 50 seconds)
2025-09-16 13:10:53,519 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:10:57,727 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1752.04077 ± 1324.809
2025-09-16 13:10:57,727 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5104.202, 3194.6472, 890.93646, 1038.3889, 883.4701, 975.8103, 917.9855, 869.6458, 2088.2678, 1557.0546]
2025-09-16 13:10:57,727 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [963.0, 604.0, 168.0, 199.0, 165.0, 193.0, 174.0, 166.0, 391.0, 290.0]
2025-09-16 13:10:57,737 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 52 minutes, 10 seconds)
2025-09-16 13:12:52,754 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:13:05,866 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4789.80469 ± 660.222
2025-09-16 13:13:05,866 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5176.7656, 5167.316, 5035.6055, 4687.333, 4690.09, 5040.776, 2874.6296, 5093.114, 5137.416, 4995.0]
2025-09-16 13:13:05,866 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 561.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:13:05,874 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 50 minutes, 8 seconds)
2025-09-16 13:14:47,221 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:14:59,224 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4437.97070 ± 926.933
2025-09-16 13:14:59,224 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [4083.8083, 3567.031, 5062.5474, 5082.498, 4882.183, 4804.327, 5085.8604, 4687.525, 2052.954, 5070.9727]
2025-09-16 13:14:59,224 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [796.0, 702.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 415.0, 1000.0]
2025-09-16 13:14:59,231 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 47 minutes, 4 seconds)
2025-09-16 13:16:58,077 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:17:09,782 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4368.15430 ± 1313.834
2025-09-16 13:17:09,782 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5113.6797, 5143.692, 5112.522, 4266.0835, 5146.55, 1836.6561, 5085.779, 5137.1665, 1743.4039, 5096.0107]
2025-09-16 13:17:09,782 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 826.0, 1000.0, 367.0, 1000.0, 1000.0, 339.0, 1000.0]
2025-09-16 13:17:09,792 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 45 minutes, 19 seconds)
2025-09-16 13:19:05,482 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:19:17,170 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4146.92725 ± 1303.085
2025-09-16 13:19:17,170 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [4716.1978, 4865.407, 4803.421, 4775.1455, 4806.4336, 4949.7314, 1838.1602, 4717.2476, 1275.002, 4722.532]
2025-09-16 13:19:17,170 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 364.0, 1000.0, 248.0, 1000.0]
2025-09-16 13:19:17,178 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 43 minutes, 20 seconds)
2025-09-16 13:21:08,884 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:21:22,287 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5120.36963 ± 99.766
2025-09-16 13:21:22,287 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5139.9453, 5219.332, 5210.0396, 5178.381, 5188.519, 4894.525, 5179.5547, 5044.0776, 5008.3335, 5140.988]
2025-09-16 13:21:22,287 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 993.0, 1000.0]
2025-09-16 13:21:22,287 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (5120.37) for latency 3
2025-09-16 13:21:22,294 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 41 minutes, 38 seconds)
2025-09-16 13:23:13,663 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:23:26,082 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4563.60693 ± 936.130
2025-09-16 13:23:26,082 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [4736.1167, 2840.9788, 5141.5156, 5155.8574, 5001.373, 5175.319, 2604.186, 5136.122, 5135.828, 4708.774]
2025-09-16 13:23:26,082 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 545.0, 1000.0, 1000.0, 1000.0, 1000.0, 505.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:23:26,094 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 39 minutes, 16 seconds)
2025-09-16 13:25:16,940 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:25:30,523 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5120.18652 ± 17.005
2025-09-16 13:25:30,523 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5120.9575, 5129.3384, 5128.2944, 5122.806, 5122.821, 5120.7324, 5117.5137, 5144.1333, 5121.4023, 5073.8677]
2025-09-16 13:25:30,523 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:25:30,533 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 37 minutes, 52 seconds)
2025-09-16 13:27:17,190 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:27:30,345 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5155.85498 ± 63.632
2025-09-16 13:27:30,345 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5131.811, 4999.469, 5157.8384, 5184.3853, 5237.944, 5169.763, 5191.055, 5202.0938, 5186.541, 5097.6523]
2025-09-16 13:27:30,345 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [988.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:27:30,345 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (5155.85) for latency 3
2025-09-16 13:27:30,357 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 35 minutes, 9 seconds)
2025-09-16 13:29:14,939 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:29:26,517 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4533.29102 ± 1058.787
2025-09-16 13:29:26,518 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5122.2363, 5268.8784, 3205.0955, 5135.5728, 5151.001, 5199.7603, 2106.1025, 5218.6797, 5169.4805, 3756.1008]
2025-09-16 13:29:26,518 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 615.0, 1000.0, 1000.0, 1000.0, 402.0, 1000.0, 1000.0, 712.0]
2025-09-16 13:29:26,529 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 32 minutes, 29 seconds)
2025-09-16 13:31:15,355 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:31:28,713 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4894.17676 ± 226.228
2025-09-16 13:31:28,713 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5073.6816, 5081.339, 5082.114, 4625.6895, 5073.8164, 4616.4727, 5081.107, 5077.917, 4653.2627, 4576.368]
2025-09-16 13:31:28,713 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:31:28,725 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 30 minutes, 19 seconds)
2025-09-16 13:33:22,033 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:33:35,104 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5215.85205 ± 23.566
2025-09-16 13:33:35,104 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5233.6436, 5220.426, 5155.0854, 5241.0347, 5227.68, 5216.4785, 5237.347, 5199.2393, 5214.9297, 5212.6577]
2025-09-16 13:33:35,104 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:33:35,104 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (5215.85) for latency 3
2025-09-16 13:33:35,114 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 28 minutes, 25 seconds)
2025-09-16 13:35:23,814 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:35:36,829 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5265.75098 ± 16.105
2025-09-16 13:35:36,829 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5237.497, 5269.679, 5279.6606, 5237.553, 5272.135, 5256.89, 5269.2017, 5267.2793, 5285.55, 5282.0566]
2025-09-16 13:35:36,830 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:35:36,830 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (5265.75) for latency 3
2025-09-16 13:35:36,845 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 26 minutes, 16 seconds)
2025-09-16 13:37:23,171 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:37:31,798 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3118.61475 ± 1910.367
2025-09-16 13:37:31,798 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [4825.9287, 582.3353, 3796.6357, 608.3313, 4700.9355, 1096.1874, 4851.1562, 4767.449, 974.52014, 4982.6685]
2025-09-16 13:37:31,798 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 123.0, 754.0, 144.0, 1000.0, 233.0, 1000.0, 1000.0, 215.0, 1000.0]
2025-09-16 13:37:31,807 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 24 minutes, 3 seconds)
2025-09-16 13:39:24,596 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:39:37,799 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5255.07861 ± 14.711
2025-09-16 13:39:37,799 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5247.2656, 5278.369, 5259.211, 5245.033, 5266.727, 5264.154, 5223.686, 5255.143, 5244.6753, 5266.519]
2025-09-16 13:39:37,799 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:39:37,810 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 22 minutes, 24 seconds)
2025-09-16 13:41:39,605 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:41:53,071 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5086.14697 ± 18.557
2025-09-16 13:41:53,071 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5103.224, 5089.1177, 5089.0425, 5057.2485, 5093.2607, 5044.87, 5105.7783, 5095.9756, 5089.6704, 5093.2827]
2025-09-16 13:41:53,071 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:41:53,080 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 20 minutes, 48 seconds)
2025-09-16 13:43:45,302 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:43:58,725 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5220.72510 ± 24.804
2025-09-16 13:43:58,725 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5258.462, 5229.8296, 5214.536, 5235.0654, 5166.142, 5244.8296, 5225.4565, 5227.7905, 5209.968, 5195.1714]
2025-09-16 13:43:58,725 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:43:58,738 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 18 minutes, 42 seconds)
2025-09-16 13:45:51,320 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:46:04,788 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5132.42236 ± 72.024
2025-09-16 13:46:04,788 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [4964.067, 5150.917, 5173.331, 5155.6416, 5043.6167, 5174.4346, 5195.35, 5100.265, 5159.273, 5207.324]
2025-09-16 13:46:04,788 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:46:04,799 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 16 minutes, 44 seconds)
2025-09-16 13:47:47,069 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:48:00,694 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5047.51074 ± 71.699
2025-09-16 13:48:00,695 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5076.2163, 5113.207, 5051.255, 5078.545, 5077.4116, 5067.3413, 4840.391, 5051.009, 5040.0312, 5079.6978]
2025-09-16 13:48:00,695 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:48:00,703 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 14 minutes, 40 seconds)
2025-09-16 13:49:53,060 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:50:05,083 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4151.87500 ± 1195.335
2025-09-16 13:50:05,083 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [4622.3926, 4808.887, 1032.7174, 4708.7656, 4636.4487, 4732.7603, 4706.8447, 2749.8982, 4832.4316, 4687.602]
2025-09-16 13:50:05,083 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 221.0, 1000.0, 1000.0, 1000.0, 1000.0, 590.0, 1000.0, 1000.0]
2025-09-16 13:50:05,095 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 12 minutes, 32 seconds)
2025-09-16 13:51:57,464 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:52:10,031 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4807.74756 ± 1054.529
2025-09-16 13:52:10,031 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5156.079, 5154.365, 5175.7896, 5160.889, 5147.724, 1644.2476, 5162.7593, 5167.85, 5156.4863, 5151.29]
2025-09-16 13:52:10,031 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 338.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:52:10,039 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 10 minutes, 16 seconds)
2025-09-16 13:54:02,413 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:54:16,148 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5068.05176 ± 31.850
2025-09-16 13:54:16,149 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5041.9087, 5091.2046, 5112.707, 5104.9805, 5074.1455, 5081.568, 5018.291, 5071.925, 5014.6196, 5069.1675]
2025-09-16 13:54:16,149 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:54:16,157 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 13 seconds)
2025-09-16 13:56:08,802 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:56:22,536 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4831.81152 ± 424.146
2025-09-16 13:56:22,536 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [4964.854, 5046.0586, 4937.0225, 4946.3486, 4970.9644, 4973.6978, 4983.7153, 4955.106, 3562.1624, 4978.187]
2025-09-16 13:56:22,536 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 702.0, 1000.0]
2025-09-16 13:56:22,549 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 10 seconds)
2025-09-16 13:58:20,905 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:58:33,776 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4616.69629 ± 979.981
2025-09-16 13:58:33,776 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [4941.857, 4944.2256, 4945.5054, 4939.618, 4903.527, 4949.9917, 1677.1681, 4949.3228, 4940.4834, 4975.261]
2025-09-16 13:58:33,776 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 350.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:58:33,785 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 13 seconds)
2025-09-16 14:00:26,212 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:00:39,698 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5173.89062 ± 18.224
2025-09-16 14:00:39,698 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5169.414, 5140.328, 5169.8706, 5178.3125, 5194.577, 5142.8857, 5194.9443, 5188.7944, 5182.39, 5177.391]
2025-09-16 14:00:39,698 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:00:39,707 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 6 seconds)
2025-09-16 14:02:34,313 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:02:47,923 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5216.20850 ± 29.190
2025-09-16 14:02:47,924 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5252.415, 5160.3955, 5230.491, 5224.7197, 5189.676, 5235.243, 5176.7007, 5212.262, 5244.27, 5235.91]
2025-09-16 14:02:47,924 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:02:47,934 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1251 [DEBUG]: Training session finished
