2025-08-07 10:14:47,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc20-humanoid/MM1Queue_a033_s075-bpql-mem16
2025-08-07 10:14:47,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc20-humanoid/MM1Queue_a033_s075-bpql-mem16
2025-08-07 10:14:47,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x1536253e3f10>}
2025-08-07 10:14:47,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1111 [DEBUG]: using device: cuda
2025-08-07 10:14:47,115 baseline-bpql-noiseperc20-humanoid:77 [WARNING]: args.assumed_delay != args.horizon: 16 != 24
2025-08-07 10:14:47,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1133 [INFO]: Creating new trainer
2025-08-07 10:14:47,133 baseline-bpql-noiseperc20-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=648, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-08-07 10:14:47,133 baseline-bpql-noiseperc20-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 10:14:48,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1194 [DEBUG]: Starting training session...
2025-08-07 10:14:48,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 1/100
2025-08-07 10:16:34,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:16:35,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 283.53302 ± 105.220
2025-08-07 10:16:35,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [260.80035, 310.31384, 329.0379, 463.38748, 327.81433, 114.42849, 378.02222, 260.75644, 96.60133, 294.1679]
2025-08-07 10:16:35,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [49.0, 56.0, 62.0, 95.0, 61.0, 22.0, 69.0, 49.0, 19.0, 58.0]
2025-08-07 10:16:35,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1226 [INFO]: New best (283.53) for latency MM1Queue_a033_s075
2025-08-07 10:16:35,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 56 minutes, 16 seconds)
2025-08-07 10:18:30,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:18:30,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 216.25577 ± 72.144
2025-08-07 10:18:30,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [240.6579, 235.43155, 189.06178, 240.41956, 268.96762, 96.41746, 316.63278, 163.30711, 107.02744, 304.63452]
2025-08-07 10:18:30,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [47.0, 47.0, 36.0, 46.0, 49.0, 19.0, 60.0, 34.0, 21.0, 60.0]
2025-08-07 10:18:30,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 1 minute, 19 seconds)
2025-08-07 10:20:26,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:20:27,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 224.78899 ± 127.030
2025-08-07 10:20:27,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [102.38336, 106.96492, 395.37793, 95.72735, 263.60855, 124.39285, 327.40414, 436.2467, 101.803474, 293.98056]
2025-08-07 10:20:27,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 21.0, 86.0, 19.0, 50.0, 24.0, 62.0, 92.0, 20.0, 62.0]
2025-08-07 10:20:27,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 2 minutes, 15 seconds)
2025-08-07 10:22:21,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:22:22,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 293.91827 ± 129.198
2025-08-07 10:22:22,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [293.0239, 366.64612, 530.3819, 100.5364, 113.03911, 396.50455, 162.48099, 378.06375, 317.4993, 281.00662]
2025-08-07 10:22:22,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [60.0, 72.0, 99.0, 20.0, 22.0, 72.0, 33.0, 69.0, 60.0, 59.0]
2025-08-07 10:22:22,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1226 [INFO]: New best (293.92) for latency MM1Queue_a033_s075
2025-08-07 10:22:22,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 1 minute, 17 seconds)
2025-08-07 10:24:17,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:24:18,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 258.80988 ± 144.780
2025-08-07 10:24:18,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [295.87845, 606.12836, 118.566414, 179.22345, 270.6837, 326.16803, 223.302, 354.63553, 95.72874, 117.78428]
2025-08-07 10:24:18,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [55.0, 122.0, 23.0, 34.0, 56.0, 70.0, 47.0, 72.0, 19.0, 23.0]
2025-08-07 10:24:18,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 15 seconds)
2025-08-07 10:26:12,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:26:13,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 235.78694 ± 136.683
2025-08-07 10:26:13,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [122.41129, 126.05337, 545.19, 210.68312, 122.666145, 106.2735, 359.63116, 324.05453, 141.71942, 299.18668]
2025-08-07 10:26:13,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 25.0, 107.0, 43.0, 24.0, 21.0, 71.0, 60.0, 27.0, 59.0]
2025-08-07 10:26:13,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 1 minute, 1 second)
2025-08-07 10:28:08,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:28:09,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 273.48306 ± 134.099
2025-08-07 10:28:09,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [283.0597, 401.18152, 99.80944, 105.73483, 308.6915, 293.93625, 537.45374, 95.14053, 306.49936, 303.324]
2025-08-07 10:28:09,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 84.0, 20.0, 21.0, 59.0, 59.0, 102.0, 19.0, 58.0, 60.0]
2025-08-07 10:28:09,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 59 minutes, 13 seconds)
2025-08-07 10:30:05,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:30:06,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 243.21854 ± 113.685
2025-08-07 10:30:06,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [112.88922, 262.99442, 286.1074, 156.45757, 289.7771, 313.67062, 335.39014, 464.20584, 108.37869, 102.31439]
2025-08-07 10:30:06,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 51.0, 53.0, 30.0, 54.0, 60.0, 62.0, 91.0, 21.0, 20.0]
2025-08-07 10:30:06,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 57 minutes, 41 seconds)
2025-08-07 10:32:01,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:32:02,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 245.80493 ± 167.687
2025-08-07 10:32:02,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [590.4801, 355.8407, 88.80192, 383.34473, 95.55038, 96.233925, 270.4914, 101.831245, 95.790794, 379.68396]
2025-08-07 10:32:02,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 63.0, 18.0, 71.0, 19.0, 19.0, 51.0, 20.0, 19.0, 78.0]
2025-08-07 10:32:02,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 56 minutes, 7 seconds)
2025-08-07 10:33:58,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:33:58,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 264.73312 ± 102.712
2025-08-07 10:33:58,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [329.75, 378.24255, 345.66833, 277.58987, 128.82616, 326.2704, 311.00827, 340.19604, 89.45082, 120.3287]
2025-08-07 10:33:58,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [63.0, 74.0, 66.0, 52.0, 25.0, 62.0, 65.0, 63.0, 18.0, 23.0]
2025-08-07 10:33:58,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 54 minutes, 13 seconds)
2025-08-07 10:35:55,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:35:55,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 292.67734 ± 161.780
2025-08-07 10:35:55,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [545.07764, 224.3416, 107.38988, 384.51608, 94.9195, 253.56772, 363.08185, 569.58527, 128.5023, 255.79166]
2025-08-07 10:35:55,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 45.0, 21.0, 69.0, 19.0, 48.0, 65.0, 107.0, 25.0, 50.0]
2025-08-07 10:35:55,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 52 minutes, 47 seconds)
2025-08-07 10:37:51,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:37:51,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 189.21788 ± 120.276
2025-08-07 10:37:51,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [106.62571, 124.46312, 89.27857, 135.70496, 320.0036, 117.981384, 414.97983, 101.96894, 108.7541, 372.4186]
2025-08-07 10:37:51,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 24.0, 18.0, 26.0, 60.0, 23.0, 78.0, 20.0, 22.0, 69.0]
2025-08-07 10:37:51,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 50 minutes, 54 seconds)
2025-08-07 10:39:47,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:39:48,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 297.63138 ± 159.515
2025-08-07 10:39:48,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [497.9361, 444.91385, 368.02927, 106.36382, 441.81625, 118.862045, 354.3008, 437.38272, 111.15561, 95.55313]
2025-08-07 10:39:48,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 84.0, 68.0, 21.0, 84.0, 23.0, 65.0, 87.0, 22.0, 19.0]
2025-08-07 10:39:48,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1226 [INFO]: New best (297.63) for latency MM1Queue_a033_s075
2025-08-07 10:39:48,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 48 minutes, 48 seconds)
2025-08-07 10:41:44,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:41:44,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 288.32135 ± 162.720
2025-08-07 10:41:44,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [383.7099, 89.06804, 108.5506, 441.19617, 351.04056, 95.6084, 136.97285, 263.29092, 515.24695, 498.5291]
2025-08-07 10:41:44,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 18.0, 21.0, 83.0, 64.0, 19.0, 27.0, 49.0, 98.0, 91.0]
2025-08-07 10:41:44,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 46 minutes, 54 seconds)
2025-08-07 10:43:40,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:43:40,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 224.51065 ± 108.087
2025-08-07 10:43:40,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [108.180405, 138.09691, 113.786705, 262.67374, 192.10583, 162.31306, 149.5939, 428.1333, 367.79712, 322.42557]
2025-08-07 10:43:40,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 27.0, 22.0, 55.0, 38.0, 32.0, 29.0, 77.0, 67.0, 64.0]
2025-08-07 10:43:40,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 44 minutes, 51 seconds)
2025-08-07 10:45:37,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:45:37,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 229.35770 ± 144.527
2025-08-07 10:45:37,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [131.88417, 133.05363, 476.4038, 123.54195, 100.84162, 403.77643, 299.62668, 101.4912, 107.225876, 415.7315]
2025-08-07 10:45:37,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 26.0, 90.0, 24.0, 20.0, 74.0, 56.0, 20.0, 21.0, 79.0]
2025-08-07 10:45:37,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 42 minutes, 54 seconds)
2025-08-07 10:47:33,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:47:33,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 237.85864 ± 116.289
2025-08-07 10:47:33,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [89.15169, 134.36275, 378.55853, 168.62288, 354.7293, 357.0556, 325.45667, 343.42422, 102.27119, 124.95368]
2025-08-07 10:47:33,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 26.0, 74.0, 32.0, 67.0, 67.0, 59.0, 65.0, 20.0, 24.0]
2025-08-07 10:47:33,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 41 minutes, 2 seconds)
2025-08-07 10:49:30,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:49:31,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 294.15387 ± 159.661
2025-08-07 10:49:31,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [524.4787, 95.31712, 144.62814, 335.11218, 569.17065, 89.279625, 307.69266, 269.61847, 402.46802, 203.77286]
2025-08-07 10:49:31,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [96.0, 19.0, 28.0, 60.0, 107.0, 18.0, 56.0, 54.0, 75.0, 38.0]
2025-08-07 10:49:31,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 39 minutes, 15 seconds)
2025-08-07 10:51:26,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:51:27,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 375.58969 ± 219.780
2025-08-07 10:51:27,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [530.965, 95.66122, 426.7288, 857.289, 134.8995, 359.09024, 435.19254, 96.01431, 452.39355, 367.66272]
2025-08-07 10:51:27,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 19.0, 78.0, 170.0, 26.0, 68.0, 79.0, 19.0, 85.0, 66.0]
2025-08-07 10:51:27,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1226 [INFO]: New best (375.59) for latency MM1Queue_a033_s075
2025-08-07 10:51:27,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 37 minutes, 21 seconds)
2025-08-07 10:53:23,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:53:24,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 249.21411 ± 166.835
2025-08-07 10:53:24,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [113.18716, 107.693535, 95.08545, 464.16083, 106.8052, 435.26166, 181.06982, 349.09183, 107.35267, 532.4329]
2025-08-07 10:53:24,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 21.0, 19.0, 89.0, 21.0, 81.0, 35.0, 64.0, 21.0, 99.0]
2025-08-07 10:53:24,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 35 minutes, 36 seconds)
2025-08-07 10:55:19,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:55:20,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 270.76733 ± 174.334
2025-08-07 10:55:20,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [89.847496, 390.06296, 527.7529, 96.45738, 119.07865, 472.89667, 410.6167, 96.01247, 100.97818, 403.9698]
2025-08-07 10:55:20,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 78.0, 111.0, 19.0, 23.0, 95.0, 75.0, 19.0, 20.0, 80.0]
2025-08-07 10:55:20,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 33 minutes, 31 seconds)
2025-08-07 10:57:17,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:57:17,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 203.31511 ± 124.215
2025-08-07 10:57:17,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [138.62447, 287.16986, 146.28564, 124.0408, 393.63028, 158.6924, 101.79493, 107.08344, 112.63183, 463.19757]
2025-08-07 10:57:17,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 54.0, 28.0, 24.0, 76.0, 31.0, 20.0, 21.0, 22.0, 85.0]
2025-08-07 10:57:17,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 31 minutes, 51 seconds)
2025-08-07 10:59:12,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:59:13,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 282.60654 ± 144.887
2025-08-07 10:59:13,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [183.99226, 408.38947, 563.0556, 245.7489, 383.76913, 371.89072, 119.14675, 119.351006, 106.39825, 324.32336]
2025-08-07 10:59:13,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 76.0, 106.0, 45.0, 73.0, 68.0, 23.0, 23.0, 21.0, 58.0]
2025-08-07 10:59:13,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 29 minutes, 30 seconds)
2025-08-07 11:01:09,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:01:09,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 192.31323 ± 121.370
2025-08-07 11:01:09,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [374.77646, 88.95142, 96.22921, 110.79236, 129.96368, 132.63354, 126.18225, 119.14335, 319.6136, 424.84637]
2025-08-07 11:01:09,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 18.0, 19.0, 22.0, 25.0, 26.0, 25.0, 23.0, 59.0, 79.0]
2025-08-07 11:01:09,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 27 minutes, 29 seconds)
2025-08-07 11:03:06,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:03:06,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 284.24802 ± 156.019
2025-08-07 11:03:06,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [320.65686, 426.80554, 95.58989, 240.04584, 117.99811, 628.763, 347.34476, 161.05931, 160.79959, 343.41742]
2025-08-07 11:03:06,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [63.0, 91.0, 19.0, 51.0, 23.0, 120.0, 63.0, 31.0, 31.0, 75.0]
2025-08-07 11:03:06,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 25 minutes, 41 seconds)
2025-08-07 11:05:03,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:05:04,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 277.11029 ± 152.886
2025-08-07 11:05:04,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [119.66613, 433.7914, 486.89728, 351.65195, 430.71915, 113.70342, 146.89227, 430.44293, 129.36638, 127.97204]
2025-08-07 11:05:04,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 81.0, 90.0, 65.0, 81.0, 22.0, 28.0, 79.0, 25.0, 25.0]
2025-08-07 11:05:04,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 23 minutes, 55 seconds)
2025-08-07 11:07:00,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:07:01,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 211.82568 ± 144.879
2025-08-07 11:07:01,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [140.25603, 293.5222, 547.6422, 140.40625, 396.32175, 102.23271, 108.05597, 176.47047, 101.73751, 111.611855]
2025-08-07 11:07:01,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 55.0, 107.0, 27.0, 73.0, 20.0, 21.0, 34.0, 20.0, 22.0]
2025-08-07 11:07:01,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 21 minutes, 57 seconds)
2025-08-07 11:08:56,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:08:57,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 331.34412 ± 122.060
2025-08-07 11:08:57,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [255.37872, 138.74304, 434.058, 333.7167, 406.6988, 360.7312, 292.91632, 129.75728, 514.4925, 446.9487]
2025-08-07 11:08:57,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [49.0, 27.0, 80.0, 60.0, 82.0, 79.0, 57.0, 25.0, 95.0, 84.0]
2025-08-07 11:08:57,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 20 minutes, 6 seconds)
2025-08-07 11:10:53,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:10:53,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 236.43350 ± 162.911
2025-08-07 11:10:53,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [508.83258, 145.3928, 89.19725, 121.0213, 89.76648, 474.67728, 125.23896, 102.04907, 430.0286, 278.13055]
2025-08-07 11:10:53,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 28.0, 18.0, 24.0, 18.0, 87.0, 24.0, 20.0, 79.0, 50.0]
2025-08-07 11:10:53,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 18 minutes, 10 seconds)
2025-08-07 11:12:48,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:12:49,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 316.37662 ± 191.247
2025-08-07 11:12:49,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [393.756, 95.62513, 102.1496, 340.0487, 161.17233, 447.50238, 548.35736, 682.80536, 139.96956, 252.37956]
2025-08-07 11:12:49,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 19.0, 20.0, 63.0, 31.0, 82.0, 119.0, 130.0, 28.0, 50.0]
2025-08-07 11:12:49,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 15 minutes, 57 seconds)
2025-08-07 11:14:45,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:14:46,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 364.87149 ± 144.370
2025-08-07 11:14:46,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [649.80597, 139.52454, 513.6434, 347.90903, 375.37225, 343.62692, 158.31331, 402.46143, 427.32693, 290.73117]
2025-08-07 11:14:46,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 27.0, 103.0, 64.0, 68.0, 64.0, 30.0, 76.0, 93.0, 56.0]
2025-08-07 11:14:46,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 14 minutes)
2025-08-07 11:16:42,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:16:43,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 284.11185 ± 138.906
2025-08-07 11:16:43,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [522.99365, 320.9957, 405.62292, 354.49033, 100.92311, 113.11191, 407.3106, 313.67508, 134.93124, 167.06363]
2025-08-07 11:16:43,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [101.0, 58.0, 74.0, 64.0, 20.0, 22.0, 76.0, 69.0, 26.0, 32.0]
2025-08-07 11:16:43,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 11 minutes, 58 seconds)
2025-08-07 11:18:38,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:18:38,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 271.98975 ± 133.611
2025-08-07 11:18:38,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [118.465195, 448.94867, 395.9621, 341.12137, 313.34363, 309.20462, 107.505264, 440.11996, 137.49066, 107.73578]
2025-08-07 11:18:38,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 88.0, 87.0, 63.0, 59.0, 57.0, 21.0, 95.0, 27.0, 21.0]
2025-08-07 11:18:38,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 9 minutes, 50 seconds)
2025-08-07 11:20:35,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:20:35,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 235.92789 ± 123.314
2025-08-07 11:20:35,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [90.25028, 107.001945, 353.95328, 411.28833, 133.38205, 306.66113, 326.3921, 173.20988, 84.01101, 373.129]
2025-08-07 11:20:35,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 21.0, 66.0, 79.0, 26.0, 57.0, 75.0, 34.0, 17.0, 67.0]
2025-08-07 11:20:35,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 8 minutes, 3 seconds)
2025-08-07 11:22:32,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:22:33,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 237.20004 ± 114.610
2025-08-07 11:22:33,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [95.01754, 393.47498, 131.00298, 124.69991, 129.46655, 377.14496, 338.1625, 287.30646, 151.26767, 344.45667]
2025-08-07 11:22:33,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 71.0, 26.0, 24.0, 25.0, 70.0, 64.0, 53.0, 29.0, 62.0]
2025-08-07 11:22:33,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 6 minutes, 25 seconds)
2025-08-07 11:24:28,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:24:29,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 176.17340 ± 130.827
2025-08-07 11:24:29,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [113.04086, 101.55557, 90.1751, 317.08466, 149.83502, 520.1099, 116.70211, 105.835724, 149.89702, 97.49799]
2025-08-07 11:24:29,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 20.0, 18.0, 69.0, 29.0, 110.0, 23.0, 21.0, 29.0, 19.0]
2025-08-07 11:24:29,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 4 minutes, 17 seconds)
2025-08-07 11:26:25,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:26:26,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 184.71130 ± 131.674
2025-08-07 11:26:26,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [101.67514, 91.0062, 127.78952, 119.74842, 94.77975, 522.7962, 114.121185, 331.73422, 150.29643, 193.16595]
2025-08-07 11:26:26,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 18.0, 25.0, 23.0, 19.0, 108.0, 22.0, 62.0, 29.0, 37.0]
2025-08-07 11:26:26,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 2 minutes, 23 seconds)
2025-08-07 11:28:21,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:28:22,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 237.77475 ± 135.351
2025-08-07 11:28:22,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [146.76642, 129.36084, 417.71143, 468.40375, 385.75806, 96.86971, 156.63605, 298.4338, 193.80434, 84.00329]
2025-08-07 11:28:22,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 25.0, 79.0, 88.0, 73.0, 19.0, 30.0, 53.0, 38.0, 17.0]
2025-08-07 11:28:22,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 32 seconds)
2025-08-07 11:30:17,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:30:18,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 229.56100 ± 142.644
2025-08-07 11:30:18,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [353.50385, 507.3903, 89.306, 253.81992, 95.38121, 336.57846, 95.195946, 347.32214, 108.22979, 108.88266]
2025-08-07 11:30:18,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [66.0, 91.0, 18.0, 48.0, 19.0, 62.0, 19.0, 66.0, 21.0, 21.0]
2025-08-07 11:30:18,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 58 minutes, 26 seconds)
2025-08-07 11:32:14,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:32:15,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 237.94827 ± 162.398
2025-08-07 11:32:15,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [473.02295, 118.5863, 532.1051, 329.9072, 102.841225, 95.55182, 119.71608, 371.72842, 112.38406, 123.63962]
2025-08-07 11:32:15,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 23.0, 105.0, 60.0, 20.0, 19.0, 23.0, 70.0, 22.0, 24.0]
2025-08-07 11:32:15,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 56 minutes, 22 seconds)
2025-08-07 11:34:10,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:34:11,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 225.06868 ± 137.298
2025-08-07 11:34:11,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [117.22712, 281.61914, 397.52078, 124.43766, 107.40108, 366.4646, 139.39198, 485.5687, 95.81915, 135.23647]
2025-08-07 11:34:11,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 59.0, 74.0, 24.0, 21.0, 69.0, 27.0, 90.0, 19.0, 26.0]
2025-08-07 11:34:11,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 54 minutes, 22 seconds)
2025-08-07 11:36:06,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:36:07,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 279.41071 ± 119.528
2025-08-07 11:36:07,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [375.75845, 469.46185, 121.47767, 96.64855, 290.21936, 135.31293, 403.72748, 291.86472, 278.92313, 330.71252]
2025-08-07 11:36:07,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 89.0, 24.0, 19.0, 62.0, 26.0, 74.0, 57.0, 56.0, 60.0]
2025-08-07 11:36:07,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 52 minutes, 18 seconds)
2025-08-07 11:38:02,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:38:03,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 293.40652 ± 203.828
2025-08-07 11:38:03,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [469.78043, 655.9943, 88.859024, 426.81558, 94.21602, 310.53024, 105.050125, 536.76526, 139.03647, 107.01751]
2025-08-07 11:38:03,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 124.0, 18.0, 83.0, 19.0, 56.0, 21.0, 99.0, 27.0, 21.0]
2025-08-07 11:38:03,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 50 minutes, 27 seconds)
2025-08-07 11:40:00,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:40:00,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 264.06320 ± 135.713
2025-08-07 11:40:00,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [96.37555, 391.6709, 104.59565, 368.36063, 330.97943, 108.5541, 107.868614, 363.22702, 308.44727, 460.55295]
2025-08-07 11:40:00,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 70.0, 21.0, 68.0, 61.0, 21.0, 21.0, 67.0, 63.0, 86.0]
2025-08-07 11:40:00,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 48 minutes, 45 seconds)
2025-08-07 11:41:57,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:41:58,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 253.87720 ± 166.440
2025-08-07 11:41:58,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [127.123184, 596.47626, 298.23926, 101.605034, 455.54938, 317.14902, 115.397606, 99.921844, 331.20868, 96.10171]
2025-08-07 11:41:58,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 115.0, 56.0, 20.0, 84.0, 61.0, 23.0, 20.0, 65.0, 19.0]
2025-08-07 11:41:58,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 46 minutes, 52 seconds)
2025-08-07 11:43:53,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:43:54,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 258.52594 ± 182.404
2025-08-07 11:43:54,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [174.82756, 112.746414, 581.497, 95.072105, 549.44617, 148.58717, 100.4602, 404.45438, 322.08548, 96.08295]
2025-08-07 11:43:54,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 22.0, 111.0, 19.0, 116.0, 29.0, 20.0, 80.0, 61.0, 19.0]
2025-08-07 11:43:54,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 44 minutes, 57 seconds)
2025-08-07 11:45:49,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:45:51,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 363.92401 ± 183.596
2025-08-07 11:45:51,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [529.0743, 134.97684, 357.9433, 89.50862, 638.3907, 398.33496, 393.87332, 443.0281, 545.74817, 108.36225]
2025-08-07 11:45:51,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 26.0, 67.0, 18.0, 121.0, 73.0, 72.0, 83.0, 99.0, 21.0]
2025-08-07 11:45:51,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 43 minutes, 8 seconds)
2025-08-07 11:47:46,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:47:47,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 306.63608 ± 134.397
2025-08-07 11:47:47,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [113.07021, 107.85803, 135.65698, 397.1014, 402.78958, 422.91437, 489.26028, 383.7238, 352.6353, 261.35095]
2025-08-07 11:47:47,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 21.0, 26.0, 76.0, 75.0, 79.0, 92.0, 81.0, 65.0, 52.0]
2025-08-07 11:47:47,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 41 minutes, 17 seconds)
2025-08-07 11:49:44,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:49:45,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 346.58081 ± 166.577
2025-08-07 11:49:45,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [382.4763, 129.59795, 580.796, 472.458, 107.42689, 371.7724, 346.88776, 505.98093, 100.64625, 467.76587]
2025-08-07 11:49:45,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 25.0, 106.0, 93.0, 21.0, 69.0, 64.0, 95.0, 20.0, 98.0]
2025-08-07 11:49:45,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 39 minutes, 17 seconds)
2025-08-07 11:51:40,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:51:42,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 365.26550 ± 193.363
2025-08-07 11:51:42,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [123.18405, 384.29703, 395.9006, 757.13, 541.7543, 397.24173, 130.94388, 106.44107, 444.84015, 370.92242]
2025-08-07 11:51:42,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 73.0, 74.0, 149.0, 106.0, 83.0, 25.0, 21.0, 95.0, 69.0]
2025-08-07 11:51:42,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 37 minutes, 20 seconds)
2025-08-07 11:53:38,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:53:38,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 283.36816 ± 143.355
2025-08-07 11:53:38,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [325.41788, 107.067276, 473.90692, 370.7399, 89.42564, 352.0911, 459.81616, 386.54532, 143.86517, 124.80612]
2025-08-07 11:53:38,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [60.0, 21.0, 87.0, 69.0, 18.0, 65.0, 84.0, 71.0, 28.0, 24.0]
2025-08-07 11:53:38,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 35 minutes, 30 seconds)
2025-08-07 11:55:34,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:55:35,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 231.62910 ± 157.318
2025-08-07 11:55:35,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [95.66215, 95.03886, 89.848595, 117.27276, 130.25723, 458.60583, 136.43361, 288.50192, 399.17218, 505.4979]
2025-08-07 11:55:35,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 19.0, 18.0, 23.0, 25.0, 83.0, 26.0, 60.0, 76.0, 96.0]
2025-08-07 11:55:35,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 33 minutes, 27 seconds)
2025-08-07 11:57:30,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:57:31,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 323.94370 ± 121.495
2025-08-07 11:57:31,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [369.10474, 327.24023, 377.88885, 285.33966, 448.02075, 371.83365, 110.55442, 383.33917, 471.5599, 94.555885]
2025-08-07 11:57:31,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 60.0, 69.0, 53.0, 82.0, 69.0, 22.0, 78.0, 86.0, 19.0]
2025-08-07 11:57:31,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 31 minutes, 28 seconds)
2025-08-07 11:59:27,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:59:28,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 358.97247 ± 155.060
2025-08-07 11:59:28,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [422.66046, 402.44177, 620.47455, 368.33514, 108.50488, 524.64056, 338.46152, 393.55487, 321.5313, 89.119965]
2025-08-07 11:59:28,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 74.0, 132.0, 68.0, 21.0, 97.0, 62.0, 72.0, 60.0, 18.0]
2025-08-07 11:59:28,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 29 minutes, 25 seconds)
2025-08-07 12:01:24,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:01:25,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 374.91196 ± 101.107
2025-08-07 12:01:25,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [407.92926, 123.996956, 499.0393, 314.75784, 405.0657, 341.01743, 378.96808, 499.3039, 365.2525, 413.78857]
2025-08-07 12:01:25,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 24.0, 92.0, 62.0, 78.0, 62.0, 69.0, 108.0, 66.0, 78.0]
2025-08-07 12:01:25,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 27 minutes, 29 seconds)
2025-08-07 12:03:21,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:03:21,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 263.30249 ± 137.332
2025-08-07 12:03:21,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [332.49225, 114.920044, 119.78874, 151.90674, 405.53436, 403.23553, 408.64594, 158.18832, 100.72732, 437.58572]
2025-08-07 12:03:21,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 22.0, 23.0, 29.0, 74.0, 74.0, 77.0, 30.0, 20.0, 81.0]
2025-08-07 12:03:21,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 25 minutes, 30 seconds)
2025-08-07 12:05:17,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:05:18,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 422.13940 ± 170.254
2025-08-07 12:05:18,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [89.50429, 323.36728, 638.54065, 310.63776, 327.66287, 331.14203, 616.58704, 483.377, 646.6956, 453.87955]
2025-08-07 12:05:18,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 61.0, 118.0, 61.0, 61.0, 70.0, 113.0, 88.0, 123.0, 87.0]
2025-08-07 12:05:18,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1226 [INFO]: New best (422.14) for latency MM1Queue_a033_s075
2025-08-07 12:05:18,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 23 minutes, 39 seconds)
2025-08-07 12:07:15,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:07:16,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 270.86215 ± 140.603
2025-08-07 12:07:16,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [146.978, 112.508804, 466.65903, 89.139786, 364.60834, 90.97243, 352.25443, 324.7264, 455.36172, 305.41263]
2025-08-07 12:07:16,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 22.0, 84.0, 18.0, 64.0, 18.0, 65.0, 61.0, 88.0, 62.0]
2025-08-07 12:07:16,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 21 minutes, 51 seconds)
2025-08-07 12:09:12,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:09:13,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 294.32407 ± 136.832
2025-08-07 12:09:13,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [275.64093, 410.28796, 415.0575, 334.50476, 88.91809, 145.38576, 101.25242, 514.7118, 280.5195, 376.96194]
2025-08-07 12:09:13,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [53.0, 85.0, 77.0, 65.0, 18.0, 28.0, 20.0, 96.0, 51.0, 68.0]
2025-08-07 12:09:13,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 20 minutes)
2025-08-07 12:11:10,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:11:11,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 343.48819 ± 165.380
2025-08-07 12:11:11,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [535.9175, 480.65646, 457.34198, 88.67981, 94.61883, 354.45303, 504.12683, 129.79918, 417.14548, 372.14255]
2025-08-07 12:11:11,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 90.0, 91.0, 18.0, 19.0, 75.0, 92.0, 25.0, 78.0, 67.0]
2025-08-07 12:11:11,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 18 minutes, 6 seconds)
2025-08-07 12:13:06,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:13:07,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 286.19031 ± 155.276
2025-08-07 12:13:07,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [102.91776, 540.9493, 357.19482, 408.3831, 118.62385, 126.42307, 101.49555, 326.7628, 312.43, 466.72287]
2025-08-07 12:13:07,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 99.0, 69.0, 72.0, 23.0, 25.0, 20.0, 72.0, 60.0, 99.0]
2025-08-07 12:13:07,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 16 minutes, 5 seconds)
2025-08-07 12:15:03,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:15:04,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 260.78674 ± 129.496
2025-08-07 12:15:04,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [101.66804, 330.97794, 89.213486, 352.14236, 346.0022, 383.3223, 118.80155, 120.05642, 321.08466, 444.5983]
2025-08-07 12:15:04,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 69.0, 18.0, 63.0, 63.0, 78.0, 23.0, 23.0, 59.0, 85.0]
2025-08-07 12:15:04,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 14 minutes, 9 seconds)
2025-08-07 12:17:01,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:17:02,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 402.42081 ± 185.037
2025-08-07 12:17:02,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [435.8048, 489.61685, 436.1609, 287.05676, 723.4189, 620.409, 403.4088, 107.5831, 404.20996, 116.53911]
2025-08-07 12:17:02,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 91.0, 80.0, 53.0, 140.0, 118.0, 73.0, 21.0, 76.0, 23.0]
2025-08-07 12:17:02,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 12 minutes, 18 seconds)
2025-08-07 12:18:58,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:18:59,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 397.18210 ± 185.361
2025-08-07 12:18:59,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [629.6096, 351.30573, 107.42922, 414.94156, 731.07385, 357.61215, 428.3195, 102.12197, 407.48193, 441.92545]
2025-08-07 12:18:59,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [119.0, 73.0, 21.0, 93.0, 156.0, 70.0, 92.0, 20.0, 77.0, 88.0]
2025-08-07 12:18:59,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 10 minutes, 20 seconds)
2025-08-07 12:20:56,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:20:57,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 249.55655 ± 160.046
2025-08-07 12:20:57,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [320.44772, 95.49906, 101.78686, 347.09708, 95.078636, 582.8271, 372.3592, 108.20422, 349.45444, 122.81136]
2025-08-07 12:20:57,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [60.0, 19.0, 20.0, 64.0, 19.0, 116.0, 69.0, 21.0, 64.0, 24.0]
2025-08-07 12:20:57,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 8 minutes, 23 seconds)
2025-08-07 12:22:53,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:22:54,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 392.66083 ± 231.461
2025-08-07 12:22:54,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [398.64264, 391.59183, 850.2547, 95.85585, 95.43876, 364.12885, 551.3505, 108.064835, 567.2709, 504.0097]
2025-08-07 12:22:54,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 70.0, 156.0, 19.0, 19.0, 64.0, 100.0, 21.0, 107.0, 92.0]
2025-08-07 12:22:54,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 6 minutes, 36 seconds)
2025-08-07 12:24:51,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:24:52,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 178.81348 ± 131.626
2025-08-07 12:24:52,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [130.18973, 101.4681, 84.16379, 137.56653, 168.44814, 519.06024, 328.37598, 106.956604, 95.37865, 116.52702]
2025-08-07 12:24:52,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 20.0, 17.0, 27.0, 32.0, 94.0, 61.0, 21.0, 19.0, 23.0]
2025-08-07 12:24:52,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 4 minutes, 39 seconds)
2025-08-07 12:26:48,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:26:49,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 199.29797 ± 141.561
2025-08-07 12:26:49,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [473.15933, 122.25018, 297.44846, 123.631226, 107.187614, 95.45139, 108.25792, 445.32492, 102.56539, 117.703285]
2025-08-07 12:26:49,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 24.0, 56.0, 24.0, 21.0, 19.0, 21.0, 82.0, 20.0, 23.0]
2025-08-07 12:26:49,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 2 minutes, 32 seconds)
2025-08-07 12:28:44,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:28:45,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 348.99829 ± 205.017
2025-08-07 12:28:45,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [101.43175, 493.04605, 484.8655, 619.0967, 500.8426, 424.21893, 96.61246, 90.99934, 130.02295, 548.84656]
2025-08-07 12:28:45,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 93.0, 89.0, 127.0, 93.0, 80.0, 19.0, 18.0, 25.0, 102.0]
2025-08-07 12:28:45,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 30 seconds)
2025-08-07 12:30:41,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:30:42,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 293.62842 ± 249.941
2025-08-07 12:30:42,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [442.0318, 113.210686, 119.66805, 124.508156, 100.89604, 334.70786, 108.45484, 561.9083, 149.55183, 881.3466]
2025-08-07 12:30:42,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 22.0, 23.0, 24.0, 20.0, 71.0, 21.0, 107.0, 29.0, 166.0]
2025-08-07 12:30:42,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 58 minutes, 31 seconds)
2025-08-07 12:32:37,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:32:38,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 259.55893 ± 159.303
2025-08-07 12:32:38,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [113.9085, 310.13873, 324.4381, 421.90897, 177.70732, 112.84394, 95.70523, 380.5758, 574.15344, 84.20946]
2025-08-07 12:32:38,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 57.0, 61.0, 78.0, 34.0, 22.0, 19.0, 68.0, 107.0, 17.0]
2025-08-07 12:32:38,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 56 minutes, 23 seconds)
2025-08-07 12:34:31,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:34:32,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 222.56873 ± 149.782
2025-08-07 12:34:32,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [313.8152, 116.355034, 335.87802, 399.49915, 128.8131, 89.2376, 125.278656, 89.21432, 526.0872, 101.50894]
2025-08-07 12:34:32,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [57.0, 23.0, 64.0, 82.0, 25.0, 18.0, 24.0, 18.0, 116.0, 20.0]
2025-08-07 12:34:32,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 54 minutes, 8 seconds)
2025-08-07 12:36:24,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:36:25,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 256.83673 ± 171.684
2025-08-07 12:36:25,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [528.43414, 430.9086, 89.1152, 288.3898, 100.96583, 101.08378, 95.50344, 356.38516, 95.108315, 482.473]
2025-08-07 12:36:25,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 80.0, 18.0, 55.0, 20.0, 20.0, 19.0, 66.0, 19.0, 88.0]
2025-08-07 12:36:25,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 51 minutes, 52 seconds)
2025-08-07 12:38:18,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:38:19,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 254.03598 ± 141.240
2025-08-07 12:38:19,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [129.49374, 569.1412, 134.26564, 302.684, 144.65582, 402.98553, 269.84158, 139.51581, 320.43384, 127.342834]
2025-08-07 12:38:19,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 103.0, 26.0, 57.0, 28.0, 79.0, 53.0, 27.0, 60.0, 25.0]
2025-08-07 12:38:19,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 49 minutes, 45 seconds)
2025-08-07 12:40:12,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:40:13,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 256.99707 ± 155.790
2025-08-07 12:40:13,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [460.3034, 90.76923, 463.9431, 102.53444, 393.68765, 322.41608, 119.68725, 113.74969, 102.01633, 400.86353]
2025-08-07 12:40:13,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 18.0, 90.0, 20.0, 72.0, 61.0, 23.0, 22.0, 20.0, 71.0]
2025-08-07 12:40:13,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 47 minutes, 33 seconds)
2025-08-07 12:42:07,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:42:08,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 260.36807 ± 143.655
2025-08-07 12:42:08,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [97.01416, 384.66855, 118.92806, 102.69861, 417.35403, 378.34027, 317.8819, 156.29181, 141.12141, 489.3817]
2025-08-07 12:42:08,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 71.0, 23.0, 20.0, 78.0, 70.0, 58.0, 30.0, 27.0, 92.0]
2025-08-07 12:42:08,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 45 minutes, 35 seconds)
2025-08-07 12:44:02,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:44:03,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 422.09326 ± 233.308
2025-08-07 12:44:03,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [331.14658, 715.697, 95.08288, 106.37724, 728.8459, 246.26555, 428.09503, 606.42065, 677.9105, 285.09128]
2025-08-07 12:44:03,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [59.0, 149.0, 19.0, 21.0, 135.0, 46.0, 87.0, 110.0, 146.0, 55.0]
2025-08-07 12:44:03,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 43 minutes, 47 seconds)
2025-08-07 12:45:58,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:45:59,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 389.19818 ± 172.362
2025-08-07 12:45:59,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [335.9197, 668.068, 320.02512, 503.94052, 326.73373, 401.45523, 107.16881, 481.27133, 143.50665, 603.8927]
2025-08-07 12:45:59,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [63.0, 123.0, 66.0, 92.0, 61.0, 88.0, 21.0, 86.0, 28.0, 126.0]
2025-08-07 12:45:59,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 42 minutes, 5 seconds)
2025-08-07 12:47:50,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:47:52,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 408.76569 ± 154.163
2025-08-07 12:47:52,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [473.7666, 579.06506, 559.27167, 491.4106, 312.38434, 107.58128, 156.53294, 454.3239, 480.9093, 472.41104]
2025-08-07 12:47:52,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 122.0, 121.0, 97.0, 61.0, 21.0, 30.0, 85.0, 104.0, 84.0]
2025-08-07 12:47:52,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 40 minutes, 4 seconds)
2025-08-07 12:49:43,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:49:44,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 303.88675 ± 175.203
2025-08-07 12:49:44,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [118.395996, 100.89882, 386.7873, 682.0347, 343.8458, 125.954094, 310.13086, 164.06535, 469.85553, 336.8992]
2025-08-07 12:49:44,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 20.0, 69.0, 124.0, 63.0, 25.0, 59.0, 31.0, 100.0, 62.0]
2025-08-07 12:49:44,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 38 minutes, 3 seconds)
2025-08-07 12:51:35,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:51:36,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 286.75461 ± 179.052
2025-08-07 12:51:36,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [433.79553, 490.3944, 90.2571, 121.824104, 535.7291, 379.74582, 95.45795, 113.29969, 139.48969, 467.55243]
2025-08-07 12:51:36,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 91.0, 18.0, 24.0, 114.0, 82.0, 19.0, 22.0, 27.0, 81.0]
2025-08-07 12:51:36,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 35 minutes, 58 seconds)
2025-08-07 12:53:26,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:53:27,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 201.66180 ± 136.072
2025-08-07 12:53:27,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [416.4308, 106.7122, 96.64138, 108.3344, 472.81204, 128.88672, 152.03023, 101.8173, 312.87442, 120.07866]
2025-08-07 12:53:27,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 21.0, 19.0, 21.0, 89.0, 25.0, 29.0, 20.0, 59.0, 23.0]
2025-08-07 12:53:27,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 33 minutes, 49 seconds)
2025-08-07 12:55:18,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:55:18,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 292.79358 ± 197.976
2025-08-07 12:55:18,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [524.52673, 135.77089, 511.87036, 618.3309, 101.046814, 108.39949, 102.315796, 96.70035, 408.28525, 320.68936]
2025-08-07 12:55:18,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 26.0, 97.0, 107.0, 20.0, 21.0, 20.0, 19.0, 72.0, 68.0]
2025-08-07 12:55:18,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 31 minutes, 42 seconds)
2025-08-07 12:57:10,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:57:11,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 373.36465 ± 249.820
2025-08-07 12:57:11,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [96.26485, 96.230995, 90.31322, 806.91644, 490.7631, 366.04977, 580.5413, 591.19666, 96.0362, 519.334]
2025-08-07 12:57:11,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 19.0, 18.0, 148.0, 94.0, 66.0, 106.0, 110.0, 19.0, 96.0]
2025-08-07 12:57:11,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 29 minutes, 50 seconds)
2025-08-07 12:59:02,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:59:03,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 277.67581 ± 256.447
2025-08-07 12:59:03,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [917.2509, 352.2198, 325.01636, 84.378494, 95.24538, 108.21011, 125.726036, 133.6079, 100.286606, 534.81665]
2025-08-07 12:59:03,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [171.0, 65.0, 62.0, 17.0, 19.0, 21.0, 24.0, 26.0, 20.0, 98.0]
2025-08-07 12:59:03,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 27 minutes, 56 seconds)
2025-08-07 13:00:55,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:00:56,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 420.63696 ± 411.003
2025-08-07 13:00:56,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [523.8887, 135.58957, 89.38817, 135.09972, 1483.08, 518.2087, 100.93373, 634.5464, 501.52484, 84.10985]
2025-08-07 13:00:56,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 26.0, 18.0, 26.0, 294.0, 94.0, 20.0, 114.0, 108.0, 17.0]
2025-08-07 13:00:56,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 26 minutes, 9 seconds)
2025-08-07 13:02:59,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:03:00,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 295.46832 ± 219.446
2025-08-07 13:03:00,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [96.266335, 119.09823, 774.8779, 101.09013, 326.7613, 396.93658, 496.13736, 95.787704, 433.25812, 114.46952]
2025-08-07 13:03:00,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 23.0, 157.0, 20.0, 65.0, 87.0, 89.0, 19.0, 91.0, 22.0]
2025-08-07 13:03:00,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 24 minutes, 49 seconds)
2025-08-07 13:05:04,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:05:05,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 390.16449 ± 149.326
2025-08-07 13:05:05,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [451.84967, 486.8131, 654.68024, 517.8946, 308.68985, 371.66458, 107.77703, 209.9808, 349.82703, 442.46808]
2025-08-07 13:05:05,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 106.0, 126.0, 96.0, 57.0, 67.0, 21.0, 40.0, 71.0, 82.0]
2025-08-07 13:05:05,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 23 minutes, 29 seconds)
2025-08-07 13:07:11,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:07:12,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 333.16962 ± 153.003
2025-08-07 13:07:12,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [470.45786, 388.36484, 445.81622, 146.21614, 415.30405, 555.7212, 359.6136, 347.16074, 95.53919, 107.50236]
2025-08-07 13:07:12,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 68.0, 79.0, 28.0, 75.0, 126.0, 73.0, 74.0, 19.0, 21.0]
2025-08-07 13:07:12,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 22 minutes, 2 seconds)
2025-08-07 13:09:16,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:09:17,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 350.17267 ± 174.068
2025-08-07 13:09:17,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [358.00613, 555.02344, 629.8077, 354.24478, 405.2838, 101.162636, 107.92559, 465.8675, 385.02582, 139.37915]
2025-08-07 13:09:17,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 103.0, 113.0, 65.0, 73.0, 20.0, 21.0, 84.0, 79.0, 27.0]
2025-08-07 13:09:17,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 20 minutes, 28 seconds)
2025-08-07 13:11:23,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:11:24,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 270.74631 ± 205.353
2025-08-07 13:11:24,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [661.44635, 394.67728, 89.20802, 114.69731, 568.79694, 146.0274, 102.1678, 396.57104, 131.58394, 102.28683]
2025-08-07 13:11:24,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [124.0, 74.0, 18.0, 22.0, 103.0, 28.0, 20.0, 74.0, 25.0, 20.0]
2025-08-07 13:11:24,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 18 minutes, 49 seconds)
2025-08-07 13:13:29,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:13:29,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 231.86006 ± 162.804
2025-08-07 13:13:29,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [102.39733, 129.8746, 584.02185, 94.85107, 107.19097, 326.28976, 107.84862, 330.80475, 121.186905, 414.13477]
2025-08-07 13:13:29,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 25.0, 108.0, 19.0, 21.0, 62.0, 21.0, 63.0, 24.0, 75.0]
2025-08-07 13:13:29,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 16 minutes, 47 seconds)
2025-08-07 13:15:34,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:15:36,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 423.76849 ± 227.058
2025-08-07 13:15:36,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [95.809845, 311.71738, 673.2013, 441.36697, 339.4847, 83.916534, 814.3847, 526.6026, 620.13794, 331.06323]
2025-08-07 13:15:36,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 57.0, 145.0, 84.0, 60.0, 17.0, 163.0, 101.0, 113.0, 64.0]
2025-08-07 13:15:36,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1226 [INFO]: New best (423.77) for latency MM1Queue_a033_s075
2025-08-07 13:15:36,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 14 minutes, 42 seconds)
2025-08-07 13:17:36,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:17:37,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 380.84454 ± 198.028
2025-08-07 13:17:37,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [368.84595, 105.86675, 552.0964, 107.90121, 537.6719, 666.0226, 422.58066, 543.3687, 101.52479, 402.56653]
2025-08-07 13:17:37,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [66.0, 21.0, 101.0, 21.0, 114.0, 121.0, 77.0, 96.0, 20.0, 76.0]
2025-08-07 13:17:37,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 12 minutes, 29 seconds)
2025-08-07 13:19:28,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:19:29,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 221.56900 ± 161.627
2025-08-07 13:19:29,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [159.64455, 128.06882, 107.54128, 124.00127, 483.45428, 89.33971, 403.79333, 505.21466, 119.53602, 95.09588]
2025-08-07 13:19:29,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 25.0, 21.0, 24.0, 100.0, 18.0, 73.0, 93.0, 23.0, 19.0]
2025-08-07 13:19:29,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 10 minutes, 11 seconds)
2025-08-07 13:21:19,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:21:20,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 381.38986 ± 270.455
2025-08-07 13:21:20,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [677.3266, 108.70709, 934.6185, 361.57916, 470.1697, 95.8252, 479.26184, 118.49214, 478.8427, 89.07545]
2025-08-07 13:21:20,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 21.0, 168.0, 64.0, 85.0, 19.0, 93.0, 23.0, 89.0, 18.0]
2025-08-07 13:21:20,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 57 seconds)
2025-08-07 13:23:11,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:23:12,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 358.32874 ± 185.115
2025-08-07 13:23:12,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [112.56403, 567.38495, 335.18863, 114.255486, 380.96353, 88.81859, 615.34344, 475.6088, 376.12418, 517.0356]
2025-08-07 13:23:12,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 103.0, 60.0, 22.0, 73.0, 18.0, 117.0, 87.0, 66.0, 93.0]
2025-08-07 13:23:12,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 49 seconds)
2025-08-07 13:25:04,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:25:05,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 386.29803 ± 197.766
2025-08-07 13:25:05,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [125.13062, 165.01169, 411.9966, 736.81415, 487.8144, 107.39842, 477.6877, 284.1342, 543.5525, 523.4402]
2025-08-07 13:25:05,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 31.0, 76.0, 153.0, 105.0, 21.0, 90.0, 51.0, 97.0, 95.0]
2025-08-07 13:25:05,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 47 seconds)
2025-08-07 13:26:55,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:26:56,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 385.83749 ± 291.258
2025-08-07 13:26:56,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [111.91304, 116.64378, 439.37317, 544.4191, 739.3043, 829.2765, 739.6604, 147.28445, 84.28045, 106.21982]
2025-08-07 13:26:56,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 23.0, 82.0, 103.0, 138.0, 156.0, 136.0, 29.0, 17.0, 21.0]
2025-08-07 13:26:56,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 51 seconds)
2025-08-07 13:28:47,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:28:48,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 415.12604 ± 227.556
2025-08-07 13:28:48,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [490.1657, 479.91968, 143.27654, 89.6063, 809.4268, 531.6112, 593.2498, 90.13326, 368.7918, 555.079]
2025-08-07 13:28:48,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 91.0, 28.0, 18.0, 152.0, 100.0, 112.0, 18.0, 68.0, 101.0]
2025-08-07 13:28:48,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1251 [DEBUG]: Training session finished
