2025-08-07 10:11:04,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc5-humanoid/MM1Queue_a033_s075-bpql-mem16
2025-08-07 10:11:04,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc5-humanoid/MM1Queue_a033_s075-bpql-mem16
2025-08-07 10:11:04,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x1467c82c3410>}
2025-08-07 10:11:04,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1111 [DEBUG]: using device: cuda
2025-08-07 10:11:04,183 baseline-bpql-noiseperc5-humanoid:77 [WARNING]: args.assumed_delay != args.horizon: 16 != 24
2025-08-07 10:11:04,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1133 [INFO]: Creating new trainer
2025-08-07 10:11:04,201 baseline-bpql-noiseperc5-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=648, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-08-07 10:11:04,201 baseline-bpql-noiseperc5-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 10:11:05,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1194 [DEBUG]: Starting training session...
2025-08-07 10:11:05,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 1/100
2025-08-07 10:12:54,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:12:55,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 351.69925 ± 43.223
2025-08-07 10:12:55,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [402.89734, 355.94098, 429.72934, 326.9219, 337.2375, 293.08255, 299.1816, 343.5589, 328.46136, 399.98096]
2025-08-07 10:12:55,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [75.0, 66.0, 81.0, 62.0, 63.0, 55.0, 57.0, 63.0, 61.0, 75.0]
2025-08-07 10:12:55,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (351.70) for latency MM1Queue_a033_s075
2025-08-07 10:12:55,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 28 seconds)
2025-08-07 10:14:51,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:14:52,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 419.07300 ± 89.368
2025-08-07 10:14:52,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [376.56146, 461.87357, 487.5136, 429.00812, 405.364, 327.91983, 638.6814, 320.6531, 387.73135, 355.42346]
2025-08-07 10:14:52,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [69.0, 86.0, 93.0, 88.0, 78.0, 62.0, 121.0, 62.0, 76.0, 67.0]
2025-08-07 10:14:52,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (419.07) for latency MM1Queue_a033_s075
2025-08-07 10:14:52,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 5 minutes, 25 seconds)
2025-08-07 10:16:50,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:16:51,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 382.88095 ± 117.226
2025-08-07 10:16:51,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [578.94354, 358.634, 330.81784, 321.34656, 361.3895, 337.72314, 350.233, 159.24731, 555.7201, 474.7543]
2025-08-07 10:16:51,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [122.0, 69.0, 67.0, 62.0, 69.0, 65.0, 67.0, 31.0, 119.0, 93.0]
2025-08-07 10:16:51,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 6 minutes, 24 seconds)
2025-08-07 10:18:48,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:18:49,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 346.76001 ± 116.884
2025-08-07 10:18:49,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [349.62494, 418.88474, 521.983, 199.93054, 344.47006, 207.30692, 209.59148, 406.5155, 280.94, 528.353]
2025-08-07 10:18:49,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 87.0, 100.0, 40.0, 70.0, 41.0, 42.0, 90.0, 55.0, 100.0]
2025-08-07 10:18:49,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 5 minutes, 23 seconds)
2025-08-07 10:20:47,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:20:48,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 385.27383 ± 42.198
2025-08-07 10:20:48,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [345.56427, 399.36618, 440.9121, 429.32745, 317.91138, 323.8459, 388.88333, 424.91635, 368.27203, 413.73944]
2025-08-07 10:20:48,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 79.0, 86.0, 88.0, 65.0, 70.0, 82.0, 88.0, 78.0, 83.0]
2025-08-07 10:20:48,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 4 minutes, 28 seconds)
2025-08-07 10:22:46,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:22:48,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 442.69208 ± 120.398
2025-08-07 10:22:48,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [345.31317, 295.91418, 334.6812, 430.67023, 478.72702, 418.01746, 411.5825, 481.15115, 477.43152, 753.43225]
2025-08-07 10:22:48,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 58.0, 63.0, 81.0, 89.0, 87.0, 77.0, 98.0, 100.0, 159.0]
2025-08-07 10:22:48,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (442.69) for latency MM1Queue_a033_s075
2025-08-07 10:22:48,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 5 minutes, 44 seconds)
2025-08-07 10:24:45,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:24:46,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 411.80582 ± 88.499
2025-08-07 10:24:46,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [415.63892, 395.64233, 415.3562, 429.12067, 473.72345, 457.3514, 536.23615, 392.87677, 175.65239, 426.45975]
2025-08-07 10:24:46,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 75.0, 81.0, 79.0, 93.0, 100.0, 107.0, 86.0, 34.0, 89.0]
2025-08-07 10:24:46,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 4 minutes, 2 seconds)
2025-08-07 10:26:43,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:26:45,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 455.87802 ± 179.238
2025-08-07 10:26:45,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [777.0672, 574.3368, 470.05606, 241.5609, 628.25824, 389.05606, 401.97058, 135.38573, 564.6298, 376.45908]
2025-08-07 10:26:45,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [147.0, 108.0, 87.0, 50.0, 133.0, 75.0, 85.0, 26.0, 119.0, 69.0]
2025-08-07 10:26:45,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (455.88) for latency MM1Queue_a033_s075
2025-08-07 10:26:45,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 1 minute, 59 seconds)
2025-08-07 10:28:43,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:28:44,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 396.32257 ± 85.923
2025-08-07 10:28:44,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [390.2735, 437.15158, 510.33273, 416.8912, 428.83496, 169.9314, 372.04962, 374.3229, 469.22708, 394.21066]
2025-08-07 10:28:44,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 82.0, 100.0, 78.0, 80.0, 33.0, 73.0, 69.0, 88.0, 73.0]
2025-08-07 10:28:44,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 25 seconds)
2025-08-07 10:30:41,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:30:43,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 489.73355 ± 141.908
2025-08-07 10:30:43,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [468.0648, 434.83908, 446.99734, 592.7331, 361.1325, 433.3403, 348.55112, 346.42136, 672.3082, 792.9477]
2025-08-07 10:30:43,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 87.0, 94.0, 115.0, 76.0, 84.0, 66.0, 70.0, 143.0, 154.0]
2025-08-07 10:30:43,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (489.73) for latency MM1Queue_a033_s075
2025-08-07 10:30:43,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 58 minutes, 29 seconds)
2025-08-07 10:32:41,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:32:43,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 545.09705 ± 107.588
2025-08-07 10:32:43,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [488.11224, 503.78256, 634.0674, 516.444, 722.0002, 409.20383, 539.39874, 410.26657, 502.8651, 724.8302]
2025-08-07 10:32:43,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 95.0, 120.0, 101.0, 141.0, 77.0, 104.0, 78.0, 98.0, 146.0]
2025-08-07 10:32:43,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (545.10) for latency MM1Queue_a033_s075
2025-08-07 10:32:43,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 56 minutes, 29 seconds)
2025-08-07 10:34:40,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:34:42,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 480.66486 ± 93.280
2025-08-07 10:34:42,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [521.9934, 535.5784, 627.334, 484.70212, 514.48694, 393.73663, 360.36322, 609.02637, 398.79453, 360.6326]
2025-08-07 10:34:42,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 102.0, 123.0, 90.0, 100.0, 85.0, 75.0, 126.0, 73.0, 67.0]
2025-08-07 10:34:42,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 54 minutes, 40 seconds)
2025-08-07 10:36:40,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:36:41,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 557.35754 ± 117.535
2025-08-07 10:36:41,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [657.0172, 609.9329, 435.44916, 748.091, 717.46844, 436.74942, 391.5356, 550.77527, 477.22092, 549.3359]
2025-08-07 10:36:41,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 121.0, 93.0, 150.0, 141.0, 80.0, 74.0, 106.0, 94.0, 102.0]
2025-08-07 10:36:41,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (557.36) for latency MM1Queue_a033_s075
2025-08-07 10:36:41,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 53 minutes, 2 seconds)
2025-08-07 10:38:39,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:38:41,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 534.24963 ± 103.171
2025-08-07 10:38:41,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [487.48825, 466.49405, 654.8295, 376.83792, 474.74423, 693.2018, 654.4698, 513.2246, 422.4926, 598.71387]
2025-08-07 10:38:41,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 97.0, 140.0, 75.0, 100.0, 132.0, 127.0, 95.0, 88.0, 114.0]
2025-08-07 10:38:41,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 51 minutes, 10 seconds)
2025-08-07 10:40:39,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:40:41,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 593.73267 ± 104.135
2025-08-07 10:40:41,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [461.31833, 728.7351, 621.8062, 448.65378, 714.7421, 615.12537, 576.7707, 569.9537, 468.1804, 732.041]
2025-08-07 10:40:41,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 138.0, 119.0, 95.0, 138.0, 116.0, 126.0, 122.0, 94.0, 148.0]
2025-08-07 10:40:41,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (593.73) for latency MM1Queue_a033_s075
2025-08-07 10:40:41,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 49 minutes, 27 seconds)
2025-08-07 10:42:39,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:42:40,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 495.34067 ± 117.589
2025-08-07 10:42:40,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [581.90674, 427.42197, 591.06, 552.5317, 754.5291, 466.4719, 413.33627, 409.97302, 332.2024, 423.97397]
2025-08-07 10:42:40,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 90.0, 120.0, 105.0, 148.0, 85.0, 88.0, 85.0, 64.0, 89.0]
2025-08-07 10:42:40,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 47 minutes, 19 seconds)
2025-08-07 10:44:38,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:44:39,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 454.80347 ± 132.416
2025-08-07 10:44:39,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [396.8371, 139.5843, 638.8335, 349.59338, 472.00027, 521.32996, 422.79803, 496.4814, 556.38654, 554.19]
2025-08-07 10:44:39,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 27.0, 122.0, 68.0, 103.0, 112.0, 80.0, 93.0, 106.0, 105.0]
2025-08-07 10:44:39,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 45 minutes, 19 seconds)
2025-08-07 10:46:37,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:46:39,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 499.70166 ± 209.461
2025-08-07 10:46:39,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [758.9761, 457.96436, 556.1997, 817.8813, 704.2023, 155.8774, 191.2183, 441.86716, 417.32666, 495.50333]
2025-08-07 10:46:39,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [140.0, 86.0, 122.0, 155.0, 136.0, 30.0, 37.0, 82.0, 77.0, 91.0]
2025-08-07 10:46:39,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 43 minutes, 14 seconds)
2025-08-07 10:48:36,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:48:38,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 582.43488 ± 193.350
2025-08-07 10:48:38,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [431.06015, 505.95273, 856.3781, 821.5461, 474.1152, 527.0985, 340.83163, 469.2987, 475.75986, 922.30774]
2025-08-07 10:48:38,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 92.0, 160.0, 154.0, 101.0, 98.0, 73.0, 87.0, 86.0, 197.0]
2025-08-07 10:48:38,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 41 minutes, 15 seconds)
2025-08-07 10:50:37,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:50:39,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 523.73206 ± 171.545
2025-08-07 10:50:39,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [496.56723, 440.15265, 471.61392, 205.73454, 400.68243, 478.60114, 783.78436, 805.51654, 497.45505, 657.21216]
2025-08-07 10:50:39,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 83.0, 95.0, 42.0, 75.0, 88.0, 167.0, 156.0, 95.0, 136.0]
2025-08-07 10:50:39,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 39 minutes, 21 seconds)
2025-08-07 10:52:36,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:52:38,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 605.79071 ± 124.374
2025-08-07 10:52:38,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [597.6805, 709.3489, 806.8966, 686.5739, 542.06635, 773.4377, 523.15234, 512.4347, 416.9225, 489.3942]
2025-08-07 10:52:38,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 139.0, 155.0, 125.0, 118.0, 162.0, 99.0, 95.0, 77.0, 90.0]
2025-08-07 10:52:38,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (605.79) for latency MM1Queue_a033_s075
2025-08-07 10:52:38,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 37 minutes, 22 seconds)
2025-08-07 10:54:36,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:54:38,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 535.85309 ± 133.394
2025-08-07 10:54:38,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [491.4904, 637.6953, 540.08527, 440.1369, 519.0639, 556.68, 478.25586, 879.2409, 437.5185, 378.36398]
2025-08-07 10:54:38,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 135.0, 99.0, 84.0, 109.0, 102.0, 104.0, 169.0, 83.0, 73.0]
2025-08-07 10:54:38,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 35 minutes, 35 seconds)
2025-08-07 10:56:36,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:56:37,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 559.60059 ± 137.079
2025-08-07 10:56:37,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [490.26862, 531.2541, 716.21497, 412.52005, 493.2079, 519.817, 590.38385, 506.76083, 894.7214, 440.85754]
2025-08-07 10:56:37,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 98.0, 145.0, 78.0, 93.0, 100.0, 112.0, 100.0, 193.0, 83.0]
2025-08-07 10:56:37,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 33 minutes, 39 seconds)
2025-08-07 10:58:36,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:58:38,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 660.23419 ± 186.163
2025-08-07 10:58:38,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [688.41406, 803.7455, 421.64838, 478.72382, 987.28577, 676.9506, 441.78043, 536.56903, 652.073, 915.1516]
2025-08-07 10:58:38,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [128.0, 158.0, 91.0, 101.0, 198.0, 125.0, 96.0, 115.0, 138.0, 169.0]
2025-08-07 10:58:38,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (660.23) for latency MM1Queue_a033_s075
2025-08-07 10:58:38,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 31 minutes, 56 seconds)
2025-08-07 11:00:36,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:00:38,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 582.59454 ± 135.558
2025-08-07 11:00:38,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [404.63083, 649.4912, 840.7725, 460.87567, 457.3561, 555.93195, 495.89688, 729.7993, 519.72266, 711.468]
2025-08-07 11:00:38,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 143.0, 162.0, 86.0, 85.0, 103.0, 91.0, 156.0, 115.0, 149.0]
2025-08-07 11:00:38,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 29 minutes, 53 seconds)
2025-08-07 11:02:37,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:02:39,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 603.04767 ± 138.392
2025-08-07 11:02:39,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [487.25858, 629.021, 869.5416, 578.5425, 421.44, 638.6931, 682.90717, 778.838, 473.58408, 470.65018]
2025-08-07 11:02:39,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 118.0, 183.0, 120.0, 91.0, 124.0, 127.0, 153.0, 101.0, 87.0]
2025-08-07 11:02:39,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 28 minutes, 11 seconds)
2025-08-07 11:04:38,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:04:39,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 590.37354 ± 70.139
2025-08-07 11:04:39,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [555.0688, 617.9395, 655.9177, 620.3886, 622.48755, 563.04565, 560.44836, 422.2019, 590.8142, 695.4231]
2025-08-07 11:04:39,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 129.0, 134.0, 124.0, 116.0, 107.0, 103.0, 91.0, 110.0, 145.0]
2025-08-07 11:04:39,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 26 minutes, 25 seconds)
2025-08-07 11:06:38,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:06:40,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 619.46228 ± 160.881
2025-08-07 11:06:40,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [408.82788, 582.34106, 945.87683, 766.24713, 630.0772, 339.742, 637.653, 565.2537, 649.56946, 669.0346]
2025-08-07 11:06:40,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [75.0, 108.0, 184.0, 140.0, 133.0, 69.0, 122.0, 106.0, 125.0, 120.0]
2025-08-07 11:06:40,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 24 minutes, 37 seconds)
2025-08-07 11:08:37,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:08:39,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 563.35754 ± 222.338
2025-08-07 11:08:39,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [382.06888, 741.5408, 567.3398, 161.31543, 525.43994, 688.132, 1023.8245, 557.4042, 372.47763, 614.0326]
2025-08-07 11:08:39,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 143.0, 122.0, 31.0, 97.0, 127.0, 213.0, 104.0, 82.0, 114.0]
2025-08-07 11:08:39,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 22 minutes, 19 seconds)
2025-08-07 11:10:38,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:10:40,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 644.88660 ± 158.878
2025-08-07 11:10:40,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [600.0646, 687.25494, 830.5281, 615.5114, 1022.7483, 591.3219, 475.868, 472.4718, 574.2509, 578.8456]
2025-08-07 11:10:40,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 123.0, 173.0, 111.0, 192.0, 111.0, 94.0, 99.0, 123.0, 107.0]
2025-08-07 11:10:40,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 20 minutes, 23 seconds)
2025-08-07 11:12:39,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:12:41,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 711.10059 ± 196.044
2025-08-07 11:12:41,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [911.32996, 1030.9797, 670.49316, 490.11566, 653.25366, 979.7717, 484.88977, 760.48346, 466.06583, 663.6228]
2025-08-07 11:12:41,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [174.0, 207.0, 136.0, 89.0, 119.0, 197.0, 92.0, 142.0, 87.0, 127.0]
2025-08-07 11:12:41,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (711.10) for latency MM1Queue_a033_s075
2025-08-07 11:12:41,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 18 minutes, 39 seconds)
2025-08-07 11:14:41,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:14:43,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 629.73132 ± 214.658
2025-08-07 11:14:43,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1076.0779, 350.21637, 469.1327, 463.32843, 597.77435, 782.47595, 829.56775, 504.02008, 455.4312, 769.2883]
2025-08-07 11:14:43,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [208.0, 68.0, 101.0, 84.0, 114.0, 157.0, 161.0, 91.0, 95.0, 144.0]
2025-08-07 11:14:43,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 16 minutes, 43 seconds)
2025-08-07 11:16:41,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:16:43,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 607.44293 ± 119.246
2025-08-07 11:16:43,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [724.2473, 769.3558, 654.4519, 528.4017, 437.19626, 514.2505, 523.3461, 522.6109, 811.424, 589.14496]
2025-08-07 11:16:43,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [130.0, 161.0, 132.0, 113.0, 82.0, 111.0, 94.0, 95.0, 162.0, 125.0]
2025-08-07 11:16:43,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 14 minutes, 35 seconds)
2025-08-07 11:18:41,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:18:43,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 677.29529 ± 161.764
2025-08-07 11:18:43,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [400.98315, 634.81024, 689.28467, 759.20795, 839.1112, 953.0413, 567.5497, 818.7193, 634.33734, 475.90753]
2025-08-07 11:18:43,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 127.0, 132.0, 150.0, 155.0, 185.0, 112.0, 172.0, 124.0, 102.0]
2025-08-07 11:18:43,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 12 minutes, 47 seconds)
2025-08-07 11:20:41,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:20:42,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 500.53989 ± 114.138
2025-08-07 11:20:42,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [670.1941, 423.60852, 545.7025, 446.55927, 552.82056, 354.6049, 366.25937, 658.8337, 601.5736, 385.24197]
2025-08-07 11:20:42,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [135.0, 78.0, 110.0, 94.0, 100.0, 66.0, 68.0, 122.0, 117.0, 73.0]
2025-08-07 11:20:42,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 10 minutes, 33 seconds)
2025-08-07 11:22:42,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:22:44,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 629.75830 ± 135.570
2025-08-07 11:22:44,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [795.99426, 468.08252, 807.9282, 384.59174, 586.0221, 686.7199, 621.37256, 743.98755, 697.0669, 505.81714]
2025-08-07 11:22:44,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [170.0, 85.0, 165.0, 72.0, 108.0, 121.0, 113.0, 146.0, 144.0, 93.0]
2025-08-07 11:22:44,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 8 minutes, 32 seconds)
2025-08-07 11:24:41,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:24:44,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 741.60736 ± 188.770
2025-08-07 11:24:44,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [968.8147, 1004.9396, 926.2625, 618.2886, 901.9824, 629.5642, 572.8992, 717.5, 404.18097, 671.6417]
2025-08-07 11:24:44,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [206.0, 193.0, 195.0, 120.0, 169.0, 134.0, 121.0, 154.0, 85.0, 119.0]
2025-08-07 11:24:44,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (741.61) for latency MM1Queue_a033_s075
2025-08-07 11:24:44,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 6 minutes, 13 seconds)
2025-08-07 11:26:43,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:26:45,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 680.69751 ± 134.274
2025-08-07 11:26:45,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [748.4234, 664.2772, 723.7228, 893.53156, 551.4107, 680.5034, 866.0009, 457.30194, 517.1277, 704.6754]
2025-08-07 11:26:45,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [144.0, 125.0, 134.0, 174.0, 118.0, 126.0, 156.0, 82.0, 107.0, 130.0]
2025-08-07 11:26:45,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 4 minutes, 25 seconds)
2025-08-07 11:28:44,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:28:46,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 786.12195 ± 286.805
2025-08-07 11:28:46,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [785.4588, 802.32263, 426.9031, 1287.6665, 559.1365, 447.8637, 1305.7518, 726.2628, 739.7878, 780.0653]
2025-08-07 11:28:46,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [144.0, 158.0, 78.0, 259.0, 101.0, 84.0, 246.0, 134.0, 132.0, 139.0]
2025-08-07 11:28:46,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (786.12) for latency MM1Queue_a033_s075
2025-08-07 11:28:46,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 2 minutes, 36 seconds)
2025-08-07 11:30:44,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:30:46,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 661.65161 ± 186.673
2025-08-07 11:30:46,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [920.50616, 516.80084, 771.2087, 465.14194, 354.65707, 789.85974, 904.34784, 711.80206, 709.00476, 473.187]
2025-08-07 11:30:46,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [166.0, 95.0, 152.0, 98.0, 67.0, 150.0, 166.0, 151.0, 149.0, 92.0]
2025-08-07 11:30:46,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 46 seconds)
2025-08-07 11:32:46,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:32:48,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 562.05072 ± 90.739
2025-08-07 11:32:48,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [520.9232, 415.35425, 588.2265, 492.91528, 528.72614, 760.0198, 515.08514, 594.4127, 660.92346, 543.9209]
2025-08-07 11:32:48,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 78.0, 111.0, 105.0, 96.0, 138.0, 102.0, 120.0, 119.0, 104.0]
2025-08-07 11:32:48,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 58 minutes, 49 seconds)
2025-08-07 11:34:45,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:34:48,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 814.67993 ± 125.720
2025-08-07 11:34:48,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [814.12805, 760.86676, 828.8363, 747.02686, 734.3542, 836.138, 624.7265, 883.0789, 1133.6864, 783.9575]
2025-08-07 11:34:48,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [144.0, 158.0, 173.0, 141.0, 139.0, 157.0, 131.0, 161.0, 230.0, 146.0]
2025-08-07 11:34:48,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (814.68) for latency MM1Queue_a033_s075
2025-08-07 11:34:48,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 56 minutes, 44 seconds)
2025-08-07 11:36:47,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:36:50,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 748.81769 ± 222.396
2025-08-07 11:36:50,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [747.98596, 1010.4277, 604.8715, 578.52625, 1156.4344, 860.60693, 957.8149, 485.4494, 548.10205, 537.9576]
2025-08-07 11:36:50,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [142.0, 200.0, 116.0, 101.0, 226.0, 163.0, 190.0, 97.0, 105.0, 99.0]
2025-08-07 11:36:50,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 54 minutes, 55 seconds)
2025-08-07 11:38:47,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:38:49,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 721.76135 ± 147.000
2025-08-07 11:38:49,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [723.41565, 1018.22015, 592.4376, 744.761, 675.0196, 421.5427, 728.9265, 689.1021, 787.6765, 836.5118]
2025-08-07 11:38:49,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [139.0, 180.0, 111.0, 144.0, 128.0, 78.0, 137.0, 129.0, 157.0, 153.0]
2025-08-07 11:38:49,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 52 minutes, 34 seconds)
2025-08-07 11:40:48,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:40:50,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 725.11609 ± 235.672
2025-08-07 11:40:50,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [737.4842, 505.87106, 958.07135, 525.73413, 550.5752, 842.4052, 668.7208, 1160.8997, 363.65897, 937.7397]
2025-08-07 11:40:50,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [134.0, 108.0, 190.0, 112.0, 98.0, 174.0, 126.0, 210.0, 69.0, 200.0]
2025-08-07 11:40:50,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 50 minutes, 39 seconds)
2025-08-07 11:42:49,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:42:51,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 694.32947 ± 179.292
2025-08-07 11:42:51,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [428.178, 576.7067, 713.64856, 804.57336, 769.4896, 1076.7855, 615.7537, 673.3043, 468.74493, 816.1101]
2025-08-07 11:42:51,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 120.0, 148.0, 168.0, 151.0, 199.0, 129.0, 121.0, 84.0, 147.0]
2025-08-07 11:42:51,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 48 minutes, 29 seconds)
2025-08-07 11:44:50,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:44:53,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 804.84094 ± 298.984
2025-08-07 11:44:53,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1162.3054, 766.6176, 847.83954, 457.36395, 1358.1107, 1055.644, 387.89536, 497.64273, 743.81726, 771.1725]
2025-08-07 11:44:53,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [220.0, 156.0, 162.0, 86.0, 262.0, 203.0, 71.0, 95.0, 146.0, 140.0]
2025-08-07 11:44:53,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 46 minutes, 53 seconds)
2025-08-07 11:46:49,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:46:51,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 690.43835 ± 200.938
2025-08-07 11:46:51,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [700.1869, 460.12396, 583.2001, 519.89636, 658.0528, 997.51324, 582.2006, 794.39056, 1092.6724, 516.1471]
2025-08-07 11:46:51,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [138.0, 83.0, 119.0, 111.0, 127.0, 198.0, 106.0, 155.0, 203.0, 105.0]
2025-08-07 11:46:51,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 44 minutes, 12 seconds)
2025-08-07 11:48:46,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:48:49,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 799.41626 ± 323.989
2025-08-07 11:48:49,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1066.8347, 589.19135, 819.00415, 761.93243, 749.7184, 364.3609, 1522.4087, 742.03973, 997.72095, 380.95178]
2025-08-07 11:48:49,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [200.0, 115.0, 152.0, 142.0, 156.0, 67.0, 302.0, 136.0, 203.0, 69.0]
2025-08-07 11:48:49,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 41 minutes, 56 seconds)
2025-08-07 11:50:47,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:50:50,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 764.06854 ± 192.651
2025-08-07 11:50:50,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [798.8755, 757.3651, 776.6059, 676.55615, 544.4456, 623.2102, 836.37256, 884.94824, 516.73694, 1225.5696]
2025-08-07 11:50:50,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [145.0, 142.0, 142.0, 136.0, 102.0, 133.0, 168.0, 161.0, 106.0, 242.0]
2025-08-07 11:50:50,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 39 minutes, 55 seconds)
2025-08-07 11:52:49,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:52:51,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 637.21649 ± 188.400
2025-08-07 11:52:51,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [741.0462, 583.1895, 663.4151, 347.1999, 933.2295, 360.91153, 761.0822, 652.7126, 858.83887, 470.54]
2025-08-07 11:52:51,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [142.0, 111.0, 125.0, 66.0, 170.0, 70.0, 160.0, 134.0, 181.0, 84.0]
2025-08-07 11:52:51,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 37 minutes, 58 seconds)
2025-08-07 11:54:49,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:54:51,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 765.89417 ± 154.065
2025-08-07 11:54:51,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [995.5788, 897.22064, 683.29596, 605.09814, 637.8747, 753.486, 1000.8541, 521.93787, 819.94244, 743.6533]
2025-08-07 11:54:51,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [183.0, 169.0, 127.0, 112.0, 115.0, 139.0, 192.0, 94.0, 150.0, 159.0]
2025-08-07 11:54:51,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 35 minutes, 42 seconds)
2025-08-07 11:56:49,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:56:52,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1026.42273 ± 302.774
2025-08-07 11:56:52,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1267.4127, 993.3682, 590.82947, 938.92725, 1134.974, 1230.8265, 988.8859, 890.09674, 580.0861, 1648.8201]
2025-08-07 11:56:52,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [234.0, 203.0, 110.0, 193.0, 235.0, 229.0, 206.0, 180.0, 108.0, 334.0]
2025-08-07 11:56:52,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (1026.42) for latency MM1Queue_a033_s075
2025-08-07 11:56:52,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 34 minutes, 15 seconds)
2025-08-07 11:58:49,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:58:51,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 772.09729 ± 233.430
2025-08-07 11:58:51,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [511.4265, 1064.65, 696.7255, 523.3788, 668.8987, 868.71735, 846.8985, 544.31104, 1266.5922, 729.3748]
2025-08-07 11:58:51,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 194.0, 129.0, 93.0, 142.0, 176.0, 174.0, 101.0, 247.0, 130.0]
2025-08-07 11:58:51,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 32 minutes, 26 seconds)
2025-08-07 12:00:51,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:00:54,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 958.29962 ± 295.613
2025-08-07 12:00:54,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [204.78139, 1152.1102, 852.1025, 1054.7034, 1172.6282, 1063.9324, 918.49664, 1161.9115, 732.7089, 1269.621]
2025-08-07 12:00:54,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [39.0, 216.0, 158.0, 213.0, 219.0, 210.0, 172.0, 224.0, 138.0, 261.0]
2025-08-07 12:00:54,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 30 minutes, 40 seconds)
2025-08-07 12:02:50,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:02:53,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 862.50214 ± 251.951
2025-08-07 12:02:53,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [716.33575, 691.35443, 563.7829, 979.70056, 1327.3875, 914.0421, 1038.318, 444.19797, 1099.8838, 850.0186]
2025-08-07 12:02:53,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [135.0, 135.0, 124.0, 181.0, 265.0, 165.0, 188.0, 81.0, 223.0, 186.0]
2025-08-07 12:02:53,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 28 minutes, 16 seconds)
2025-08-07 12:04:49,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:04:52,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 964.79333 ± 169.636
2025-08-07 12:04:52,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [911.6328, 1223.7863, 1290.9994, 977.5866, 945.0429, 927.6623, 870.8255, 770.2083, 718.5628, 1011.6274]
2025-08-07 12:04:52,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [185.0, 238.0, 261.0, 201.0, 162.0, 168.0, 180.0, 150.0, 138.0, 199.0]
2025-08-07 12:04:52,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 26 minutes, 6 seconds)
2025-08-07 12:06:51,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:06:53,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 892.15491 ± 293.127
2025-08-07 12:06:53,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [986.1605, 414.61072, 658.2326, 882.6126, 414.1108, 917.9932, 1297.0525, 1079.6636, 1237.4276, 1033.6841]
2025-08-07 12:06:53,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [178.0, 77.0, 135.0, 171.0, 74.0, 172.0, 239.0, 215.0, 226.0, 180.0]
2025-08-07 12:06:53,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 24 minutes, 6 seconds)
2025-08-07 12:08:49,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:08:51,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 724.44653 ± 240.907
2025-08-07 12:08:51,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [652.6132, 952.7674, 720.37463, 518.87695, 1350.165, 668.9695, 678.13025, 487.26956, 623.63245, 591.6663]
2025-08-07 12:08:51,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [134.0, 181.0, 149.0, 98.0, 247.0, 139.0, 127.0, 88.0, 122.0, 121.0]
2025-08-07 12:08:51,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 21 minutes, 55 seconds)
2025-08-07 12:10:48,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:10:51,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 917.86670 ± 292.855
2025-08-07 12:10:51,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1140.9296, 726.78076, 1565.5476, 762.25806, 867.1271, 432.53717, 831.9717, 885.2966, 1173.6279, 792.5906]
2025-08-07 12:10:51,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [231.0, 147.0, 306.0, 137.0, 183.0, 78.0, 175.0, 181.0, 238.0, 160.0]
2025-08-07 12:10:51,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 19 minutes, 31 seconds)
2025-08-07 12:12:47,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:12:50,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 854.31750 ± 347.037
2025-08-07 12:12:50,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [955.20325, 880.2826, 772.56354, 600.40155, 1417.3533, 139.26436, 739.5258, 1353.7715, 967.1276, 717.6815]
2025-08-07 12:12:50,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [181.0, 182.0, 157.0, 119.0, 258.0, 27.0, 138.0, 250.0, 185.0, 151.0]
2025-08-07 12:12:50,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 17 minutes, 37 seconds)
2025-08-07 12:14:45,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:14:48,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 987.10803 ± 523.752
2025-08-07 12:14:48,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [415.65747, 1002.5213, 636.04663, 1871.1193, 1945.5189, 611.7045, 927.8789, 352.78275, 1190.6075, 917.2434]
2025-08-07 12:14:48,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 186.0, 117.0, 359.0, 360.0, 115.0, 177.0, 68.0, 247.0, 175.0]
2025-08-07 12:14:48,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 15 minutes, 34 seconds)
2025-08-07 12:16:45,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:16:49,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1153.14905 ± 605.151
2025-08-07 12:16:49,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [2117.6248, 999.2276, 323.62604, 958.3182, 1243.2476, 536.8323, 454.66766, 2157.861, 1406.9644, 1333.1206]
2025-08-07 12:16:49,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [402.0, 181.0, 63.0, 192.0, 252.0, 109.0, 93.0, 405.0, 256.0, 271.0]
2025-08-07 12:16:49,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (1153.15) for latency MM1Queue_a033_s075
2025-08-07 12:16:49,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 13 minutes, 25 seconds)
2025-08-07 12:18:43,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:18:47,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1130.29443 ± 151.222
2025-08-07 12:18:47,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1203.0953, 1125.975, 1173.0812, 1160.1091, 1063.7135, 1264.9066, 1244.3514, 711.74835, 1126.25, 1229.7135]
2025-08-07 12:18:47,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [236.0, 234.0, 214.0, 213.0, 207.0, 253.0, 256.0, 133.0, 228.0, 225.0]
2025-08-07 12:18:47,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 11 minutes, 29 seconds)
2025-08-07 12:20:43,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:20:46,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1057.19250 ± 342.400
2025-08-07 12:20:46,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [566.6313, 432.86337, 1459.0608, 1160.8601, 1416.5239, 1056.3995, 849.25073, 918.97595, 1360.4498, 1350.9095]
2025-08-07 12:20:46,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 81.0, 287.0, 211.0, 287.0, 202.0, 167.0, 170.0, 273.0, 250.0]
2025-08-07 12:20:46,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 9 minutes, 29 seconds)
2025-08-07 12:22:42,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:22:45,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1203.47095 ± 539.508
2025-08-07 12:22:45,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [634.3159, 1593.8795, 909.75214, 1279.5702, 1808.0417, 1435.2286, 2140.3516, 250.04036, 839.05804, 1144.4718]
2025-08-07 12:22:45,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [118.0, 328.0, 179.0, 238.0, 343.0, 280.0, 391.0, 49.0, 157.0, 209.0]
2025-08-07 12:22:45,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (1203.47) for latency MM1Queue_a033_s075
2025-08-07 12:22:45,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 7 minutes, 27 seconds)
2025-08-07 12:24:43,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:24:47,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1235.57983 ± 413.293
2025-08-07 12:24:47,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1116.0518, 1109.2728, 1034.9077, 1995.833, 1338.6617, 1125.5242, 1960.9045, 1091.8175, 1009.98596, 572.8392]
2025-08-07 12:24:47,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [216.0, 201.0, 195.0, 394.0, 243.0, 223.0, 368.0, 214.0, 181.0, 114.0]
2025-08-07 12:24:47,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (1235.58) for latency MM1Queue_a033_s075
2025-08-07 12:24:47,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 5 minutes, 50 seconds)
2025-08-07 12:26:44,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:26:48,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1198.66174 ± 513.407
2025-08-07 12:26:48,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [639.6057, 501.07892, 2035.9331, 1601.6677, 1065.6444, 1153.8398, 1122.9788, 775.9764, 1030.48, 2059.4116]
2025-08-07 12:26:48,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [120.0, 100.0, 401.0, 313.0, 205.0, 219.0, 202.0, 152.0, 188.0, 385.0]
2025-08-07 12:26:48,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 3 minutes, 53 seconds)
2025-08-07 12:28:41,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:28:44,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 972.84619 ± 341.476
2025-08-07 12:28:44,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1295.9825, 1537.9847, 705.57935, 908.57916, 1485.64, 931.4534, 906.4361, 882.9538, 605.7821, 468.07077]
2025-08-07 12:28:44,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [236.0, 289.0, 129.0, 179.0, 260.0, 161.0, 169.0, 156.0, 108.0, 83.0]
2025-08-07 12:28:44,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 1 minute, 42 seconds)
2025-08-07 12:30:40,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:30:44,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1335.89624 ± 395.988
2025-08-07 12:30:44,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1222.2983, 675.8724, 747.46747, 1191.2904, 1313.4961, 1887.218, 1399.8962, 1310.0444, 1835.1881, 1776.1903]
2025-08-07 12:30:44,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [219.0, 119.0, 156.0, 215.0, 271.0, 335.0, 269.0, 233.0, 333.0, 310.0]
2025-08-07 12:30:44,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (1335.90) for latency MM1Queue_a033_s075
2025-08-07 12:30:44,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 59 minutes, 46 seconds)
2025-08-07 12:32:40,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:32:43,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1128.32690 ± 549.902
2025-08-07 12:32:43,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [910.6369, 149.9633, 1570.9166, 2427.73, 1191.4067, 989.49023, 1143.4596, 819.5507, 1022.7899, 1057.3245]
2025-08-07 12:32:43,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [164.0, 29.0, 284.0, 444.0, 207.0, 189.0, 207.0, 146.0, 183.0, 192.0]
2025-08-07 12:32:43,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 57 minutes, 47 seconds)
2025-08-07 12:34:39,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:34:42,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1359.62671 ± 660.705
2025-08-07 12:34:42,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1412.5365, 875.8625, 2418.151, 926.98004, 853.4846, 814.143, 2822.8281, 1114.3142, 1160.4386, 1197.5288]
2025-08-07 12:34:42,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [251.0, 159.0, 434.0, 165.0, 168.0, 161.0, 513.0, 206.0, 222.0, 214.0]
2025-08-07 12:34:42,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (1359.63) for latency MM1Queue_a033_s075
2025-08-07 12:34:42,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 55 minutes, 34 seconds)
2025-08-07 12:36:38,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:36:43,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1795.34802 ± 1180.129
2025-08-07 12:36:43,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [764.6278, 2148.154, 3936.602, 2811.4595, 1120.7393, 875.80194, 579.44763, 654.8106, 3521.0442, 1540.7931]
2025-08-07 12:36:43,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [145.0, 397.0, 759.0, 527.0, 209.0, 171.0, 109.0, 134.0, 677.0, 304.0]
2025-08-07 12:36:43,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (1795.35) for latency MM1Queue_a033_s075
2025-08-07 12:36:43,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 53 minutes, 38 seconds)
2025-08-07 12:38:45,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:38:50,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1792.19885 ± 871.290
2025-08-07 12:38:50,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [995.0202, 2102.971, 1694.2706, 3341.5007, 1143.7576, 2404.4744, 521.9043, 1664.7126, 1048.3667, 3005.0107]
2025-08-07 12:38:50,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [190.0, 377.0, 312.0, 613.0, 227.0, 442.0, 97.0, 305.0, 198.0, 556.0]
2025-08-07 12:38:50,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 52 minutes, 32 seconds)
2025-08-07 12:41:02,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:41:06,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1379.95239 ± 676.141
2025-08-07 12:41:06,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1232.0438, 1027.9806, 2461.6323, 962.36426, 885.7359, 2782.5305, 891.9903, 1734.4513, 729.7259, 1091.0688]
2025-08-07 12:41:06,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [230.0, 179.0, 455.0, 197.0, 180.0, 535.0, 164.0, 320.0, 132.0, 199.0]
2025-08-07 12:41:06,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 51 minutes, 52 seconds)
2025-08-07 12:43:21,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:43:25,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1341.79382 ± 744.540
2025-08-07 12:43:25,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [2692.9395, 1743.3143, 732.4061, 565.99493, 1120.3384, 750.6221, 1915.0742, 525.71735, 2376.2634, 995.26764]
2025-08-07 12:43:25,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [519.0, 333.0, 143.0, 100.0, 227.0, 157.0, 354.0, 105.0, 450.0, 214.0]
2025-08-07 12:43:25,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 51 minutes, 24 seconds)
2025-08-07 12:45:36,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:45:40,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1634.54065 ± 663.418
2025-08-07 12:45:40,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1004.85236, 1743.3567, 1429.9077, 2117.8418, 3294.5012, 1262.0693, 1041.8032, 2002.1411, 1209.7332, 1239.2006]
2025-08-07 12:45:40,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [201.0, 308.0, 249.0, 372.0, 586.0, 218.0, 186.0, 360.0, 206.0, 221.0]
2025-08-07 12:45:40,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 50 minutes, 27 seconds)
2025-08-07 12:47:52,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:47:57,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1873.54590 ± 1316.908
2025-08-07 12:47:57,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1985.3429, 1203.3313, 1217.3527, 1216.9209, 5429.1636, 426.058, 2241.9463, 2431.371, 1573.1436, 1010.82916]
2025-08-07 12:47:57,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [359.0, 213.0, 231.0, 216.0, 1000.0, 76.0, 410.0, 434.0, 279.0, 184.0]
2025-08-07 12:47:57,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (1873.55) for latency MM1Queue_a033_s075
2025-08-07 12:47:57,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 49 minutes, 25 seconds)
2025-08-07 12:50:15,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:50:23,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2577.82007 ± 1740.850
2025-08-07 12:50:23,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [360.69357, 1056.9648, 5317.54, 1255.3623, 622.8258, 2765.3435, 5468.319, 2330.1443, 2798.5356, 3802.4734]
2025-08-07 12:50:23,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 215.0, 1000.0, 237.0, 138.0, 512.0, 1000.0, 409.0, 514.0, 713.0]
2025-08-07 12:50:23,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (2577.82) for latency MM1Queue_a033_s075
2025-08-07 12:50:23,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 48 minutes, 28 seconds)
2025-08-07 12:52:32,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:52:35,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1157.85620 ± 449.956
2025-08-07 12:52:35,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [857.0458, 1359.0477, 835.3688, 716.79944, 1598.0308, 562.5458, 1841.116, 1869.6074, 933.6525, 1005.34863]
2025-08-07 12:52:35,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [149.0, 252.0, 151.0, 124.0, 287.0, 101.0, 358.0, 344.0, 167.0, 202.0]
2025-08-07 12:52:35,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 45 minutes, 55 seconds)
2025-08-07 12:54:51,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:54:55,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1555.21240 ± 654.175
2025-08-07 12:54:55,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1473.3538, 1479.6637, 1062.7551, 2299.0488, 2020.0128, 1031.7275, 411.13184, 1342.8301, 1601.6438, 2829.9573]
2025-08-07 12:54:55,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [269.0, 265.0, 219.0, 417.0, 390.0, 197.0, 79.0, 250.0, 289.0, 532.0]
2025-08-07 12:54:55,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 43 minutes, 41 seconds)
2025-08-07 12:57:09,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:57:16,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2236.29321 ± 1511.636
2025-08-07 12:57:16,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [573.9717, 1236.4349, 5416.1284, 799.3609, 1263.4314, 3969.3687, 3713.2708, 2106.5076, 1811.2357, 1473.2201]
2025-08-07 12:57:16,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [107.0, 218.0, 1000.0, 145.0, 251.0, 737.0, 659.0, 388.0, 319.0, 288.0]
2025-08-07 12:57:16,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 41 minutes, 43 seconds)
2025-08-07 12:59:30,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:59:38,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2893.25928 ± 1547.258
2025-08-07 12:59:38,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1110.5908, 2068.1157, 2397.2356, 2549.2666, 1833.5312, 5346.7456, 2450.9087, 4733.5337, 5316.1787, 1126.4873]
2025-08-07 12:59:38,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [196.0, 387.0, 435.0, 477.0, 368.0, 1000.0, 428.0, 861.0, 1000.0, 228.0]
2025-08-07 12:59:38,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (2893.26) for latency MM1Queue_a033_s075
2025-08-07 12:59:38,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 39 minutes, 43 seconds)
2025-08-07 13:02:00,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:02:08,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2544.32910 ± 1409.403
2025-08-07 13:02:08,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [673.90283, 1162.0626, 3252.995, 1631.1426, 1576.661, 3053.1003, 1560.899, 4353.768, 2883.54, 5295.221]
2025-08-07 13:02:08,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [134.0, 213.0, 620.0, 334.0, 307.0, 591.0, 305.0, 805.0, 552.0, 1000.0]
2025-08-07 13:02:08,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 37 minutes, 36 seconds)
2025-08-07 13:04:21,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:04:32,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 3422.41357 ± 1687.888
2025-08-07 13:04:32,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [3605.6562, 2267.2559, 2599.6748, 5278.297, 841.6863, 1412.639, 2236.6855, 5395.728, 5316.622, 5269.895]
2025-08-07 13:04:32,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [655.0, 443.0, 483.0, 1000.0, 147.0, 243.0, 417.0, 1000.0, 1000.0, 1000.0]
2025-08-07 13:04:32,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (3422.41) for latency MM1Queue_a033_s075
2025-08-07 13:04:32,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 35 minutes, 49 seconds)
2025-08-07 13:06:45,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:06:53,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2876.38037 ± 1666.220
2025-08-07 13:06:53,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1376.0126, 5458.5513, 5426.1553, 630.6855, 2513.9329, 975.042, 3009.6624, 1824.1063, 4412.7065, 3136.9492]
2025-08-07 13:06:53,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [276.0, 1000.0, 1000.0, 112.0, 452.0, 187.0, 540.0, 322.0, 812.0, 575.0]
2025-08-07 13:06:53,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 33 minutes, 30 seconds)
2025-08-07 13:09:08,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:09:15,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2577.45752 ± 1248.742
2025-08-07 13:09:15,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [2674.6206, 3040.7964, 2842.0376, 3637.9817, 783.4781, 1354.0848, 1273.3346, 1567.8799, 4934.2905, 3666.068]
2025-08-07 13:09:15,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [474.0, 551.0, 509.0, 638.0, 138.0, 253.0, 222.0, 277.0, 889.0, 647.0]
2025-08-07 13:09:15,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 31 minutes, 10 seconds)
2025-08-07 13:11:32,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:11:41,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 3172.35864 ± 1766.331
2025-08-07 13:11:41,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [2726.3267, 5706.79, 5660.806, 408.65506, 1315.6556, 4423.6733, 1483.8329, 2292.6453, 4621.667, 3083.5347]
2025-08-07 13:11:41,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [473.0, 1000.0, 1000.0, 78.0, 226.0, 760.0, 291.0, 413.0, 819.0, 532.0]
2025-08-07 13:11:41,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 28 minutes, 53 seconds)
2025-08-07 13:13:50,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:14:01,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 3686.06201 ± 1441.753
2025-08-07 13:14:01,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [2164.496, 5448.1616, 5383.783, 2742.8682, 5426.182, 2425.622, 3039.1248, 1998.1664, 5408.461, 2823.755]
2025-08-07 13:14:01,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [371.0, 1000.0, 1000.0, 490.0, 1000.0, 433.0, 551.0, 393.0, 1000.0, 523.0]
2025-08-07 13:14:01,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (3686.06) for latency MM1Queue_a033_s075
2025-08-07 13:14:01,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 26 minutes, 10 seconds)
2025-08-07 13:16:17,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:16:25,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2563.03931 ± 934.414
2025-08-07 13:16:25,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [3757.3098, 2171.2935, 2605.6196, 1587.0839, 4344.5073, 2007.0809, 1784.0349, 3588.2827, 2173.2544, 1611.9252]
2025-08-07 13:16:25,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [709.0, 408.0, 479.0, 293.0, 814.0, 375.0, 357.0, 646.0, 402.0, 295.0]
2025-08-07 13:16:25,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 23 minutes, 46 seconds)
2025-08-07 13:18:40,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:18:51,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 3438.30322 ± 1925.383
2025-08-07 13:18:51,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [2422.5852, 5585.74, 5487.312, 4127.1987, 1611.2432, 5458.7026, 1311.2306, 2587.5078, 5493.6074, 297.9079]
2025-08-07 13:18:51,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [444.0, 1000.0, 1000.0, 812.0, 318.0, 1000.0, 246.0, 482.0, 1000.0, 58.0]
2025-08-07 13:18:51,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 21 minutes, 31 seconds)
2025-08-07 13:21:06,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:21:17,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 3838.10498 ± 1311.162
2025-08-07 13:21:17,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [3159.2278, 2431.6045, 2780.487, 5442.436, 2280.748, 5407.5317, 5426.7173, 2984.916, 3092.0156, 5375.3667]
2025-08-07 13:21:17,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [578.0, 471.0, 517.0, 1000.0, 435.0, 1000.0, 1000.0, 542.0, 566.0, 1000.0]
2025-08-07 13:21:17,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (3838.10) for latency MM1Queue_a033_s075
2025-08-07 13:21:17,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 19 minutes, 15 seconds)
2025-08-07 13:23:33,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:23:45,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 3975.55151 ± 1752.723
2025-08-07 13:23:45,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1599.9274, 5510.447, 2635.1938, 5403.6055, 4236.1904, 3558.39, 5487.539, 5446.6133, 484.9081, 5392.7046]
2025-08-07 13:23:45,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [286.0, 1000.0, 469.0, 1000.0, 762.0, 671.0, 1000.0, 1000.0, 91.0, 1000.0]
2025-08-07 13:23:45,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (3975.55) for latency MM1Queue_a033_s075
2025-08-07 13:23:45,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 16 minutes, 53 seconds)
2025-08-07 13:26:05,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:26:18,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 4158.38770 ± 1953.058
2025-08-07 13:26:18,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5504.949, 5284.547, 876.63666, 5291.823, 547.72974, 5478.2285, 5257.765, 5517.033, 5502.2534, 2322.9082]
2025-08-07 13:26:18,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 199.0, 1000.0, 97.0, 1000.0, 1000.0, 1000.0, 1000.0, 440.0]
2025-08-07 13:26:18,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (4158.39) for latency MM1Queue_a033_s075
2025-08-07 13:26:18,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 14 minutes, 44 seconds)
2025-08-07 13:28:31,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:28:42,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 3576.61719 ± 1886.454
2025-08-07 13:28:42,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [3006.4495, 521.1892, 5421.06, 2265.954, 623.2858, 4914.5527, 2866.507, 5375.2314, 5447.353, 5324.5884]
2025-08-07 13:28:42,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [561.0, 100.0, 1000.0, 443.0, 118.0, 851.0, 525.0, 1000.0, 1000.0, 1000.0]
2025-08-07 13:28:42,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 12 minutes, 16 seconds)
2025-08-07 13:30:58,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:31:10,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 4229.92285 ± 1712.728
2025-08-07 13:31:10,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5560.574, 3075.3462, 1349.9248, 4954.7153, 5582.6323, 5653.494, 1061.2198, 5560.324, 3975.2603, 5525.73]
2025-08-07 13:31:10,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 548.0, 264.0, 897.0, 1000.0, 1000.0, 179.0, 1000.0, 705.0, 1000.0]
2025-08-07 13:31:10,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (4229.92) for latency MM1Queue_a033_s075
2025-08-07 13:31:10,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 9 minutes, 51 seconds)
2025-08-07 13:33:25,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:33:35,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 3434.34644 ± 1694.835
2025-08-07 13:33:35,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5094.1973, 5428.944, 4778.6343, 5361.899, 1035.1556, 4521.5303, 2246.0364, 929.96014, 2483.7, 2463.4087]
2025-08-07 13:33:35,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [961.0, 1000.0, 904.0, 1000.0, 185.0, 867.0, 436.0, 180.0, 443.0, 462.0]
2025-08-07 13:33:35,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 7 minutes, 22 seconds)
2025-08-07 13:35:51,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:36:02,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 3475.09375 ± 1737.614
2025-08-07 13:36:02,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5318.944, 5235.1997, 1462.0913, 1313.0244, 5271.5693, 4159.6816, 1511.8557, 1460.9291, 5291.19, 3726.4539]
2025-08-07 13:36:02,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 286.0, 239.0, 1000.0, 830.0, 297.0, 259.0, 1000.0, 735.0]
2025-08-07 13:36:02,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 54 seconds)
2025-08-07 13:38:19,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:38:33,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 4368.35059 ± 1401.546
2025-08-07 13:38:33,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5237.489, 1399.7311, 4606.5293, 4726.411, 5297.081, 1864.0977, 5315.5776, 5357.9883, 4601.753, 5276.8423]
2025-08-07 13:38:33,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 253.0, 906.0, 931.0, 1000.0, 348.0, 1000.0, 1000.0, 888.0, 1000.0]
2025-08-07 13:38:33,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (4368.35) for latency MM1Queue_a033_s075
2025-08-07 13:38:33,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 27 seconds)
2025-08-07 13:40:45,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:40:59,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 4620.76562 ± 1863.188
2025-08-07 13:40:59,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5618.1562, 5617.913, 1023.1398, 5523.128, 5403.232, 5608.7134, 5467.2524, 774.5057, 5646.101, 5525.514]
2025-08-07 13:40:59,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 204.0, 1000.0, 975.0, 1000.0, 1000.0, 149.0, 1000.0, 1000.0]
2025-08-07 13:40:59,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (4620.77) for latency MM1Queue_a033_s075
2025-08-07 13:40:59,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1251 [DEBUG]: Training session finished
