2025-05-11 03:23:26,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem4
2025-05-11 03:23:26,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem4
2025-05-11 03:23:26,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x76dd343c5c70>}
2025-05-11 03:23:26,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1111 [DEBUG]: using device: cpu
2025-05-11 03:23:26,459 baseline-bpql-noisy-humanoid:77 [WARNING]: args.assumed_delay != args.horizon: 4 != 24
2025-05-11 03:23:26,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1133 [INFO]: Creating new trainer
2025-05-11 03:23:26,472 baseline-bpql-noisy-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=444, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-05-11 03:23:26,472 baseline-bpql-noisy-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-11 03:23:30,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1194 [DEBUG]: Starting training session...
2025-05-11 03:23:30,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 1/100
2025-05-11 03:27:03,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 03:27:04,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 325.21564 ± 110.286
2025-05-11 03:27:04,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [408.04263, 230.36046, 336.27167, 238.45659, 245.25752, 356.2173, 176.1762, 488.94318, 517.18744, 255.24307]
2025-05-11 03:27:04,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 48.0, 68.0, 49.0, 50.0, 74.0, 36.0, 92.0, 108.0, 53.0]
2025-05-11 03:27:04,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (325.22) for latency MM1Queue_a033_s075
2025-05-11 03:27:04,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 03:27:04,787 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 03:27:04,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 5 hours, 54 minutes, 14 seconds)
2025-05-11 03:31:05,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 03:31:07,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 395.75226 ± 75.956
2025-05-11 03:31:07,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [347.89435, 339.428, 488.2177, 341.9841, 395.88776, 401.34906, 250.26445, 416.5014, 455.27057, 520.72534]
2025-05-11 03:31:07,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 62.0, 90.0, 63.0, 74.0, 76.0, 50.0, 82.0, 88.0, 107.0]
2025-05-11 03:31:07,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (395.75) for latency MM1Queue_a033_s075
2025-05-11 03:31:07,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 03:31:07,427 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 03:31:07,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 6 hours, 13 minutes, 29 seconds)
2025-05-11 03:35:10,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 03:35:12,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 465.61108 ± 111.366
2025-05-11 03:35:12,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [488.293, 393.98825, 770.1948, 459.9231, 379.27057, 495.80762, 462.18515, 388.30734, 458.2056, 359.93558]
2025-05-11 03:35:12,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [102.0, 86.0, 145.0, 90.0, 72.0, 91.0, 96.0, 83.0, 87.0, 70.0]
2025-05-11 03:35:12,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (465.61) for latency MM1Queue_a033_s075
2025-05-11 03:35:12,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 03:35:12,291 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 03:35:12,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 6 hours, 18 minutes, 24 seconds)
2025-05-11 03:39:19,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 03:39:21,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 391.45044 ± 98.233
2025-05-11 03:39:21,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [293.66098, 437.1437, 308.35306, 658.99664, 380.12814, 361.13116, 388.07297, 336.2497, 345.9006, 404.86755]
2025-05-11 03:39:21,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [60.0, 83.0, 67.0, 133.0, 84.0, 67.0, 84.0, 67.0, 75.0, 89.0]
2025-05-11 03:39:21,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 6 hours, 20 minutes, 33 seconds)
2025-05-11 03:43:27,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 03:43:29,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 437.80185 ± 128.369
2025-05-11 03:43:29,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [556.3242, 332.8741, 382.84598, 605.6064, 365.54343, 360.04153, 695.50037, 328.59988, 300.68716, 449.99527]
2025-05-11 03:43:29,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [108.0, 61.0, 73.0, 119.0, 76.0, 79.0, 134.0, 67.0, 66.0, 89.0]
2025-05-11 03:43:29,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 6 hours, 19 minutes, 55 seconds)
2025-05-11 03:47:35,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 03:47:37,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 387.26434 ± 145.558
2025-05-11 03:47:37,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [305.03726, 340.0371, 331.77832, 383.25278, 410.3493, 226.02698, 376.75485, 798.21063, 378.70145, 322.49487]
2025-05-11 03:47:37,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 73.0, 73.0, 85.0, 74.0, 47.0, 81.0, 155.0, 84.0, 74.0]
2025-05-11 03:47:37,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 6 hours, 26 minutes, 11 seconds)
2025-05-11 03:51:43,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 03:51:45,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 537.25751 ± 187.016
2025-05-11 03:51:45,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [627.48145, 668.7431, 503.97665, 426.63547, 989.9027, 303.99597, 359.77365, 514.9423, 402.14417, 574.98016]
2025-05-11 03:51:45,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [125.0, 139.0, 99.0, 82.0, 199.0, 64.0, 68.0, 97.0, 77.0, 116.0]
2025-05-11 03:51:45,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (537.26) for latency MM1Queue_a033_s075
2025-05-11 03:51:45,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 03:51:45,568 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 03:51:45,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 6 hours, 23 minutes, 49 seconds)
2025-05-11 03:55:52,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 03:55:54,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 475.89706 ± 109.798
2025-05-11 03:55:54,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [355.5803, 631.2895, 413.53845, 318.22394, 609.01776, 528.9287, 591.58984, 495.756, 338.52512, 476.52087]
2025-05-11 03:55:54,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 118.0, 87.0, 62.0, 119.0, 115.0, 117.0, 93.0, 63.0, 107.0]
2025-05-11 03:55:54,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 6 hours, 20 minutes, 57 seconds)
2025-05-11 04:00:00,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 04:00:02,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 413.65479 ± 105.998
2025-05-11 04:00:02,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [483.9296, 572.6152, 374.49026, 447.55945, 469.76126, 495.9094, 372.51788, 448.1347, 194.48346, 277.14667]
2025-05-11 04:00:02,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 125.0, 73.0, 84.0, 99.0, 94.0, 72.0, 86.0, 40.0, 52.0]
2025-05-11 04:00:02,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 6 hours, 16 minutes, 29 seconds)
2025-05-11 04:04:11,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 04:04:13,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 459.24445 ± 80.268
2025-05-11 04:04:13,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [451.54956, 515.52344, 415.9854, 521.86523, 469.1741, 525.9489, 417.0786, 251.96696, 494.028, 529.32404]
2025-05-11 04:04:13,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 98.0, 79.0, 97.0, 89.0, 100.0, 79.0, 47.0, 94.0, 99.0]
2025-05-11 04:04:13,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 6 hours, 13 minutes, 6 seconds)
2025-05-11 04:08:26,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 04:08:28,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 580.74274 ± 149.267
2025-05-11 04:08:28,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [490.84998, 655.9819, 968.3321, 461.22238, 425.44745, 639.8777, 507.71857, 501.44516, 629.78925, 526.763]
2025-05-11 04:08:28,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 124.0, 189.0, 89.0, 84.0, 123.0, 91.0, 95.0, 122.0, 98.0]
2025-05-11 04:08:28,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (580.74) for latency MM1Queue_a033_s075
2025-05-11 04:08:28,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 04:08:28,553 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 04:08:28,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 6 hours, 11 minutes, 11 seconds)
2025-05-11 04:12:34,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 04:12:36,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 526.01721 ± 175.081
2025-05-11 04:12:36,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [432.2862, 470.22678, 551.39905, 1022.8234, 483.7707, 523.30304, 432.2495, 341.92047, 474.1004, 528.0926]
2025-05-11 04:12:36,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 94.0, 104.0, 196.0, 104.0, 96.0, 85.0, 78.0, 90.0, 96.0]
2025-05-11 04:12:36,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 6 hours, 7 minutes)
2025-05-11 04:16:43,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 04:16:45,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 492.00473 ± 138.255
2025-05-11 04:16:45,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [582.94324, 633.2257, 552.9279, 240.01901, 394.0166, 745.64386, 422.97476, 509.2155, 475.32666, 363.75427]
2025-05-11 04:16:45,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 120.0, 106.0, 53.0, 75.0, 145.0, 78.0, 98.0, 89.0, 67.0]
2025-05-11 04:16:45,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 6 hours, 2 minutes, 46 seconds)
2025-05-11 04:20:54,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 04:20:56,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 594.43634 ± 154.883
2025-05-11 04:20:56,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [553.0627, 399.27902, 511.84277, 576.0254, 688.73517, 409.7317, 495.51978, 832.95953, 882.95416, 594.25366]
2025-05-11 04:20:56,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 87.0, 99.0, 109.0, 145.0, 78.0, 98.0, 178.0, 189.0, 111.0]
2025-05-11 04:20:56,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (594.44) for latency MM1Queue_a033_s075
2025-05-11 04:20:56,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 04:20:56,917 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 04:20:56,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 5 hours, 59 minutes, 32 seconds)
2025-05-11 04:25:10,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 04:25:12,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 485.43512 ± 125.215
2025-05-11 04:25:12,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [189.08987, 627.7785, 575.032, 432.99564, 531.7669, 493.8069, 489.055, 600.9098, 355.1217, 558.79504]
2025-05-11 04:25:12,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [37.0, 117.0, 108.0, 85.0, 104.0, 99.0, 89.0, 112.0, 64.0, 117.0]
2025-05-11 04:25:12,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 5 hours, 56 minutes, 46 seconds)
2025-05-11 04:29:17,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 04:29:20,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 549.63043 ± 160.477
2025-05-11 04:29:20,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [671.7915, 461.45126, 533.0387, 891.8575, 559.5156, 486.0254, 346.75476, 723.5056, 453.394, 368.9698]
2025-05-11 04:29:20,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [130.0, 98.0, 103.0, 174.0, 113.0, 96.0, 74.0, 153.0, 99.0, 75.0]
2025-05-11 04:29:20,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 5 hours, 50 minutes, 29 seconds)
2025-05-11 04:33:27,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 04:33:30,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 634.19983 ± 120.262
2025-05-11 04:33:30,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [413.5811, 669.7694, 756.3609, 727.4237, 759.81305, 581.7978, 634.593, 420.05637, 671.6582, 706.9449]
2025-05-11 04:33:30,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 132.0, 157.0, 140.0, 147.0, 115.0, 138.0, 80.0, 129.0, 145.0]
2025-05-11 04:33:30,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (634.20) for latency MM1Queue_a033_s075
2025-05-11 04:33:30,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 04:33:30,445 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 04:33:30,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 5 hours, 46 minutes, 51 seconds)
2025-05-11 04:37:36,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 04:37:39,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 563.52283 ± 167.464
2025-05-11 04:37:39,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [873.378, 662.3437, 227.83646, 689.87195, 560.00006, 541.8589, 421.09668, 492.20148, 484.88242, 681.7587]
2025-05-11 04:37:39,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [186.0, 128.0, 45.0, 133.0, 112.0, 113.0, 78.0, 106.0, 94.0, 133.0]
2025-05-11 04:37:39,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 5 hours, 42 minutes, 37 seconds)
2025-05-11 04:41:45,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 04:41:48,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 575.41724 ± 96.585
2025-05-11 04:41:48,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [720.88745, 609.6873, 709.48395, 536.4293, 379.82907, 507.11832, 609.05634, 617.1222, 501.09912, 563.45905]
2025-05-11 04:41:48,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [137.0, 130.0, 142.0, 105.0, 69.0, 95.0, 116.0, 117.0, 109.0, 107.0]
2025-05-11 04:41:48,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 5 hours, 37 minutes, 56 seconds)
2025-05-11 04:46:04,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 04:46:07,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 584.34790 ± 53.765
2025-05-11 04:46:07,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [553.38385, 578.96484, 506.4335, 555.53595, 620.777, 570.5805, 608.5904, 707.1535, 612.7316, 529.3283]
2025-05-11 04:46:07,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [120.0, 109.0, 95.0, 118.0, 119.0, 106.0, 115.0, 133.0, 114.0, 102.0]
2025-05-11 04:46:07,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 5 hours, 34 minutes, 36 seconds)
2025-05-11 04:50:21,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 04:50:24,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 603.55945 ± 101.508
2025-05-11 04:50:24,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [549.0836, 714.0073, 385.16483, 626.1417, 632.5147, 741.6685, 687.3168, 628.1003, 574.21844, 497.37827]
2025-05-11 04:50:24,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [121.0, 140.0, 83.0, 123.0, 120.0, 142.0, 134.0, 124.0, 110.0, 102.0]
2025-05-11 04:50:24,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 5 hours, 32 minutes, 49 seconds)
2025-05-11 04:54:39,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 04:54:42,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 614.40686 ± 194.669
2025-05-11 04:54:42,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [525.7577, 823.5744, 560.1581, 677.18604, 574.2565, 490.105, 557.38605, 564.8721, 1064.61, 306.16248]
2025-05-11 04:54:42,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 175.0, 105.0, 138.0, 109.0, 92.0, 117.0, 124.0, 206.0, 71.0]
2025-05-11 04:54:42,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 5 hours, 30 minutes, 44 seconds)
2025-05-11 04:59:00,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 04:59:04,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 785.72742 ± 244.670
2025-05-11 04:59:04,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1254.196, 927.95374, 637.3297, 464.48364, 658.16516, 775.71924, 554.9456, 1133.9154, 858.60114, 591.9641]
2025-05-11 04:59:04,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [262.0, 189.0, 122.0, 104.0, 130.0, 161.0, 106.0, 218.0, 167.0, 111.0]
2025-05-11 04:59:04,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (785.73) for latency MM1Queue_a033_s075
2025-05-11 04:59:04,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 04:59:04,166 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 04:59:04,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 5 hours, 29 minutes, 50 seconds)
2025-05-11 05:03:14,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 05:03:17,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 603.76373 ± 128.171
2025-05-11 05:03:17,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [549.92267, 427.16782, 758.8777, 708.57056, 606.75714, 505.40424, 488.6187, 468.32834, 718.4624, 805.528]
2025-05-11 05:03:17,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 92.0, 146.0, 146.0, 118.0, 105.0, 96.0, 102.0, 137.0, 156.0]
2025-05-11 05:03:17,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 5 hours, 26 minutes, 26 seconds)
2025-05-11 05:07:33,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 05:07:36,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 633.14093 ± 182.345
2025-05-11 05:07:36,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [321.13687, 716.43134, 707.3482, 710.766, 530.34674, 583.87787, 370.83694, 964.58563, 791.34186, 634.73755]
2025-05-11 05:07:36,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 136.0, 139.0, 139.0, 109.0, 118.0, 72.0, 187.0, 159.0, 142.0]
2025-05-11 05:07:36,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 5 hours, 22 minutes, 9 seconds)
2025-05-11 05:11:50,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 05:11:53,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 599.65955 ± 223.642
2025-05-11 05:11:53,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [694.58795, 886.88434, 550.22797, 921.83746, 698.4863, 589.03125, 682.0577, 481.5184, 349.91718, 142.04715]
2025-05-11 05:11:53,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [136.0, 178.0, 108.0, 179.0, 137.0, 113.0, 148.0, 101.0, 75.0, 28.0]
2025-05-11 05:11:53,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 5 hours, 17 minutes, 56 seconds)
2025-05-11 05:16:03,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 05:16:06,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 732.66180 ± 206.166
2025-05-11 05:16:06,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [794.46375, 785.2687, 830.61664, 1123.4735, 588.37555, 839.5851, 454.99588, 918.2555, 503.45978, 488.12424]
2025-05-11 05:16:06,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [153.0, 153.0, 164.0, 225.0, 114.0, 164.0, 85.0, 193.0, 113.0, 94.0]
2025-05-11 05:16:06,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 5 hours, 12 minutes, 32 seconds)
2025-05-11 05:20:26,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 05:20:29,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 718.74060 ± 252.295
2025-05-11 05:20:29,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [791.48016, 447.02484, 540.0433, 1299.0223, 554.0254, 434.56335, 829.4737, 645.626, 967.0939, 679.0534]
2025-05-11 05:20:29,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [158.0, 85.0, 108.0, 259.0, 104.0, 87.0, 158.0, 124.0, 192.0, 141.0]
2025-05-11 05:20:29,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 5 hours, 8 minutes, 28 seconds)
2025-05-11 05:24:38,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 05:24:41,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 666.62469 ± 184.432
2025-05-11 05:24:41,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [500.7019, 656.3201, 431.65622, 678.79095, 758.6388, 889.49493, 942.26373, 337.1605, 685.7248, 785.49536]
2025-05-11 05:24:41,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 124.0, 81.0, 140.0, 146.0, 169.0, 192.0, 76.0, 131.0, 155.0]
2025-05-11 05:24:41,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 5 hours, 3 minutes, 53 seconds)
2025-05-11 05:28:57,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 05:29:01,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 735.54388 ± 266.287
2025-05-11 05:29:01,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [878.2241, 726.9892, 469.70172, 468.34473, 1147.055, 487.34985, 784.61285, 361.4934, 1105.8356, 925.8324]
2025-05-11 05:29:01,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [170.0, 144.0, 101.0, 107.0, 226.0, 94.0, 168.0, 80.0, 214.0, 183.0]
2025-05-11 05:29:01,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 4 hours, 59 minutes, 49 seconds)
2025-05-11 05:33:41,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 05:33:45,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 795.73474 ± 262.292
2025-05-11 05:33:45,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [591.63, 603.494, 683.7457, 405.81927, 720.34906, 1008.4372, 1041.017, 1345.7446, 902.227, 654.8839]
2025-05-11 05:33:45,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [131.0, 111.0, 134.0, 74.0, 139.0, 204.0, 203.0, 274.0, 176.0, 125.0]
2025-05-11 05:33:45,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (795.73) for latency MM1Queue_a033_s075
2025-05-11 05:33:45,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 05:33:45,855 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 05:33:45,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 5 hours, 1 minute, 55 seconds)
2025-05-11 05:37:58,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 05:38:02,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 818.97888 ± 234.195
2025-05-11 05:38:02,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [788.7934, 1024.6084, 454.42117, 888.651, 1204.6667, 610.1485, 443.21292, 926.87714, 915.0055, 933.4041]
2025-05-11 05:38:02,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [149.0, 201.0, 84.0, 172.0, 240.0, 113.0, 85.0, 203.0, 178.0, 202.0]
2025-05-11 05:38:02,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (818.98) for latency MM1Queue_a033_s075
2025-05-11 05:38:02,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 05:38:02,174 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 05:38:02,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 4 hours, 58 minutes, 6 seconds)
2025-05-11 05:42:20,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 05:42:23,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 687.79688 ± 214.220
2025-05-11 05:42:23,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [507.63275, 458.22192, 570.4161, 742.73755, 1213.0659, 719.2465, 867.6332, 723.6047, 563.83575, 511.57434]
2025-05-11 05:42:23,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 85.0, 106.0, 140.0, 240.0, 140.0, 167.0, 138.0, 118.0, 94.0]
2025-05-11 05:42:23,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 4 hours, 53 minutes, 30 seconds)
2025-05-11 05:46:39,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 05:46:43,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 721.04724 ± 311.100
2025-05-11 05:46:43,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [568.49176, 479.29047, 1602.6008, 535.9531, 797.89465, 679.9689, 750.23724, 536.58105, 541.5078, 717.9461]
2025-05-11 05:46:43,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [114.0, 102.0, 341.0, 102.0, 157.0, 134.0, 147.0, 101.0, 106.0, 141.0]
2025-05-11 05:46:43,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 4 hours, 50 minutes, 52 seconds)
2025-05-11 05:51:02,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 05:51:06,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 721.16705 ± 289.836
2025-05-11 05:51:06,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [640.36536, 835.31213, 1419.8914, 843.1421, 435.96533, 547.921, 722.17523, 883.5303, 549.98254, 333.3854]
2025-05-11 05:51:06,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [120.0, 176.0, 291.0, 166.0, 84.0, 113.0, 137.0, 176.0, 116.0, 76.0]
2025-05-11 05:51:06,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 4 hours, 47 minutes, 5 seconds)
2025-05-11 05:55:20,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 05:55:25,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 922.50256 ± 344.414
2025-05-11 05:55:25,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [561.0374, 1734.7605, 918.7604, 1012.8306, 758.26044, 908.63007, 415.5162, 865.0484, 826.2684, 1223.9135]
2025-05-11 05:55:25,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [107.0, 351.0, 176.0, 215.0, 161.0, 178.0, 80.0, 170.0, 180.0, 239.0]
2025-05-11 05:55:25,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (922.50) for latency MM1Queue_a033_s075
2025-05-11 05:55:25,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 05:55:25,381 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 05:55:25,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 4 hours, 37 minutes, 13 seconds)
2025-05-11 05:59:43,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 05:59:47,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 775.81415 ± 185.936
2025-05-11 05:59:47,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [746.50165, 1084.5682, 703.2836, 827.1208, 603.3638, 996.25183, 453.5405, 670.8757, 979.69836, 692.9374]
2025-05-11 05:59:47,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [142.0, 218.0, 146.0, 160.0, 116.0, 202.0, 83.0, 137.0, 188.0, 131.0]
2025-05-11 05:59:47,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 4 hours, 34 minutes)
2025-05-11 06:04:05,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 06:04:09,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 681.68463 ± 232.833
2025-05-11 06:04:09,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [395.89606, 667.1426, 734.5467, 772.513, 593.03186, 637.8635, 800.7702, 475.52505, 474.83557, 1264.7216]
2025-05-11 06:04:09,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 142.0, 153.0, 148.0, 113.0, 136.0, 174.0, 105.0, 101.0, 249.0]
2025-05-11 06:04:09,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 4 hours, 29 minutes, 48 seconds)
2025-05-11 06:08:27,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 06:08:32,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 966.14581 ± 491.525
2025-05-11 06:08:32,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [578.047, 1268.7544, 745.5267, 889.1144, 626.1184, 2071.883, 1246.8842, 1300.3363, 251.8774, 682.9171]
2025-05-11 06:08:32,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [107.0, 258.0, 143.0, 178.0, 135.0, 421.0, 244.0, 265.0, 49.0, 144.0]
2025-05-11 06:08:32,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (966.15) for latency MM1Queue_a033_s075
2025-05-11 06:08:32,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 06:08:32,329 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 06:08:32,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 4 hours, 26 minutes, 9 seconds)
2025-05-11 06:12:51,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 06:12:55,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 861.51965 ± 305.106
2025-05-11 06:12:55,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [468.70303, 763.98663, 702.9343, 942.2248, 989.9256, 1498.0043, 665.583, 1198.9092, 920.9744, 463.9509]
2025-05-11 06:12:55,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [102.0, 151.0, 143.0, 189.0, 193.0, 299.0, 129.0, 238.0, 192.0, 97.0]
2025-05-11 06:12:55,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 4 hours, 21 minutes, 52 seconds)
2025-05-11 06:17:17,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 06:17:21,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 808.42230 ± 211.557
2025-05-11 06:17:21,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1106.1913, 580.8464, 715.4542, 993.3782, 579.7597, 628.0262, 620.15875, 1035.93, 1100.2095, 724.2687]
2025-05-11 06:17:21,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [233.0, 111.0, 134.0, 193.0, 109.0, 121.0, 119.0, 205.0, 208.0, 151.0]
2025-05-11 06:17:21,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 4 hours, 18 minutes, 50 seconds)
2025-05-11 06:21:36,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 06:21:39,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 748.23547 ± 386.742
2025-05-11 06:21:39,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [378.05847, 1747.9774, 628.2858, 509.19968, 471.93158, 502.92136, 1070.676, 604.54114, 675.0722, 893.6907]
2025-05-11 06:21:39,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 357.0, 118.0, 97.0, 91.0, 100.0, 209.0, 115.0, 128.0, 175.0]
2025-05-11 06:21:39,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 4 hours, 13 minutes, 50 seconds)
2025-05-11 06:25:58,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 06:26:03,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 925.57263 ± 426.908
2025-05-11 06:26:03,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1424.4381, 1388.5746, 920.86816, 670.919, 580.0052, 901.4248, 530.0033, 287.84235, 1703.5309, 848.1196]
2025-05-11 06:26:03,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [315.0, 277.0, 179.0, 128.0, 108.0, 174.0, 113.0, 58.0, 348.0, 162.0]
2025-05-11 06:26:03,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 4 hours, 9 minutes, 46 seconds)
2025-05-11 06:30:20,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 06:30:25,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 886.33997 ± 247.381
2025-05-11 06:30:25,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [734.71906, 1308.388, 496.2983, 1310.2347, 678.97595, 941.5469, 990.9428, 794.88696, 807.4636, 799.943]
2025-05-11 06:30:25,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [139.0, 259.0, 90.0, 260.0, 129.0, 189.0, 188.0, 151.0, 156.0, 153.0]
2025-05-11 06:30:25,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 4 hours, 5 minutes, 5 seconds)
2025-05-11 06:34:48,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 06:34:52,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 873.91931 ± 604.829
2025-05-11 06:34:52,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [2649.0, 339.82794, 684.17426, 664.22253, 711.6292, 780.15717, 830.2868, 639.614, 757.6483, 682.63245]
2025-05-11 06:34:52,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [542.0, 71.0, 127.0, 126.0, 141.0, 163.0, 161.0, 118.0, 143.0, 129.0]
2025-05-11 06:34:52,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 4 hours, 1 minute, 27 seconds)
2025-05-11 06:39:16,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 06:39:24,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1401.31799 ± 1127.109
2025-05-11 06:39:24,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [4067.689, 1214.3911, 688.9548, 2642.5283, 264.32642, 618.2286, 1858.2494, 473.19223, 1507.9486, 677.6712]
2025-05-11 06:39:24,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [853.0, 235.0, 134.0, 555.0, 52.0, 117.0, 395.0, 98.0, 284.0, 150.0]
2025-05-11 06:39:24,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (1401.32) for latency MM1Queue_a033_s075
2025-05-11 06:39:24,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 06:39:24,984 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 06:39:25,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 58 minutes, 13 seconds)
2025-05-11 06:43:39,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 06:43:45,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1143.96094 ± 688.318
2025-05-11 06:43:45,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [267.8167, 2832.365, 748.9302, 849.15936, 1324.2269, 808.5492, 785.70654, 1462.2485, 680.69073, 1679.9161]
2025-05-11 06:43:45,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [54.0, 600.0, 146.0, 169.0, 254.0, 162.0, 151.0, 282.0, 139.0, 325.0]
2025-05-11 06:43:45,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 54 minutes, 9 seconds)
2025-05-11 06:48:06,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 06:48:14,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1523.17944 ± 1272.152
2025-05-11 06:48:14,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [862.15063, 1242.9658, 1593.6641, 654.7179, 664.9625, 4723.167, 343.48715, 2927.7651, 773.50964, 1445.4045]
2025-05-11 06:48:14,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [170.0, 236.0, 308.0, 127.0, 132.0, 1000.0, 77.0, 610.0, 155.0, 294.0]
2025-05-11 06:48:14,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (1523.18) for latency MM1Queue_a033_s075
2025-05-11 06:48:14,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 06:48:14,925 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 06:48:14,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 50 minutes, 43 seconds)
2025-05-11 06:52:31,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 06:52:39,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1470.49707 ± 871.600
2025-05-11 06:52:39,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1451.3206, 1054.9169, 863.8317, 3309.536, 879.3322, 808.16754, 2345.9045, 458.12833, 2444.6204, 1089.213]
2025-05-11 06:52:39,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [281.0, 209.0, 164.0, 687.0, 169.0, 160.0, 470.0, 85.0, 511.0, 215.0]
2025-05-11 06:52:39,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 46 minutes, 52 seconds)
2025-05-11 06:56:59,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 06:57:05,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1110.01013 ± 507.818
2025-05-11 06:57:05,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1045.2188, 1111.4941, 389.56168, 1593.3792, 842.58264, 1936.3811, 648.4152, 1906.2188, 664.17487, 962.67487]
2025-05-11 06:57:05,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [210.0, 220.0, 71.0, 328.0, 162.0, 394.0, 123.0, 388.0, 125.0, 187.0]
2025-05-11 06:57:05,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 42 minutes, 6 seconds)
2025-05-11 07:01:25,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 07:01:31,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1196.34937 ± 516.127
2025-05-11 07:01:31,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [762.45544, 2075.7717, 1750.8472, 1312.8876, 1059.5563, 1352.4053, 342.21448, 777.1623, 1725.1995, 804.99493]
2025-05-11 07:01:31,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [148.0, 420.0, 353.0, 257.0, 222.0, 265.0, 70.0, 144.0, 344.0, 148.0]
2025-05-11 07:01:31,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 36 minutes, 38 seconds)
2025-05-11 07:05:50,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 07:05:58,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1400.27747 ± 851.520
2025-05-11 07:05:58,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [645.8019, 2725.8904, 820.54236, 794.05945, 667.55536, 1776.9899, 711.2917, 1984.4756, 2974.3096, 901.8585]
2025-05-11 07:05:58,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [122.0, 562.0, 155.0, 151.0, 131.0, 334.0, 137.0, 401.0, 606.0, 175.0]
2025-05-11 07:05:58,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 3 hours, 33 minutes, 15 seconds)
2025-05-11 07:10:15,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 07:10:26,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1919.90173 ± 1120.280
2025-05-11 07:10:26,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [4680.7666, 966.7121, 1478.596, 3294.3743, 1321.9149, 1697.4675, 878.799, 1473.2404, 1428.4905, 1978.6561]
2025-05-11 07:10:26,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 194.0, 291.0, 688.0, 259.0, 343.0, 178.0, 301.0, 303.0, 399.0]
2025-05-11 07:10:26,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (1919.90) for latency MM1Queue_a033_s075
2025-05-11 07:10:26,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 07:10:26,454 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 07:10:26,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 3 hours, 28 minutes, 36 seconds)
2025-05-11 07:14:50,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 07:15:01,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1917.46228 ± 1328.878
2025-05-11 07:15:01,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [3218.4568, 1255.2631, 1529.4823, 1650.6351, 556.2532, 926.5448, 3219.6235, 1187.5917, 738.04706, 4892.726]
2025-05-11 07:15:01,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [638.0, 243.0, 303.0, 332.0, 112.0, 184.0, 654.0, 225.0, 141.0, 1000.0]
2025-05-11 07:15:01,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 3 hours, 25 minutes, 43 seconds)
2025-05-11 07:19:41,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 07:19:49,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1483.88599 ± 744.605
2025-05-11 07:19:49,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [2212.0508, 1243.9666, 2682.7722, 1210.3916, 718.6343, 1576.5746, 463.118, 2030.5725, 2208.812, 491.9678]
2025-05-11 07:19:49,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [455.0, 247.0, 556.0, 237.0, 142.0, 322.0, 86.0, 423.0, 437.0, 93.0]
2025-05-11 07:19:49,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 3 hours, 24 minutes, 37 seconds)
2025-05-11 07:23:56,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 07:24:14,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 2753.29053 ± 1487.276
2025-05-11 07:24:14,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [3622.7778, 1005.19867, 4664.5317, 2455.2004, 4635.1714, 1271.3317, 2806.4307, 1607.1261, 794.6162, 4670.5195]
2025-05-11 07:24:14,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [783.0, 192.0, 1000.0, 498.0, 1000.0, 256.0, 557.0, 313.0, 153.0, 1000.0]
2025-05-11 07:24:14,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (2753.29) for latency MM1Queue_a033_s075
2025-05-11 07:24:14,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 07:24:14,361 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 07:24:14,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 3 hours, 19 minutes, 54 seconds)
2025-05-11 07:28:36,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 07:28:42,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1201.83423 ± 713.862
2025-05-11 07:28:42,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [197.52693, 668.01227, 2482.279, 170.08174, 1052.4347, 948.3014, 1976.3464, 1240.7595, 1531.5557, 1751.045]
2025-05-11 07:28:42,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [39.0, 144.0, 499.0, 33.0, 208.0, 202.0, 400.0, 252.0, 295.0, 377.0]
2025-05-11 07:28:42,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 3 hours, 15 minutes, 34 seconds)
2025-05-11 07:33:20,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 07:33:34,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 2362.34155 ± 1525.914
2025-05-11 07:33:34,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [5007.656, 1172.0386, 2705.9194, 4927.0127, 2449.7954, 2113.9465, 2887.4453, 1236.8822, 367.4149, 755.3044]
2025-05-11 07:33:34,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 232.0, 538.0, 1000.0, 500.0, 420.0, 600.0, 246.0, 73.0, 149.0]
2025-05-11 07:33:34,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 3 hours, 14 minutes, 15 seconds)
2025-05-11 07:37:48,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 07:38:01,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 2202.93091 ± 1329.627
2025-05-11 07:38:01,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [2367.5928, 2202.1326, 1301.2427, 2010.3962, 4752.936, 2773.2302, 1657.2183, 310.54474, 585.6887, 4068.3257]
2025-05-11 07:38:01,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [472.0, 448.0, 265.0, 397.0, 1000.0, 559.0, 330.0, 73.0, 114.0, 816.0]
2025-05-11 07:38:01,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 3 hours, 8 minutes, 34 seconds)
2025-05-11 07:42:39,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 07:43:00,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 3432.95630 ± 1235.772
2025-05-11 07:43:00,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [2682.7544, 3862.4114, 4583.148, 2114.5344, 3690.396, 4681.793, 4669.6113, 1329.658, 1966.1302, 4749.123]
2025-05-11 07:43:00,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [565.0, 816.0, 956.0, 426.0, 799.0, 1000.0, 1000.0, 251.0, 403.0, 1000.0]
2025-05-11 07:43:00,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (3432.96) for latency MM1Queue_a033_s075
2025-05-11 07:43:00,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 07:43:00,997 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 07:43:01,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 3 hours, 5 minutes, 32 seconds)
2025-05-11 07:47:06,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 07:47:14,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1503.17358 ± 1239.405
2025-05-11 07:47:14,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [536.7033, 3413.371, 3631.2412, 2813.4229, 238.43492, 650.69946, 696.60626, 1307.0497, 1472.5707, 271.63687]
2025-05-11 07:47:14,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [107.0, 657.0, 723.0, 557.0, 47.0, 131.0, 134.0, 244.0, 281.0, 53.0]
2025-05-11 07:47:14,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 59 minutes, 22 seconds)
2025-05-11 07:51:44,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 07:52:04,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 3092.48364 ± 1458.595
2025-05-11 07:52:04,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [4240.259, 3740.7783, 4727.008, 4698.4595, 890.15265, 3978.4045, 554.238, 3364.4097, 1631.4214, 3099.705]
2025-05-11 07:52:04,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [876.0, 782.0, 1000.0, 1000.0, 176.0, 865.0, 105.0, 698.0, 347.0, 638.0]
2025-05-11 07:52:04,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 57 minutes, 35 seconds)
2025-05-11 07:56:21,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 07:56:46,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 3783.46167 ± 1165.081
2025-05-11 07:56:46,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [3350.7483, 4845.0737, 1652.9058, 4776.934, 1973.4385, 2934.2578, 4703.1284, 4161.845, 4723.9707, 4712.3145]
2025-05-11 07:56:46,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [702.0, 1000.0, 358.0, 1000.0, 398.0, 582.0, 1000.0, 883.0, 1000.0, 1000.0]
2025-05-11 07:56:46,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (3783.46) for latency MM1Queue_a033_s075
2025-05-11 07:56:46,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 07:56:46,140 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 07:56:46,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 51 minutes, 41 seconds)
2025-05-11 08:01:01,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 08:01:20,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 2877.82666 ± 1540.488
2025-05-11 08:01:20,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [3295.3218, 2473.283, 1643.1599, 4782.7188, 2775.3767, 4806.4746, 796.18317, 4687.149, 275.92905, 3242.668]
2025-05-11 08:01:20,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [695.0, 518.0, 329.0, 1000.0, 591.0, 1000.0, 161.0, 1000.0, 56.0, 670.0]
2025-05-11 08:01:20,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 47 minutes, 50 seconds)
2025-05-11 08:06:02,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 08:06:17,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 2463.28857 ± 1498.567
2025-05-11 08:06:17,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [2796.866, 1010.3653, 1783.7339, 192.09404, 4686.1963, 1760.832, 4129.9956, 2237.0046, 4711.353, 1324.446]
2025-05-11 08:06:17,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [604.0, 202.0, 358.0, 37.0, 1000.0, 347.0, 864.0, 439.0, 1000.0, 291.0]
2025-05-11 08:06:17,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 42 minutes, 52 seconds)
2025-05-11 08:10:05,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 08:10:23,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 2927.58081 ± 1945.204
2025-05-11 08:10:23,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [4764.356, 2118.8018, 4810.348, 2540.8647, 512.35675, 4686.4233, 204.24452, 209.95708, 4752.206, 4676.249]
2025-05-11 08:10:23,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 422.0, 1000.0, 504.0, 97.0, 1000.0, 40.0, 41.0, 1000.0, 966.0]
2025-05-11 08:10:23,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 37 minutes, 27 seconds)
2025-05-11 08:14:42,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 08:14:55,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 2180.18970 ± 1340.107
2025-05-11 08:14:55,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1710.8196, 3045.2283, 3201.6863, 2563.5544, 728.8529, 4924.0767, 1583.9504, 885.8264, 2884.8425, 273.05853]
2025-05-11 08:14:55,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [335.0, 619.0, 651.0, 521.0, 147.0, 1000.0, 303.0, 171.0, 579.0, 56.0]
2025-05-11 08:14:55,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 30 minutes, 47 seconds)
2025-05-11 08:19:25,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 08:19:39,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 2419.63403 ± 1610.445
2025-05-11 08:19:39,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [3676.7097, 1076.4414, 526.11896, 742.08124, 523.25494, 2574.358, 2098.3958, 4768.8906, 4888.539, 3321.5515]
2025-05-11 08:19:39,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [765.0, 219.0, 97.0, 155.0, 95.0, 515.0, 456.0, 1000.0, 1000.0, 664.0]
2025-05-11 08:19:39,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 2 hours, 26 minutes, 29 seconds)
2025-05-11 08:23:49,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 08:24:08,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 3156.86475 ± 1683.378
2025-05-11 08:24:08,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [2172.322, 4694.339, 678.3649, 4755.247, 4769.643, 1432.5836, 4716.1562, 4833.0854, 2647.3455, 869.5581]
2025-05-11 08:24:08,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [425.0, 981.0, 126.0, 1000.0, 1000.0, 300.0, 1000.0, 1000.0, 547.0, 182.0]
2025-05-11 08:24:08,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 2 hours, 21 minutes, 24 seconds)
2025-05-11 08:28:09,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 08:28:30,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 3232.64233 ± 1774.538
2025-05-11 08:28:30,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [4725.0137, 4430.33, 4707.36, 1187.7692, 1831.4991, 4633.9985, 4734.242, 514.0324, 4716.4814, 845.6976]
2025-05-11 08:28:30,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 906.0, 1000.0, 247.0, 349.0, 1000.0, 1000.0, 96.0, 1000.0, 175.0]
2025-05-11 08:28:30,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 2 hours, 13 minutes, 20 seconds)
2025-05-11 08:33:17,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 08:33:43,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 4036.49561 ± 976.647
2025-05-11 08:33:43,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [4640.4688, 4682.2285, 4700.336, 4684.86, 2990.7222, 1728.5305, 4304.333, 4694.1343, 4663.238, 3276.1038]
2025-05-11 08:33:43,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 614.0, 381.0, 883.0, 1000.0, 1000.0, 694.0]
2025-05-11 08:33:43,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (4036.50) for latency MM1Queue_a033_s075
2025-05-11 08:33:43,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 08:33:43,791 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 08:33:43,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 2 hours, 15 minutes, 21 seconds)
2025-05-11 08:37:49,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 08:37:59,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1788.09644 ± 1230.429
2025-05-11 08:37:59,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [2506.049, 929.7266, 2200.443, 514.77875, 1719.4948, 509.2213, 4738.2705, 653.5158, 2456.4324, 1653.0327]
2025-05-11 08:37:59,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [487.0, 216.0, 441.0, 107.0, 361.0, 119.0, 1000.0, 129.0, 489.0, 340.0]
2025-05-11 08:37:59,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 2 hours, 9 minutes, 11 seconds)
2025-05-11 08:42:40,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 08:43:05,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 3831.18042 ± 954.172
2025-05-11 08:43:05,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [2011.056, 3542.7336, 4717.0557, 3407.4897, 4732.438, 4708.0903, 2368.304, 4420.8896, 3754.5042, 4649.242]
2025-05-11 08:43:05,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [422.0, 737.0, 1000.0, 704.0, 1000.0, 1000.0, 517.0, 904.0, 796.0, 1000.0]
2025-05-11 08:43:05,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 2 hours, 6 minutes, 29 seconds)
2025-05-11 08:47:09,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 08:47:31,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 3504.79492 ± 1460.859
2025-05-11 08:47:31,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [4669.9946, 4597.1084, 4015.2048, 1413.0934, 4178.8535, 1066.5092, 4740.937, 1427.266, 4626.7974, 4312.1846]
2025-05-11 08:47:31,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 833.0, 297.0, 905.0, 223.0, 1000.0, 270.0, 1000.0, 905.0]
2025-05-11 08:47:31,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 2 hours, 1 minute, 36 seconds)
2025-05-11 08:52:02,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 08:52:21,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 2945.42725 ± 1676.826
2025-05-11 08:52:21,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [3299.5034, 4758.7046, 4703.3057, 4808.8853, 1784.6011, 4774.461, 1113.2164, 614.31067, 2805.4395, 791.84296]
2025-05-11 08:52:21,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [685.0, 1000.0, 1000.0, 1000.0, 382.0, 1000.0, 227.0, 112.0, 564.0, 161.0]
2025-05-11 08:52:21,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 59 minutes, 14 seconds)
2025-05-11 08:56:44,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 08:57:04,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 3295.38867 ± 1622.446
2025-05-11 08:57:04,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [782.3078, 1001.7503, 4623.83, 4650.664, 4698.6055, 1058.4802, 4763.609, 3102.2593, 3604.313, 4668.0654]
2025-05-11 08:57:04,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [149.0, 191.0, 1000.0, 1000.0, 1000.0, 221.0, 1000.0, 632.0, 742.0, 1000.0]
2025-05-11 08:57:04,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 52 minutes, 5 seconds)
2025-05-11 09:01:26,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 09:01:44,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 2957.12305 ± 1471.953
2025-05-11 09:01:44,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [4962.0684, 2032.4281, 1265.067, 4253.197, 2376.0034, 4953.4062, 2178.061, 3107.9329, 3984.204, 458.863]
2025-05-11 09:01:44,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 414.0, 255.0, 875.0, 457.0, 1000.0, 438.0, 616.0, 802.0, 88.0]
2025-05-11 09:01:44,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 49 minutes, 15 seconds)
2025-05-11 09:06:11,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 09:06:36,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 3901.13135 ± 1179.483
2025-05-11 09:06:36,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [5013.819, 4796.0615, 4861.3716, 1936.899, 4285.3306, 2974.3013, 4856.1157, 1991.2599, 3310.1113, 4986.0376]
2025-05-11 09:06:36,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 395.0, 852.0, 604.0, 1000.0, 393.0, 683.0, 1000.0]
2025-05-11 09:06:36,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 43 minutes, 29 seconds)
2025-05-11 09:10:48,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 09:11:02,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 2266.08105 ± 1608.347
2025-05-11 09:11:02,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [626.8419, 1270.9652, 2693.866, 4970.5034, 456.21326, 1028.8456, 904.3405, 4843.267, 2432.2761, 3433.694]
2025-05-11 09:11:02,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [119.0, 248.0, 539.0, 1000.0, 85.0, 200.0, 182.0, 1000.0, 499.0, 689.0]
2025-05-11 09:11:02,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 38 minutes, 43 seconds)
2025-05-11 09:15:22,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 09:15:47,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 3724.00073 ± 1393.437
2025-05-11 09:15:47,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [4961.6685, 4846.016, 4919.4697, 1741.1525, 3774.5295, 4836.787, 806.34296, 4743.0273, 3553.336, 3057.6787]
2025-05-11 09:15:47,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 345.0, 744.0, 1000.0, 157.0, 1000.0, 720.0, 635.0]
2025-05-11 09:15:47,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 33 minutes, 42 seconds)
2025-05-11 09:20:31,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 09:20:51,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 3259.27686 ± 1556.769
2025-05-11 09:20:51,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [4738.273, 354.0748, 4940.1753, 5143.934, 3177.76, 3238.4382, 2351.8489, 4782.4126, 2462.7593, 1403.0913]
2025-05-11 09:20:51,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 69.0, 1000.0, 1000.0, 673.0, 645.0, 460.0, 1000.0, 508.0, 266.0]
2025-05-11 09:20:51,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 30 minutes, 21 seconds)
2025-05-11 09:25:14,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 09:25:31,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 2830.37744 ± 1619.877
2025-05-11 09:25:31,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1309.2814, 4679.9463, 4710.174, 4756.3022, 4812.1997, 2722.8188, 1758.6978, 1270.0979, 1000.6434, 1283.6129]
2025-05-11 09:25:31,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [273.0, 1000.0, 1000.0, 1000.0, 1000.0, 552.0, 373.0, 252.0, 195.0, 261.0]
2025-05-11 09:25:31,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 25 minutes, 36 seconds)
2025-05-11 09:29:37,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 09:29:49,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 2122.45776 ± 969.701
2025-05-11 09:29:49,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1170.6058, 1258.739, 4025.6177, 2664.516, 712.5423, 2505.8137, 1246.1095, 2039.0592, 2852.2454, 2749.3308]
2025-05-11 09:29:49,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [248.0, 256.0, 831.0, 551.0, 136.0, 525.0, 269.0, 423.0, 574.0, 548.0]
2025-05-11 09:29:49,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 18 minutes, 55 seconds)
2025-05-11 09:34:08,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 09:34:26,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 3123.35767 ± 1559.594
2025-05-11 09:34:26,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [4810.429, 3345.3992, 981.35126, 1676.6866, 2415.9817, 4684.872, 2512.7427, 4897.7925, 4982.628, 925.6938]
2025-05-11 09:34:26,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 670.0, 192.0, 318.0, 489.0, 955.0, 499.0, 1000.0, 1000.0, 187.0]
2025-05-11 09:34:26,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 14 minutes, 53 seconds)
2025-05-11 09:38:41,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 09:39:01,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 3136.58960 ± 1410.013
2025-05-11 09:39:01,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [4752.7227, 4710.5605, 2052.437, 2660.3186, 4717.7344, 2881.337, 842.28235, 1692.2711, 4786.7314, 2269.5015]
2025-05-11 09:39:01,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 400.0, 521.0, 1000.0, 612.0, 158.0, 338.0, 1000.0, 478.0]
2025-05-11 09:39:01,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 1 hour, 9 minutes, 43 seconds)
2025-05-11 09:43:12,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 09:43:36,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 3577.69653 ± 1471.982
2025-05-11 09:43:36,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1329.3181, 1280.2839, 3392.9004, 1661.2719, 4233.6943, 4873.338, 4733.674, 4714.856, 4737.8364, 4819.792]
2025-05-11 09:43:36,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [279.0, 268.0, 694.0, 341.0, 869.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 09:43:36,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 1 hour, 3 minutes, 41 seconds)
2025-05-11 09:48:14,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 09:48:31,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 2745.96143 ± 1657.510
2025-05-11 09:48:31,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [4738.6694, 4172.5757, 4715.2026, 2497.1487, 1998.6321, 454.4439, 2088.6277, 4921.3716, 985.93805, 887.00336]
2025-05-11 09:48:31,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 889.0, 1000.0, 518.0, 409.0, 83.0, 445.0, 1000.0, 188.0, 188.0]
2025-05-11 09:48:31,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 59 minutes, 47 seconds)
2025-05-11 09:52:42,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 09:53:02,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 3244.86426 ± 1703.832
2025-05-11 09:53:02,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [620.288, 5021.935, 4713.3926, 1063.314, 5006.5513, 1712.3169, 4723.422, 4794.1255, 1853.1385, 2940.1606]
2025-05-11 09:53:02,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 1000.0, 1000.0, 201.0, 1000.0, 336.0, 1000.0, 1000.0, 358.0, 573.0]
2025-05-11 09:53:02,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 55 minutes, 44 seconds)
2025-05-11 09:57:18,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 09:57:42,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 3764.37231 ± 1532.553
2025-05-11 09:57:42,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [4824.0093, 619.22516, 4755.7563, 4712.778, 4775.604, 4650.5254, 1517.8257, 2370.1519, 4768.262, 4649.587]
2025-05-11 09:57:42,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 116.0, 1000.0, 1000.0, 1000.0, 1000.0, 302.0, 482.0, 1000.0, 1000.0]
2025-05-11 09:57:42,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 51 minutes, 12 seconds)
2025-05-11 10:01:57,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 10:02:21,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 3548.02222 ± 1464.542
2025-05-11 10:02:21,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [4719.622, 4706.024, 2649.1028, 4662.6855, 3470.407, 4741.352, 762.46716, 4875.576, 3732.2166, 1160.7688]
2025-05-11 10:02:21,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 553.0, 1000.0, 714.0, 1000.0, 149.0, 1000.0, 782.0, 232.0]
2025-05-11 10:02:21,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 46 minutes, 39 seconds)
2025-05-11 10:07:01,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 10:07:14,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 2219.22119 ± 1579.198
2025-05-11 10:07:14,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [4455.183, 3839.4155, 2875.8955, 551.16327, 1036.9097, 4925.8335, 1510.5712, 745.03253, 731.3638, 1520.8431]
2025-05-11 10:07:14,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [877.0, 802.0, 597.0, 103.0, 195.0, 1000.0, 313.0, 138.0, 140.0, 301.0]
2025-05-11 10:07:14,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 42 minutes, 33 seconds)
2025-05-11 10:11:30,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 10:11:51,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 3129.69409 ± 1796.089
2025-05-11 10:11:51,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [4772.892, 4828.7407, 4806.923, 450.02493, 2703.3655, 1532.8234, 4903.993, 2001.9281, 4776.8813, 519.36835]
2025-05-11 10:11:51,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 83.0, 540.0, 301.0, 1000.0, 403.0, 1000.0, 98.0]
2025-05-11 10:11:51,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 37 minutes, 19 seconds)
2025-05-11 10:16:03,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 10:16:21,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 2930.84692 ± 1738.111
2025-05-11 10:16:21,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [4366.809, 4698.312, 2780.4612, 270.84848, 4865.2554, 742.499, 4843.4165, 3421.996, 2723.2146, 595.6565]
2025-05-11 10:16:21,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [886.0, 1000.0, 553.0, 59.0, 1000.0, 139.0, 1000.0, 710.0, 560.0, 115.0]
2025-05-11 10:16:21,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 32 minutes, 37 seconds)
2025-05-11 10:21:02,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 10:21:29,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 4158.16309 ± 981.224
2025-05-11 10:21:29,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [4844.366, 3901.912, 2725.3733, 4894.7437, 2163.247, 3522.631, 4882.779, 4887.817, 4793.7603, 4965.006]
2025-05-11 10:21:29,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 788.0, 558.0, 1000.0, 455.0, 733.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 10:21:29,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (4158.16) for latency MM1Queue_a033_s075
2025-05-11 10:21:29,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 10:21:29,155 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 10:21:29,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 28 minutes, 31 seconds)
2025-05-11 10:25:33,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 10:25:54,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 3553.83203 ± 1648.930
2025-05-11 10:25:54,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1729.2625, 4916.9277, 1629.7363, 4903.104, 4982.771, 4696.465, 1923.4913, 943.9882, 4955.786, 4856.7905]
2025-05-11 10:25:54,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [335.0, 1000.0, 324.0, 1000.0, 1000.0, 1000.0, 416.0, 186.0, 1000.0, 1000.0]
2025-05-11 10:25:54,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 23 minutes, 33 seconds)
2025-05-11 10:30:12,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 10:30:29,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 2974.48047 ± 1709.136
2025-05-11 10:30:29,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [2689.0967, 4923.4507, 751.5528, 4993.4614, 2184.3408, 1493.7545, 4983.6187, 873.3367, 1903.3337, 4948.8574]
2025-05-11 10:30:29,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [534.0, 1000.0, 146.0, 1000.0, 429.0, 285.0, 1000.0, 167.0, 370.0, 1000.0]
2025-05-11 10:30:29,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 18 minutes, 35 seconds)
2025-05-11 10:34:50,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 10:35:09,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 3175.78491 ± 1612.802
2025-05-11 10:35:09,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [2320.1501, 890.3409, 1990.6626, 2540.8318, 4595.8833, 582.056, 4546.239, 4733.4897, 4608.3394, 4949.8574]
2025-05-11 10:35:09,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [493.0, 178.0, 398.0, 512.0, 1000.0, 109.0, 962.0, 1000.0, 1000.0, 1000.0]
2025-05-11 10:35:09,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 13 minutes, 59 seconds)
2025-05-11 10:39:26,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 10:39:47,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 3533.91919 ± 1748.586
2025-05-11 10:39:47,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [2727.7253, 4878.069, 4962.1895, 137.32886, 4781.7866, 4827.3115, 2072.8748, 4922.898, 4845.9688, 1183.0409]
2025-05-11 10:39:47,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [558.0, 1000.0, 1000.0, 30.0, 1000.0, 1000.0, 429.0, 1000.0, 1000.0, 240.0]
2025-05-11 10:39:47,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 9 minutes, 22 seconds)
2025-05-11 10:43:46,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 10:43:59,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 2263.61206 ± 1497.884
2025-05-11 10:43:59,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [2300.0496, 421.7854, 4487.025, 1111.804, 4043.4812, 616.72266, 1313.6981, 2543.8962, 1301.2386, 4496.4204]
2025-05-11 10:43:59,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [464.0, 77.0, 894.0, 221.0, 829.0, 111.0, 254.0, 501.0, 253.0, 883.0]
2025-05-11 10:43:59,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 4 minutes, 29 seconds)
2025-05-11 10:48:18,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 10:48:43,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 4031.93433 ± 1180.108
2025-05-11 10:48:43,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [4719.718, 929.3692, 4767.7676, 4781.9, 4203.7163, 2827.616, 4749.1953, 4822.4517, 4285.9307, 4231.681]
2025-05-11 10:48:43,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 183.0, 1000.0, 1000.0, 872.0, 558.0, 1000.0, 1000.0, 862.0, 853.0]
2025-05-11 10:48:43,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1251 [DEBUG]: Training session finished
