2025-05-09 07:54:22,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem16
2025-05-09 07:54:22,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem16
2025-05-09 07:54:22,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x73a9f6bc5c70>}
2025-05-09 07:54:22,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1111 [DEBUG]: using device: cpu
2025-05-09 07:54:22,784 baseline-bpql-noisy-humanoid:77 [WARNING]: args.assumed_delay != args.horizon: 16 != 24
2025-05-09 07:54:22,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1133 [INFO]: Creating new trainer
2025-05-09 07:54:22,793 baseline-bpql-noisy-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=648, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-05-09 07:54:22,793 baseline-bpql-noisy-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-09 07:54:24,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1194 [DEBUG]: Starting training session...
2025-05-09 07:54:24,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 1/100
2025-05-09 07:57:53,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 07:57:55,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 397.65536 ± 55.162
2025-05-09 07:57:55,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [387.267, 418.9329, 351.42084, 386.97006, 422.48767, 303.8346, 519.3105, 432.33502, 351.104, 402.89124]
2025-05-09 07:57:55,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 78.0, 66.0, 73.0, 81.0, 59.0, 100.0, 80.0, 66.0, 75.0]
2025-05-09 07:57:55,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (397.66) for latency MM1Queue_a033_s075
2025-05-09 07:57:55,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 07:57:55,511 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 07:57:55,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 5 hours, 47 minutes, 41 seconds)
2025-05-09 08:01:47,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 08:01:49,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 368.30731 ± 103.451
2025-05-09 08:01:49,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [319.01614, 188.0986, 389.79535, 501.6793, 405.55545, 542.55414, 229.5221, 409.95715, 335.55148, 361.34357]
2025-05-09 08:01:49,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [67.0, 38.0, 76.0, 99.0, 79.0, 114.0, 43.0, 86.0, 63.0, 73.0]
2025-05-09 08:01:49,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 6 hours, 3 minutes, 16 seconds)
2025-05-09 08:05:39,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 08:05:41,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 374.23492 ± 158.484
2025-05-09 08:05:41,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [273.13882, 320.74866, 360.27753, 465.36282, 787.5909, 454.9346, 303.47415, 195.67336, 316.33112, 264.81744]
2025-05-09 08:05:41,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [57.0, 61.0, 70.0, 90.0, 164.0, 100.0, 58.0, 41.0, 64.0, 55.0]
2025-05-09 08:05:41,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 6 hours, 4 minutes, 40 seconds)
2025-05-09 08:09:34,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 08:09:36,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 453.18179 ± 76.036
2025-05-09 08:09:36,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [384.06366, 416.5293, 409.3816, 460.5054, 652.00226, 435.77536, 375.96082, 478.2301, 499.77628, 419.59332]
2025-05-09 08:09:36,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 79.0, 77.0, 94.0, 122.0, 82.0, 70.0, 93.0, 92.0, 79.0]
2025-05-09 08:09:36,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (453.18) for latency MM1Queue_a033_s075
2025-05-09 08:09:36,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 08:09:36,981 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 08:09:36,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 6 hours, 4 minutes, 52 seconds)
2025-05-09 08:13:29,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 08:13:31,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 460.61554 ± 150.761
2025-05-09 08:13:31,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [374.91245, 732.4128, 548.8767, 602.5092, 538.2372, 362.73917, 537.99, 383.04886, 340.842, 184.58708]
2025-05-09 08:13:31,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 141.0, 104.0, 116.0, 102.0, 70.0, 103.0, 72.0, 64.0, 36.0]
2025-05-09 08:13:31,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (460.62) for latency MM1Queue_a033_s075
2025-05-09 08:13:31,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 08:13:31,828 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 08:13:31,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 6 hours, 3 minutes, 13 seconds)
2025-05-09 08:17:23,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 08:17:25,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 380.32239 ± 175.699
2025-05-09 08:17:25,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [431.33633, 361.1858, 382.65103, 776.75964, 142.65932, 118.91646, 459.47894, 420.38617, 269.52557, 440.32462]
2025-05-09 08:17:25,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 69.0, 71.0, 166.0, 28.0, 23.0, 90.0, 86.0, 53.0, 81.0]
2025-05-09 08:17:25,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 6 hours, 6 minutes, 30 seconds)
2025-05-09 08:21:18,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 08:21:20,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 410.65997 ± 130.378
2025-05-09 08:21:20,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [421.81998, 160.85184, 538.1518, 578.02094, 568.4998, 353.97488, 325.64682, 322.61417, 318.6033, 518.416]
2025-05-09 08:21:20,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 31.0, 104.0, 122.0, 105.0, 75.0, 69.0, 70.0, 70.0, 98.0]
2025-05-09 08:21:20,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 6 hours, 2 minutes, 53 seconds)
2025-05-09 08:25:13,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 08:25:15,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 416.74945 ± 110.293
2025-05-09 08:25:15,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [415.87677, 327.03043, 393.933, 320.90094, 346.29526, 491.03613, 678.3246, 370.25266, 518.6337, 305.21082]
2025-05-09 08:25:15,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 70.0, 80.0, 68.0, 74.0, 90.0, 127.0, 72.0, 106.0, 67.0]
2025-05-09 08:25:15,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 6 hours, 8 seconds)
2025-05-09 08:29:11,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 08:29:13,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 490.90250 ± 99.208
2025-05-09 08:29:13,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [518.99005, 431.074, 460.40988, 429.19623, 422.47073, 484.1965, 480.73645, 484.5787, 423.74194, 773.6303]
2025-05-09 08:29:13,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [96.0, 81.0, 86.0, 79.0, 78.0, 99.0, 92.0, 91.0, 81.0, 144.0]
2025-05-09 08:29:13,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (490.90) for latency MM1Queue_a033_s075
2025-05-09 08:29:13,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 08:29:13,471 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 08:29:13,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 5 hours, 56 minutes, 52 seconds)
2025-05-09 08:33:08,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 08:33:10,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 392.41364 ± 77.123
2025-05-09 08:33:10,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [487.80286, 317.46802, 445.05905, 388.74603, 496.47183, 376.40552, 372.52753, 415.48816, 404.44513, 219.72227]
2025-05-09 08:33:10,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [105.0, 61.0, 83.0, 85.0, 93.0, 83.0, 71.0, 90.0, 76.0, 42.0]
2025-05-09 08:33:10,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 5 hours, 53 minutes, 28 seconds)
2025-05-09 08:37:06,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 08:37:08,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 464.29428 ± 88.774
2025-05-09 08:37:08,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [452.40585, 518.91376, 396.83887, 586.2477, 367.63425, 582.9424, 503.62003, 366.76144, 533.5869, 333.99182]
2025-05-09 08:37:08,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 97.0, 74.0, 111.0, 68.0, 127.0, 108.0, 70.0, 100.0, 61.0]
2025-05-09 08:37:08,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 5 hours, 51 minutes, 5 seconds)
2025-05-09 08:41:01,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 08:41:04,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 520.87903 ± 130.308
2025-05-09 08:41:04,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [756.9554, 612.7695, 495.15668, 396.9641, 527.1129, 387.46024, 712.314, 363.24628, 536.14166, 420.66928]
2025-05-09 08:41:04,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [142.0, 114.0, 105.0, 73.0, 100.0, 83.0, 138.0, 77.0, 99.0, 90.0]
2025-05-09 08:41:04,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (520.88) for latency MM1Queue_a033_s075
2025-05-09 08:41:04,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 08:41:04,350 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 08:41:04,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 5 hours, 47 minutes, 21 seconds)
2025-05-09 08:45:01,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 08:45:03,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 507.46436 ± 97.562
2025-05-09 08:45:03,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [441.77957, 717.52625, 507.86407, 591.7987, 561.4164, 550.3703, 454.5734, 437.38324, 348.66525, 463.26642]
2025-05-09 08:45:03,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 149.0, 99.0, 111.0, 105.0, 114.0, 96.0, 85.0, 66.0, 87.0]
2025-05-09 08:45:03,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 5 hours, 44 minutes, 28 seconds)
2025-05-09 08:48:57,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 08:48:59,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 500.30884 ± 171.103
2025-05-09 08:48:59,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [459.14853, 556.88715, 342.64502, 986.68005, 435.872, 444.2374, 400.07785, 437.78424, 428.95993, 510.7965]
2025-05-09 08:48:59,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 103.0, 66.0, 186.0, 81.0, 84.0, 76.0, 83.0, 79.0, 107.0]
2025-05-09 08:48:59,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 5 hours, 40 minutes, 6 seconds)
2025-05-09 08:52:56,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 08:52:58,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 456.18280 ± 113.727
2025-05-09 08:52:58,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [545.0681, 428.3811, 715.2613, 418.2754, 405.7627, 478.83566, 469.65265, 467.56595, 246.78632, 386.23938]
2025-05-09 08:52:58,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [105.0, 78.0, 135.0, 87.0, 86.0, 89.0, 97.0, 87.0, 48.0, 73.0]
2025-05-09 08:52:58,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 5 hours, 36 minutes, 40 seconds)
2025-05-09 08:56:53,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 08:56:55,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 481.88672 ± 233.248
2025-05-09 08:56:55,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [135.51097, 370.09094, 434.0735, 372.05423, 505.65854, 418.4704, 1107.7462, 511.07922, 475.62396, 488.55917]
2025-05-09 08:56:55,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 68.0, 81.0, 72.0, 98.0, 81.0, 210.0, 93.0, 89.0, 90.0]
2025-05-09 08:56:55,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 5 hours, 32 minutes, 24 seconds)
2025-05-09 09:00:52,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 09:00:54,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 508.15985 ± 167.275
2025-05-09 09:00:54,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [449.06894, 425.96924, 447.6945, 447.74814, 957.9669, 431.67847, 372.7778, 657.4372, 498.77402, 392.48355]
2025-05-09 09:00:54,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 81.0, 86.0, 86.0, 180.0, 79.0, 71.0, 120.0, 94.0, 73.0]
2025-05-09 09:00:54,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 5 hours, 29 minutes, 16 seconds)
2025-05-09 09:04:49,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 09:04:51,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 463.04816 ± 134.653
2025-05-09 09:04:51,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [528.0726, 395.11273, 391.95416, 598.0492, 426.98083, 512.11755, 492.67838, 733.489, 248.60144, 303.42566]
2025-05-09 09:04:51,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 74.0, 79.0, 113.0, 79.0, 95.0, 92.0, 141.0, 51.0, 65.0]
2025-05-09 09:04:51,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 5 hours, 24 minutes, 35 seconds)
2025-05-09 09:08:50,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 09:08:52,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 421.11612 ± 151.553
2025-05-09 09:08:52,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [655.93567, 401.89227, 366.7997, 364.8513, 418.18655, 327.81387, 616.75555, 124.51789, 347.0977, 587.3105]
2025-05-09 09:08:52,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [140.0, 85.0, 77.0, 78.0, 91.0, 70.0, 117.0, 24.0, 74.0, 125.0]
2025-05-09 09:08:52,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 5 hours, 22 minutes)
2025-05-09 09:12:45,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 09:12:48,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 545.63226 ± 89.356
2025-05-09 09:12:48,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [750.61273, 472.22122, 525.4119, 440.41257, 513.68933, 494.64478, 507.48154, 513.51666, 574.2325, 664.0998]
2025-05-09 09:12:48,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [137.0, 89.0, 95.0, 82.0, 97.0, 109.0, 95.0, 95.0, 106.0, 122.0]
2025-05-09 09:12:48,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (545.63) for latency MM1Queue_a033_s075
2025-05-09 09:12:48,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 09:12:48,208 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 09:12:48,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 5 hours, 17 minutes, 17 seconds)
2025-05-09 09:16:44,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 09:16:46,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 415.44815 ± 139.748
2025-05-09 09:16:46,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [400.23398, 443.5959, 620.5247, 166.09254, 531.09314, 155.10283, 438.1522, 484.8586, 461.1059, 453.7219]
2025-05-09 09:16:46,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [75.0, 82.0, 133.0, 32.0, 98.0, 30.0, 82.0, 92.0, 86.0, 87.0]
2025-05-09 09:16:46,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 5 hours, 13 minutes, 38 seconds)
2025-05-09 09:20:42,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 09:20:44,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 450.69672 ± 113.274
2025-05-09 09:20:44,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [330.6145, 443.7983, 437.75317, 769.5783, 420.49832, 455.72128, 424.55487, 458.54605, 359.19595, 406.7067]
2025-05-09 09:20:44,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [63.0, 82.0, 82.0, 153.0, 85.0, 84.0, 91.0, 90.0, 76.0, 75.0]
2025-05-09 09:20:44,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 5 hours, 9 minutes, 24 seconds)
2025-05-09 09:24:41,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 09:24:43,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 486.46475 ± 95.470
2025-05-09 09:24:43,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [439.89502, 357.68616, 433.39508, 481.0625, 418.5413, 633.3661, 371.84525, 518.239, 583.4147, 627.2025]
2025-05-09 09:24:43,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 66.0, 82.0, 89.0, 80.0, 134.0, 70.0, 97.0, 108.0, 118.0]
2025-05-09 09:24:43,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 5 hours, 6 minutes, 1 second)
2025-05-09 09:28:39,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 09:28:42,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 570.61182 ± 101.681
2025-05-09 09:28:42,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [559.0005, 505.33832, 510.09064, 646.4534, 476.03015, 414.4313, 761.4234, 709.9272, 573.22125, 550.2023]
2025-05-09 09:28:42,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 94.0, 94.0, 122.0, 91.0, 78.0, 164.0, 154.0, 108.0, 104.0]
2025-05-09 09:28:42,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (570.61) for latency MM1Queue_a033_s075
2025-05-09 09:28:42,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 09:28:42,532 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 09:28:42,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 5 hours, 1 minute, 28 seconds)
2025-05-09 09:32:38,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 09:32:41,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 619.47937 ± 205.045
2025-05-09 09:32:41,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [568.516, 770.86066, 460.94473, 629.2774, 493.9389, 570.6214, 561.15094, 1140.9039, 345.39673, 653.18317]
2025-05-09 09:32:41,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 144.0, 85.0, 130.0, 92.0, 105.0, 105.0, 241.0, 67.0, 122.0]
2025-05-09 09:32:41,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (619.48) for latency MM1Queue_a033_s075
2025-05-09 09:32:41,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 09:32:41,938 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 09:32:41,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 58 minutes, 26 seconds)
2025-05-09 09:36:38,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 09:36:41,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 542.75061 ± 93.438
2025-05-09 09:36:41,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [754.8309, 433.21744, 499.23648, 497.61774, 528.5369, 637.423, 472.77893, 478.67227, 501.57858, 623.61414]
2025-05-09 09:36:41,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [144.0, 81.0, 110.0, 96.0, 98.0, 118.0, 101.0, 88.0, 92.0, 129.0]
2025-05-09 09:36:41,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 4 hours, 54 minutes, 37 seconds)
2025-05-09 09:40:37,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 09:40:40,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 642.41895 ± 91.592
2025-05-09 09:40:40,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [653.0562, 428.66995, 533.7191, 695.60675, 738.8367, 631.04626, 676.6385, 732.2209, 709.36285, 625.0322]
2025-05-09 09:40:40,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [122.0, 81.0, 111.0, 131.0, 139.0, 122.0, 132.0, 140.0, 133.0, 126.0]
2025-05-09 09:40:40,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (642.42) for latency MM1Queue_a033_s075
2025-05-09 09:40:40,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 09:40:40,900 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 09:40:40,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 4 hours, 51 minutes, 6 seconds)
2025-05-09 09:44:37,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 09:44:39,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 554.10486 ± 87.304
2025-05-09 09:44:39,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [507.12158, 445.69986, 509.86035, 467.4852, 617.5145, 447.5069, 663.8441, 686.4858, 642.68634, 552.8438]
2025-05-09 09:44:39,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 98.0, 95.0, 101.0, 122.0, 84.0, 125.0, 125.0, 138.0, 104.0]
2025-05-09 09:44:39,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 4 hours, 47 minutes, 7 seconds)
2025-05-09 09:48:36,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 09:48:39,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 581.33630 ± 133.639
2025-05-09 09:48:39,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [531.4095, 546.26624, 404.48312, 487.39505, 638.6239, 622.4915, 667.8974, 859.587, 387.7781, 667.4311]
2025-05-09 09:48:39,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [119.0, 114.0, 89.0, 90.0, 126.0, 131.0, 128.0, 180.0, 86.0, 121.0]
2025-05-09 09:48:39,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 43 minutes, 10 seconds)
2025-05-09 09:52:34,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 09:52:37,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 664.46564 ± 234.075
2025-05-09 09:52:37,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [454.92227, 625.6835, 864.2878, 879.00494, 1051.8486, 562.30164, 410.89294, 470.94052, 384.6481, 940.1262]
2025-05-09 09:52:37,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 119.0, 180.0, 181.0, 197.0, 104.0, 78.0, 89.0, 71.0, 183.0]
2025-05-09 09:52:37,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (664.47) for latency MM1Queue_a033_s075
2025-05-09 09:52:37,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 09:52:37,885 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 09:52:37,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 4 hours, 39 minutes, 3 seconds)
2025-05-09 09:56:36,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 09:56:39,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 645.23547 ± 155.946
2025-05-09 09:56:39,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [522.3603, 760.1607, 422.2467, 835.28906, 955.61334, 588.3709, 693.0377, 517.92865, 611.22906, 546.11835]
2025-05-09 09:56:39,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 147.0, 77.0, 162.0, 201.0, 124.0, 135.0, 96.0, 128.0, 98.0]
2025-05-09 09:56:39,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 4 hours, 35 minutes, 31 seconds)
2025-05-09 10:00:34,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:00:37,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 544.19440 ± 164.016
2025-05-09 10:00:37,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [463.62347, 622.4643, 746.1297, 477.46036, 536.6266, 606.6663, 570.969, 766.3844, 150.11833, 501.50146]
2025-05-09 10:00:37,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 116.0, 156.0, 87.0, 110.0, 116.0, 104.0, 145.0, 29.0, 108.0]
2025-05-09 10:00:37,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 4 hours, 31 minutes, 14 seconds)
2025-05-09 10:04:35,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:04:38,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 657.26697 ± 146.034
2025-05-09 10:04:38,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [690.53076, 878.2935, 396.5699, 600.6439, 599.7758, 726.24634, 532.38293, 553.1713, 701.8958, 893.159]
2025-05-09 10:04:38,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [134.0, 168.0, 73.0, 130.0, 109.0, 154.0, 102.0, 102.0, 148.0, 173.0]
2025-05-09 10:04:38,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 4 hours, 27 minutes, 46 seconds)
2025-05-09 10:08:36,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:08:39,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 620.18378 ± 208.216
2025-05-09 10:08:39,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [588.4423, 355.9822, 798.7481, 580.74896, 353.6598, 1033.2981, 447.77988, 594.6005, 590.9899, 857.5883]
2025-05-09 10:08:39,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 67.0, 151.0, 128.0, 75.0, 218.0, 82.0, 111.0, 124.0, 170.0]
2025-05-09 10:08:39,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 4 hours, 24 minutes)
2025-05-09 10:12:36,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:12:38,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 599.55261 ± 107.971
2025-05-09 10:12:38,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [638.53296, 433.41302, 750.919, 559.5374, 716.40564, 568.11774, 581.5857, 413.30014, 712.8277, 620.8869]
2025-05-09 10:12:38,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [137.0, 80.0, 139.0, 106.0, 149.0, 106.0, 108.0, 89.0, 135.0, 117.0]
2025-05-09 10:12:38,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 4 hours, 20 minutes, 13 seconds)
2025-05-09 10:16:33,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:16:36,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 562.31970 ± 136.217
2025-05-09 10:16:36,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [669.4698, 728.89575, 556.97723, 446.65567, 731.77216, 450.6448, 517.6505, 735.31726, 427.5755, 358.23785]
2025-05-09 10:16:36,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [122.0, 136.0, 105.0, 98.0, 142.0, 94.0, 104.0, 136.0, 81.0, 77.0]
2025-05-09 10:16:36,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 4 hours, 15 minutes, 21 seconds)
2025-05-09 10:20:33,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:20:36,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 687.64783 ± 177.334
2025-05-09 10:20:36,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [723.54913, 1108.7874, 741.9032, 533.9856, 806.74335, 628.0427, 544.8892, 617.20215, 435.0254, 736.35016]
2025-05-09 10:20:36,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [133.0, 231.0, 138.0, 99.0, 171.0, 121.0, 104.0, 133.0, 86.0, 132.0]
2025-05-09 10:20:36,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (687.65) for latency MM1Queue_a033_s075
2025-05-09 10:20:36,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 10:20:36,959 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 10:20:36,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 4 hours, 11 minutes, 52 seconds)
2025-05-09 10:24:33,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:24:35,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 506.63306 ± 199.817
2025-05-09 10:24:35,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [160.23741, 696.06573, 492.39426, 130.6592, 535.16394, 553.5499, 684.8139, 490.15625, 758.4464, 564.84314]
2025-05-09 10:24:35,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 132.0, 89.0, 25.0, 100.0, 104.0, 128.0, 89.0, 154.0, 123.0]
2025-05-09 10:24:35,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 4 hours, 7 minutes, 19 seconds)
2025-05-09 10:28:31,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:28:34,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 670.78955 ± 173.761
2025-05-09 10:28:34,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [960.6972, 629.5489, 656.7273, 842.27673, 446.5762, 646.6076, 573.1759, 643.3085, 904.3929, 404.58423]
2025-05-09 10:28:34,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [180.0, 122.0, 122.0, 154.0, 86.0, 127.0, 108.0, 135.0, 164.0, 78.0]
2025-05-09 10:28:34,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 4 hours, 3 minutes, 9 seconds)
2025-05-09 10:32:33,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:32:36,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 711.56348 ± 154.300
2025-05-09 10:32:36,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [505.89276, 937.8078, 647.0281, 755.7604, 711.9744, 568.35675, 879.0983, 942.1484, 538.6301, 628.93744]
2025-05-09 10:32:36,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 171.0, 121.0, 145.0, 139.0, 126.0, 165.0, 181.0, 108.0, 122.0]
2025-05-09 10:32:36,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (711.56) for latency MM1Queue_a033_s075
2025-05-09 10:32:36,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 10:32:37,002 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 10:32:37,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 59 minutes, 36 seconds)
2025-05-09 10:36:31,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:36:35,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 794.80518 ± 234.236
2025-05-09 10:36:35,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1317.7795, 712.39374, 699.06396, 525.00214, 708.8571, 825.25446, 928.4064, 1059.7192, 622.4559, 549.11926]
2025-05-09 10:36:35,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [255.0, 146.0, 145.0, 102.0, 141.0, 170.0, 189.0, 191.0, 119.0, 104.0]
2025-05-09 10:36:35,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (794.81) for latency MM1Queue_a033_s075
2025-05-09 10:36:35,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 10:36:35,019 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 10:36:35,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 55 minutes, 44 seconds)
2025-05-09 10:40:31,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:40:34,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 695.19910 ± 178.416
2025-05-09 10:40:34,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [522.478, 805.8695, 898.243, 747.125, 409.27066, 739.42163, 987.342, 678.90015, 445.25635, 718.0851]
2025-05-09 10:40:34,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [115.0, 151.0, 160.0, 141.0, 77.0, 131.0, 182.0, 135.0, 83.0, 132.0]
2025-05-09 10:40:34,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 51 minutes, 34 seconds)
2025-05-09 10:44:31,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:44:35,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 779.23029 ± 296.309
2025-05-09 10:44:35,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [511.81003, 1218.4438, 602.4836, 482.9833, 952.7441, 943.85364, 461.8408, 1055.0411, 1142.4816, 420.6211]
2025-05-09 10:44:35,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 247.0, 128.0, 89.0, 184.0, 199.0, 87.0, 203.0, 231.0, 80.0]
2025-05-09 10:44:35,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 47 minutes, 53 seconds)
2025-05-09 10:48:33,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:48:38,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 906.85626 ± 151.470
2025-05-09 10:48:38,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [664.67554, 796.8529, 938.0481, 910.2584, 952.29816, 1125.2666, 751.6524, 1049.7764, 1114.8418, 764.8928]
2025-05-09 10:48:38,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [139.0, 154.0, 175.0, 172.0, 183.0, 221.0, 136.0, 200.0, 228.0, 143.0]
2025-05-09 10:48:38,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (906.86) for latency MM1Queue_a033_s075
2025-05-09 10:48:38,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 10:48:38,416 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 10:48:38,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 44 minutes, 39 seconds)
2025-05-09 10:52:31,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:52:35,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 736.31024 ± 177.004
2025-05-09 10:52:35,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [911.9701, 793.90186, 961.3836, 502.69788, 669.71967, 1009.61597, 721.6994, 441.0992, 677.98145, 673.0332]
2025-05-09 10:52:35,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [169.0, 143.0, 188.0, 90.0, 133.0, 192.0, 151.0, 84.0, 138.0, 128.0]
2025-05-09 10:52:35,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 39 minutes, 41 seconds)
2025-05-09 10:56:32,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:56:35,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 717.67163 ± 183.531
2025-05-09 10:56:35,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [984.27057, 707.1238, 1105.9652, 650.9788, 424.28854, 640.9105, 605.52484, 723.36066, 647.7989, 686.49445]
2025-05-09 10:56:35,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [203.0, 147.0, 201.0, 129.0, 78.0, 117.0, 127.0, 148.0, 135.0, 149.0]
2025-05-09 10:56:35,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 36 minutes, 8 seconds)
2025-05-09 11:00:30,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:00:35,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 995.20978 ± 228.496
2025-05-09 11:00:35,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1322.1925, 861.6002, 1220.9319, 845.91534, 739.9991, 755.8879, 895.7463, 1184.7938, 791.596, 1333.4352]
2025-05-09 11:00:35,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [247.0, 184.0, 220.0, 151.0, 138.0, 157.0, 172.0, 239.0, 162.0, 249.0]
2025-05-09 11:00:35,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (995.21) for latency MM1Queue_a033_s075
2025-05-09 11:00:35,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 11:00:35,622 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 11:00:35,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 32 minutes, 9 seconds)
2025-05-09 11:04:32,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:04:36,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 824.81311 ± 218.477
2025-05-09 11:04:36,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [716.54736, 820.5759, 718.8139, 772.3365, 843.0191, 1444.8582, 841.4573, 590.2659, 763.58484, 736.6721]
2025-05-09 11:04:36,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [141.0, 173.0, 136.0, 149.0, 170.0, 277.0, 154.0, 109.0, 143.0, 148.0]
2025-05-09 11:04:36,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 28 minutes, 11 seconds)
2025-05-09 11:08:37,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:08:41,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 930.17041 ± 218.093
2025-05-09 11:08:41,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [801.5048, 1120.8561, 844.3431, 1132.7659, 1181.5438, 909.4775, 1257.8456, 569.44635, 776.31793, 707.6025]
2025-05-09 11:08:41,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [168.0, 212.0, 156.0, 231.0, 229.0, 180.0, 240.0, 108.0, 144.0, 138.0]
2025-05-09 11:08:41,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 24 minutes, 34 seconds)
2025-05-09 11:12:37,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:12:42,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1105.19360 ± 343.506
2025-05-09 11:12:42,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [947.37585, 1734.8203, 734.8011, 1542.469, 1456.8942, 1016.08154, 662.8443, 887.1239, 1185.4978, 884.0276]
2025-05-09 11:12:42,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [177.0, 326.0, 147.0, 278.0, 262.0, 196.0, 132.0, 178.0, 208.0, 183.0]
2025-05-09 11:12:42,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (1105.19) for latency MM1Queue_a033_s075
2025-05-09 11:12:42,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 11:12:42,727 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 11:12:42,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 21 minutes, 14 seconds)
2025-05-09 11:16:37,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:16:40,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 670.14294 ± 151.292
2025-05-09 11:16:40,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [618.1487, 885.2682, 767.9067, 606.5062, 486.75778, 593.7929, 960.7673, 705.6295, 594.7703, 481.88217]
2025-05-09 11:16:40,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [124.0, 156.0, 153.0, 127.0, 90.0, 129.0, 182.0, 143.0, 119.0, 91.0]
2025-05-09 11:16:40,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 16 minutes, 49 seconds)
2025-05-09 11:20:35,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:20:40,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 866.61658 ± 387.807
2025-05-09 11:20:40,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [497.10657, 630.7311, 787.0325, 1496.1165, 1455.0322, 1223.9697, 732.45447, 416.93997, 423.5122, 1003.27094]
2025-05-09 11:20:40,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 123.0, 145.0, 315.0, 281.0, 259.0, 144.0, 78.0, 77.0, 192.0]
2025-05-09 11:20:40,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 3 hours, 12 minutes, 43 seconds)
2025-05-09 11:24:36,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:24:41,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 932.19464 ± 256.747
2025-05-09 11:24:41,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [944.2782, 988.6357, 911.9038, 846.807, 755.3768, 524.4558, 1505.2427, 1095.4974, 1085.8478, 663.90173]
2025-05-09 11:24:41,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [170.0, 175.0, 169.0, 153.0, 160.0, 113.0, 281.0, 204.0, 208.0, 140.0]
2025-05-09 11:24:41,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 3 hours, 8 minutes, 46 seconds)
2025-05-09 11:28:36,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:28:40,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 923.98450 ± 271.059
2025-05-09 11:28:40,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [839.0029, 490.30927, 996.5185, 795.024, 940.0761, 539.1858, 1313.7747, 991.55817, 949.93634, 1384.4589]
2025-05-09 11:28:40,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [159.0, 99.0, 190.0, 149.0, 189.0, 115.0, 241.0, 183.0, 188.0, 249.0]
2025-05-09 11:28:40,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 3 hours, 3 minutes, 52 seconds)
2025-05-09 11:32:35,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:32:40,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1000.86914 ± 363.142
2025-05-09 11:32:40,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [750.74945, 1338.1182, 1235.2872, 1774.6018, 820.05835, 967.9775, 779.56134, 382.99844, 1087.8186, 871.52]
2025-05-09 11:32:40,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [143.0, 259.0, 251.0, 360.0, 149.0, 200.0, 138.0, 70.0, 202.0, 183.0]
2025-05-09 11:32:40,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 59 minutes, 38 seconds)
2025-05-09 11:36:39,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:36:43,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 869.90656 ± 225.545
2025-05-09 11:36:43,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [705.76025, 1024.1125, 662.11237, 1158.8344, 625.3891, 872.0715, 702.2268, 1317.0638, 678.6401, 952.85406]
2025-05-09 11:36:43,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [146.0, 193.0, 137.0, 239.0, 115.0, 161.0, 137.0, 243.0, 125.0, 211.0]
2025-05-09 11:36:43,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 56 minutes, 22 seconds)
2025-05-09 11:40:41,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:40:47,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1176.41931 ± 636.235
2025-05-09 11:40:47,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1647.3789, 2418.8477, 497.1019, 428.18307, 1778.873, 832.2544, 1758.3927, 807.15674, 893.76373, 702.241]
2025-05-09 11:40:47,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [322.0, 450.0, 104.0, 79.0, 344.0, 155.0, 350.0, 159.0, 159.0, 152.0]
2025-05-09 11:40:47,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (1176.42) for latency MM1Queue_a033_s075
2025-05-09 11:40:47,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 11:40:47,131 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 11:40:47,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 52 minutes, 59 seconds)
2025-05-09 11:44:44,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:44:49,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 853.80255 ± 225.268
2025-05-09 11:44:49,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1024.2693, 1228.9092, 563.41364, 454.08182, 740.0156, 876.06134, 799.63654, 954.8091, 1105.1104, 791.71924]
2025-05-09 11:44:49,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [200.0, 253.0, 112.0, 85.0, 137.0, 161.0, 155.0, 196.0, 208.0, 158.0]
2025-05-09 11:44:49,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 49 minutes, 8 seconds)
2025-05-09 11:48:44,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:48:49,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1084.10474 ± 541.494
2025-05-09 11:48:49,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1375.8975, 928.75385, 668.05255, 1470.9781, 1181.2351, 742.79175, 2419.108, 868.5519, 770.0399, 415.6388]
2025-05-09 11:48:49,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [264.0, 188.0, 122.0, 280.0, 236.0, 130.0, 456.0, 186.0, 158.0, 80.0]
2025-05-09 11:48:50,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 45 minutes, 14 seconds)
2025-05-09 11:52:45,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:52:51,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1185.62793 ± 596.038
2025-05-09 11:52:51,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [900.0996, 1487.1592, 1057.0951, 450.66544, 973.6843, 1303.4276, 2509.248, 981.41364, 1787.8325, 405.65466]
2025-05-09 11:52:51,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [180.0, 299.0, 216.0, 85.0, 201.0, 265.0, 499.0, 191.0, 339.0, 87.0]
2025-05-09 11:52:51,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (1185.63) for latency MM1Queue_a033_s075
2025-05-09 11:52:51,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 11:52:51,742 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 11:52:51,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 41 minutes, 31 seconds)
2025-05-09 11:56:45,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:56:50,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1032.24744 ± 412.117
2025-05-09 11:56:50,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1620.3733, 699.3832, 1160.644, 1348.5719, 637.6122, 1671.2472, 667.02075, 918.0327, 408.25598, 1191.3344]
2025-05-09 11:56:50,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [330.0, 127.0, 223.0, 246.0, 136.0, 337.0, 119.0, 190.0, 75.0, 216.0]
2025-05-09 11:56:50,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 36 minutes, 56 seconds)
2025-05-09 12:00:47,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:00:51,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 855.62421 ± 323.315
2025-05-09 12:00:51,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [777.6474, 1256.1512, 548.968, 1194.7261, 1060.921, 664.0562, 1051.163, 774.2664, 150.76125, 1077.5814]
2025-05-09 12:00:51,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [143.0, 245.0, 104.0, 231.0, 202.0, 119.0, 197.0, 150.0, 29.0, 203.0]
2025-05-09 12:00:51,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 32 minutes, 35 seconds)
2025-05-09 12:04:48,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:04:53,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1071.13416 ± 466.485
2025-05-09 12:04:53,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1758.7484, 1794.3206, 802.0731, 847.54474, 1120.7891, 738.0685, 438.3624, 1659.6132, 884.2282, 667.59393]
2025-05-09 12:04:53,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [328.0, 364.0, 143.0, 173.0, 203.0, 154.0, 82.0, 307.0, 172.0, 138.0]
2025-05-09 12:04:53,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 28 minutes, 32 seconds)
2025-05-09 12:08:49,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:08:54,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1021.05579 ± 383.111
2025-05-09 12:08:54,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [545.94617, 897.7764, 510.825, 1081.3331, 1045.6327, 1378.1948, 906.67255, 1839.3392, 727.3521, 1277.4844]
2025-05-09 12:08:54,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 171.0, 98.0, 212.0, 188.0, 269.0, 175.0, 346.0, 153.0, 251.0]
2025-05-09 12:08:54,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 24 minutes, 33 seconds)
2025-05-09 12:12:53,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:12:58,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 911.51044 ± 355.392
2025-05-09 12:12:58,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [550.42114, 755.7024, 1722.176, 983.2336, 706.1407, 968.2468, 537.6088, 1336.0553, 628.03265, 927.4871]
2025-05-09 12:12:58,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 163.0, 354.0, 179.0, 128.0, 186.0, 116.0, 271.0, 131.0, 177.0]
2025-05-09 12:12:58,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 20 minutes, 46 seconds)
2025-05-09 12:16:54,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:17:01,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1407.77783 ± 1007.902
2025-05-09 12:17:01,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [3303.7673, 1354.9474, 933.8627, 688.0022, 547.0696, 871.19, 3367.842, 1529.3, 593.24426, 888.55414]
2025-05-09 12:17:01,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [623.0, 289.0, 187.0, 131.0, 98.0, 159.0, 647.0, 281.0, 109.0, 181.0]
2025-05-09 12:17:01,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (1407.78) for latency MM1Queue_a033_s075
2025-05-09 12:17:01,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 12:17:01,442 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 12:17:01,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 17 minutes, 12 seconds)
2025-05-09 12:20:58,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:21:05,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1321.25061 ± 498.443
2025-05-09 12:21:05,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1060.9512, 1784.3934, 1290.9401, 628.93304, 1566.0287, 1297.2524, 736.00275, 2344.03, 889.1532, 1614.8214]
2025-05-09 12:21:05,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [211.0, 340.0, 249.0, 124.0, 311.0, 247.0, 156.0, 432.0, 160.0, 290.0]
2025-05-09 12:21:05,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 13 minutes, 28 seconds)
2025-05-09 12:25:07,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:25:13,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1145.54956 ± 378.971
2025-05-09 12:25:13,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1293.9761, 1705.4988, 699.40155, 1495.9598, 1418.7981, 817.23486, 601.1559, 889.9681, 1587.1122, 946.3908]
2025-05-09 12:25:13,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [253.0, 315.0, 124.0, 272.0, 255.0, 148.0, 123.0, 178.0, 289.0, 209.0]
2025-05-09 12:25:13,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 2 hours, 10 minutes, 6 seconds)
2025-05-09 12:29:02,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:29:07,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1071.22595 ± 381.075
2025-05-09 12:29:07,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1769.6854, 1672.6688, 1130.948, 1040.1613, 1026.0907, 1181.8668, 750.15955, 591.1253, 581.839, 967.7149]
2025-05-09 12:29:07,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [344.0, 305.0, 237.0, 212.0, 186.0, 235.0, 132.0, 105.0, 105.0, 170.0]
2025-05-09 12:29:07,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 2 hours, 5 minutes, 18 seconds)
2025-05-09 12:33:03,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:33:11,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1654.44409 ± 698.463
2025-05-09 12:33:11,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [2076.982, 1130.1002, 821.70355, 1395.5967, 697.2279, 1231.3296, 1524.0131, 2423.4849, 2832.3718, 2411.6309]
2025-05-09 12:33:11,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [384.0, 205.0, 178.0, 283.0, 151.0, 247.0, 306.0, 448.0, 542.0, 479.0]
2025-05-09 12:33:11,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (1654.44) for latency MM1Queue_a033_s075
2025-05-09 12:33:11,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 12:33:11,746 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 12:33:11,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 2 hours, 1 minute, 20 seconds)
2025-05-09 12:37:23,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:37:31,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1581.92749 ± 625.237
2025-05-09 12:37:31,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [597.17017, 2059.2703, 2193.485, 1019.82294, 869.3221, 1400.7062, 1464.2928, 1478.284, 2699.4797, 2037.4426]
2025-05-09 12:37:31,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 400.0, 411.0, 189.0, 160.0, 282.0, 299.0, 274.0, 516.0, 403.0]
2025-05-09 12:37:31,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 58 minutes, 51 seconds)
2025-05-09 12:41:15,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:41:22,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1438.76611 ± 364.356
2025-05-09 12:41:22,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1017.7334, 1306.2692, 1720.9628, 1675.5981, 1402.0962, 1959.4879, 1192.0299, 1484.0127, 1876.3483, 753.1228]
2025-05-09 12:41:22,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [197.0, 240.0, 311.0, 340.0, 275.0, 357.0, 246.0, 279.0, 346.0, 147.0]
2025-05-09 12:41:22,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 53 minutes, 38 seconds)
2025-05-09 12:45:18,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:45:27,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1713.72852 ± 932.327
2025-05-09 12:45:27,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1385.1854, 4270.261, 2024.2635, 1456.1299, 1412.9813, 1240.9268, 1169.7229, 2061.0574, 673.5321, 1443.2239]
2025-05-09 12:45:27,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [255.0, 823.0, 391.0, 265.0, 283.0, 248.0, 222.0, 402.0, 124.0, 280.0]
2025-05-09 12:45:27,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (1713.73) for latency MM1Queue_a033_s075
2025-05-09 12:45:27,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 12:45:27,102 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 12:45:27,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 49 minutes, 14 seconds)
2025-05-09 12:49:27,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:49:34,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1427.35730 ± 488.179
2025-05-09 12:49:34,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [2057.1252, 966.96844, 1158.4104, 1312.6497, 1711.8154, 2422.156, 1582.0486, 1060.3096, 1220.6437, 781.4455]
2025-05-09 12:49:34,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [376.0, 174.0, 226.0, 259.0, 322.0, 453.0, 296.0, 194.0, 245.0, 141.0]
2025-05-09 12:49:34,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 46 minutes, 21 seconds)
2025-05-09 12:53:28,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:53:34,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1180.97754 ± 615.907
2025-05-09 12:53:34,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1304.2747, 1195.209, 822.8971, 2296.7178, 2337.6897, 498.23694, 979.188, 993.3517, 840.96716, 541.24365]
2025-05-09 12:53:34,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [252.0, 232.0, 155.0, 456.0, 460.0, 93.0, 177.0, 197.0, 160.0, 99.0]
2025-05-09 12:53:34,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 41 minutes, 53 seconds)
2025-05-09 12:57:38,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:57:50,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 2336.08179 ± 1588.051
2025-05-09 12:57:50,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1273.7454, 679.7709, 2075.0806, 2185.7551, 875.32544, 964.79895, 5331.992, 2240.9182, 5203.124, 2530.307]
2025-05-09 12:57:50,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [236.0, 129.0, 376.0, 382.0, 180.0, 189.0, 1000.0, 413.0, 962.0, 480.0]
2025-05-09 12:57:50,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (2336.08) for latency MM1Queue_a033_s075
2025-05-09 12:57:50,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 12:57:50,482 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 12:57:50,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 37 minutes, 33 seconds)
2025-05-09 13:01:46,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:01:56,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 1952.94141 ± 941.613
2025-05-09 13:01:56,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [4146.9844, 1816.2076, 2788.8328, 1357.2802, 1662.523, 606.589, 2403.0981, 1861.4053, 1023.3029, 1863.1908]
2025-05-09 13:01:56,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [804.0, 328.0, 508.0, 253.0, 290.0, 109.0, 443.0, 335.0, 207.0, 341.0]
2025-05-09 13:01:56,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 34 minutes, 33 seconds)
2025-05-09 13:05:56,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:06:13,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 3194.36768 ± 1505.789
2025-05-09 13:06:13,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [2645.355, 2386.6392, 1840.7035, 1788.627, 1235.2289, 3333.422, 5357.25, 2695.6536, 5362.4326, 5298.364]
2025-05-09 13:06:13,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [521.0, 467.0, 369.0, 358.0, 239.0, 660.0, 1000.0, 510.0, 1000.0, 1000.0]
2025-05-09 13:06:13,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (3194.37) for latency MM1Queue_a033_s075
2025-05-09 13:06:13,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 13:06:13,176 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 13:06:13,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 31 minutes, 22 seconds)
2025-05-09 13:10:24,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:10:40,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 3121.25342 ± 1470.254
2025-05-09 13:10:40,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1180.0631, 1505.8994, 1346.7815, 5321.7637, 3453.819, 5399.651, 2737.5876, 4023.1648, 2419.9272, 3823.875]
2025-05-09 13:10:40,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [234.0, 277.0, 249.0, 1000.0, 633.0, 1000.0, 503.0, 733.0, 445.0, 716.0]
2025-05-09 13:10:40,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 28 minutes, 35 seconds)
2025-05-09 13:14:31,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:14:41,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 2031.69495 ± 1683.639
2025-05-09 13:14:41,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [792.46515, 1459.8804, 1573.4313, 5305.574, 710.1584, 2185.5994, 887.7638, 5249.9507, 693.2522, 1458.8734]
2025-05-09 13:14:41,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [154.0, 276.0, 307.0, 1000.0, 124.0, 425.0, 168.0, 1000.0, 139.0, 270.0]
2025-05-09 13:14:41,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 24 minutes, 29 seconds)
2025-05-09 13:18:42,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:18:57,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 2734.76709 ± 1545.468
2025-05-09 13:18:57,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [5101.5654, 2821.9402, 3706.6318, 526.85205, 630.36005, 1826.8756, 1699.2826, 2357.961, 4989.095, 3687.1082]
2025-05-09 13:18:57,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 544.0, 738.0, 95.0, 113.0, 360.0, 350.0, 473.0, 1000.0, 758.0]
2025-05-09 13:18:57,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 20 minutes, 13 seconds)
2025-05-09 13:23:00,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:23:16,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 2917.06006 ± 1700.872
2025-05-09 13:23:16,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1104.6667, 3222.421, 4829.265, 995.7764, 3646.1968, 1973.3226, 5300.269, 802.8034, 5315.7783, 1980.1019]
2025-05-09 13:23:16,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [213.0, 596.0, 907.0, 209.0, 705.0, 378.0, 1000.0, 153.0, 984.0, 407.0]
2025-05-09 13:23:16,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 16 minutes, 46 seconds)
2025-05-09 13:27:24,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:27:37,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 2482.98486 ± 1641.916
2025-05-09 13:27:37,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [852.2364, 5348.357, 1209.5203, 2347.5127, 3528.373, 3186.107, 484.64307, 701.9505, 4863.2734, 2307.8745]
2025-05-09 13:27:37,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [146.0, 1000.0, 221.0, 437.0, 658.0, 622.0, 90.0, 126.0, 927.0, 417.0]
2025-05-09 13:27:37,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 12 minutes, 45 seconds)
2025-05-09 13:31:34,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:31:55,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 4062.27026 ± 1382.437
2025-05-09 13:31:55,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [5375.0557, 2241.8462, 5405.9937, 4332.3823, 4496.43, 3858.4492, 978.3405, 3879.908, 4498.2266, 5556.0723]
2025-05-09 13:31:55,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 407.0, 1000.0, 761.0, 877.0, 703.0, 200.0, 714.0, 803.0, 1000.0]
2025-05-09 13:31:55,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (4062.27) for latency MM1Queue_a033_s075
2025-05-09 13:31:55,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 13:31:55,191 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 13:31:55,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 8 minutes)
2025-05-09 13:36:00,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:36:15,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 2887.26709 ± 2017.002
2025-05-09 13:36:15,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [4346.539, 5398.4956, 351.9467, 2402.1624, 790.52655, 1197.3969, 2826.81, 5451.2783, 635.1994, 5472.3145]
2025-05-09 13:36:15,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [775.0, 1000.0, 67.0, 431.0, 151.0, 232.0, 503.0, 1000.0, 115.0, 1000.0]
2025-05-09 13:36:15,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 1 hour, 4 minutes, 40 seconds)
2025-05-09 13:40:06,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:40:16,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 2094.41919 ± 1433.605
2025-05-09 13:40:16,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [3147.7466, 472.8878, 1580.4097, 5500.77, 1131.6228, 1103.0116, 2690.4363, 1173.0812, 1115.2979, 3028.9258]
2025-05-09 13:40:16,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [640.0, 86.0, 328.0, 1000.0, 189.0, 209.0, 509.0, 213.0, 237.0, 566.0]
2025-05-09 13:40:16,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 59 minutes, 43 seconds)
2025-05-09 13:44:22,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:44:43,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 4102.92041 ± 1719.901
2025-05-09 13:44:43,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [5554.5273, 2491.5068, 5277.017, 3987.0857, 1088.0256, 5250.216, 5314.5845, 5414.9395, 1242.2446, 5409.056]
2025-05-09 13:44:43,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 508.0, 1000.0, 756.0, 237.0, 1000.0, 1000.0, 1000.0, 224.0, 1000.0]
2025-05-09 13:44:43,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (4102.92) for latency MM1Queue_a033_s075
2025-05-09 13:44:43,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 13:44:43,930 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 13:44:43,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 55 minutes, 48 seconds)
2025-05-09 13:48:46,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:49:15,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 5269.08252 ± 48.094
2025-05-09 13:49:15,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [5242.8247, 5323.8623, 5185.654, 5275.352, 5278.2734, 5212.7593, 5316.7, 5245.776, 5348.5127, 5261.1104]
2025-05-09 13:49:15,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 13:49:15,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (5269.08) for latency MM1Queue_a033_s075
2025-05-09 13:49:15,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 13:49:15,433 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 13:49:15,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 51 minutes, 55 seconds)
2025-05-09 13:53:19,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:53:45,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 4803.17188 ± 1239.527
2025-05-09 13:53:45,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [3325.0625, 5471.63, 1581.1997, 5368.3213, 5422.5825, 5417.5474, 5446.8438, 5351.561, 5238.2793, 5408.6875]
2025-05-09 13:53:45,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [643.0, 1000.0, 314.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 13:53:45,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 48 minutes, 2 seconds)
2025-05-09 13:57:35,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:57:54,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 3703.65234 ± 1931.408
2025-05-09 13:57:54,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [5509.7666, 989.9926, 5315.2036, 3602.5828, 5474.668, 972.34326, 3793.0974, 808.6289, 5487.9575, 5082.2837]
2025-05-09 13:57:54,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 185.0, 1000.0, 670.0, 1000.0, 185.0, 717.0, 150.0, 1000.0, 948.0]
2025-05-09 13:57:54,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 43 minutes, 18 seconds)
2025-05-09 14:02:05,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:02:27,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 4279.08496 ± 1640.787
2025-05-09 14:02:27,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [2283.242, 5097.4644, 5478.783, 5523.715, 2564.4065, 805.8209, 5566.469, 5332.6807, 4707.9995, 5430.2705]
2025-05-09 14:02:27,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [428.0, 908.0, 1000.0, 1000.0, 522.0, 170.0, 1000.0, 1000.0, 869.0, 1000.0]
2025-05-09 14:02:27,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 39 minutes, 55 seconds)
2025-05-09 14:06:25,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:06:54,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 5304.21826 ± 325.411
2025-05-09 14:06:54,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [5454.669, 5359.1, 4333.5034, 5411.3545, 5415.873, 5476.0063, 5360.291, 5399.21, 5403.1235, 5429.0493]
2025-05-09 14:06:54,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 833.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 14:06:54,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (5304.22) for latency MM1Queue_a033_s075
2025-05-09 14:06:54,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 14:06:54,363 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 14:06:54,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 35 minutes, 28 seconds)
2025-05-09 14:10:44,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:11:01,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 3224.38477 ± 2068.016
2025-05-09 14:11:01,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [538.4505, 5296.8154, 704.94507, 1566.3494, 5412.942, 2760.6787, 5052.5537, 671.5389, 5356.331, 4883.2417]
2025-05-09 14:11:01,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [120.0, 1000.0, 152.0, 329.0, 1000.0, 511.0, 977.0, 142.0, 1000.0, 899.0]
2025-05-09 14:11:01,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 30 minutes, 28 seconds)
2025-05-09 14:14:58,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:15:19,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 4156.18213 ± 1585.505
2025-05-09 14:15:19,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [3665.15, 5491.4272, 4112.879, 573.8486, 5590.2534, 5566.5254, 4776.1704, 5440.7573, 4236.2417, 2108.565]
2025-05-09 14:15:19,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [660.0, 1000.0, 756.0, 102.0, 1000.0, 1000.0, 859.0, 1000.0, 779.0, 377.0]
2025-05-09 14:15:19,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 25 minutes, 52 seconds)
2025-05-09 14:19:15,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:19:43,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 5435.28760 ± 68.342
2025-05-09 14:19:43,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [5495.3667, 5349.451, 5518.4297, 5380.5063, 5405.497, 5454.317, 5472.66, 5315.055, 5528.6025, 5432.9937]
2025-05-09 14:19:43,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 14:19:43,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1226 [INFO]: New best (5435.29) for latency MM1Queue_a033_s075
2025-05-09 14:19:43,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 14:19:43,444 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 14:19:43,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 21 minutes, 48 seconds)
2025-05-09 14:23:32,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:23:48,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 3151.91162 ± 2010.946
2025-05-09 14:23:48,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [5334.058, 2514.68, 638.12103, 5493.921, 2381.537, 5532.43, 1061.5936, 391.45825, 2767.8164, 5403.5]
2025-05-09 14:23:48,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [954.0, 445.0, 112.0, 1000.0, 431.0, 1000.0, 188.0, 72.0, 500.0, 1000.0]
2025-05-09 14:23:48,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 17 minutes, 4 seconds)
2025-05-09 14:27:56,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:28:08,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 2399.76611 ± 1505.988
2025-05-09 14:28:08,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [1270.8494, 560.4157, 3520.8276, 1716.5876, 1783.869, 2126.9111, 928.48224, 2274.382, 5644.3877, 4170.9487]
2025-05-09 14:28:08,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [245.0, 127.0, 626.0, 317.0, 323.0, 389.0, 169.0, 408.0, 1000.0, 800.0]
2025-05-09 14:28:08,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 12 minutes, 44 seconds)
2025-05-09 14:32:04,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:32:19,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 2940.69336 ± 1816.394
2025-05-09 14:32:19,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [734.5578, 913.8952, 2610.498, 5492.8784, 587.2797, 5465.6294, 4380.9575, 4535.2017, 2069.0427, 2616.9922]
2025-05-09 14:32:19,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [137.0, 177.0, 477.0, 1000.0, 104.0, 1000.0, 810.0, 813.0, 382.0, 496.0]
2025-05-09 14:32:19,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 8 minutes, 30 seconds)
2025-05-09 14:36:03,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:36:31,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 5425.74463 ± 307.092
2025-05-09 14:36:31,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [5520.3125, 5479.951, 5552.217, 5548.287, 4519.3813, 5604.5854, 5526.787, 5585.597, 5394.7646, 5525.5625]
2025-05-09 14:36:31,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 851.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 14:36:31,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 4 minutes, 14 seconds)
2025-05-09 14:40:46,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:41:11,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1221 [DEBUG]: Total Reward: 4434.89648 ± 1238.717
2025-05-09 14:41:11,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1222 [DEBUG]: All rewards: [5297.1035, 5318.9097, 5336.9604, 5301.99, 5312.8516, 3064.5571, 3870.8076, 5243.1875, 1517.4961, 4085.1006]
2025-05-09 14:41:11,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 570.0, 717.0, 1000.0, 294.0, 805.0]
2025-05-09 14:41:11,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1251 [DEBUG]: Training session finished
