2025-05-10 22:04:02,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-bpql-mem2
2025-05-10 22:04:02,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-bpql-mem2
2025-05-10 22:04:02,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x7ec5931cee80>}
2025-05-10 22:04:02,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1111 [DEBUG]: using device: cpu
2025-05-10 22:04:02,598 baseline-bpql-noisy-ant:77 [WARNING]: args.assumed_delay != args.horizon: 2 != 24
2025-05-10 22:04:02,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1133 [INFO]: Creating new trainer
2025-05-10 22:04:02,609 baseline-bpql-noisy-ant:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=43, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-05-10 22:04:02,609 baseline-bpql-noisy-ant:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-10 22:04:02,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1194 [DEBUG]: Starting training session...
2025-05-10 22:04:02,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 1/100
2025-05-10 22:07:38,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 22:07:54,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: -1039.45471 ± 547.768
2025-05-10 22:07:54,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [-933.3699, -1479.4363, -9.038289, -978.8423, -1485.7576, -1364.4249, -19.202896, -1200.8591, -1432.5511, -1491.0647]
2025-05-10 22:07:54,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 13.0, 1000.0, 1000.0, 1000.0, 39.0, 1000.0, 1000.0, 1000.0]
2025-05-10 22:07:54,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (-1039.45) for latency ExtremeClogL1U23
2025-05-10 22:07:54,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-10 22:07:54,056 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-10 22:07:54,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 2/100 (estimated time remaining: 6 hours, 21 minutes, 19 seconds)
2025-05-10 22:10:52,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 22:10:57,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: -75.84890 ± 88.736
2025-05-10 22:10:57,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [24.451855, -30.728054, -53.79974, -123.58122, -68.63715, -7.1611304, 22.865705, -203.36661, -257.2163, -61.31634]
2025-05-10 22:10:57,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [79.0, 95.0, 116.0, 1000.0, 154.0, 115.0, 61.0, 230.0, 1000.0, 175.0]
2025-05-10 22:10:57,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (-75.85) for latency ExtremeClogL1U23
2025-05-10 22:10:57,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-10 22:10:57,994 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-10 22:10:58,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 3/100 (estimated time remaining: 5 hours, 38 minutes, 57 seconds)
2025-05-10 22:13:48,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 22:13:56,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 6.71350 ± 32.865
2025-05-10 22:13:56,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [48.772446, 14.875688, -17.349985, 26.070889, -8.13861, -26.594074, -17.621758, 10.924873, -35.810944, 72.00652]
2025-05-10 22:13:56,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [290.0, 542.0, 1000.0, 163.0, 1000.0, 196.0, 75.0, 895.0, 181.0, 583.0]
2025-05-10 22:13:56,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (6.71) for latency ExtremeClogL1U23
2025-05-10 22:13:56,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-10 22:13:56,793 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-10 22:13:56,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 4/100 (estimated time remaining: 5 hours, 20 minutes)
2025-05-10 22:16:57,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 22:17:10,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 190.91081 ± 133.963
2025-05-10 22:17:10,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [182.62988, 34.20214, 212.5011, 123.52742, 455.02737, 6.314033, 306.00073, 72.94066, 328.1036, 187.86119]
2025-05-10 22:17:10,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [911.0, 166.0, 1000.0, 567.0, 1000.0, 17.0, 1000.0, 224.0, 1000.0, 846.0]
2025-05-10 22:17:10,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (190.91) for latency ExtremeClogL1U23
2025-05-10 22:17:10,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-10 22:17:10,659 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-10 22:17:10,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 5/100 (estimated time remaining: 5 hours, 15 minutes, 4 seconds)
2025-05-10 22:20:05,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 22:20:22,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 456.53766 ± 143.999
2025-05-10 22:20:22,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [486.65952, 421.07074, 546.91785, 439.3027, 394.923, 544.0286, 449.1255, 614.92206, 79.05691, 589.3694]
2025-05-10 22:20:22,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 147.0, 1000.0]
2025-05-10 22:20:22,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (456.54) for latency ExtremeClogL1U23
2025-05-10 22:20:22,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-10 22:20:22,233 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-10 22:20:22,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 6/100 (estimated time remaining: 5 hours, 10 minutes, 6 seconds)
2025-05-10 22:23:27,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 22:23:45,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 627.43518 ± 114.416
2025-05-10 22:23:45,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [689.6453, 778.48517, 331.42126, 723.4145, 571.12427, 659.5099, 630.6279, 641.9824, 585.711, 662.43036]
2025-05-10 22:23:45,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 482.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 22:23:45,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (627.44) for latency ExtremeClogL1U23
2025-05-10 22:23:45,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-10 22:23:45,558 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-10 22:23:45,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 58 minutes, 8 seconds)
2025-05-10 22:26:35,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 22:26:47,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 493.04248 ± 293.678
2025-05-10 22:26:47,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [700.263, 793.47485, 707.215, 71.68375, 660.9283, 439.7045, 815.0202, 19.055983, 598.0838, 124.99536]
2025-05-10 22:26:47,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [871.0, 1000.0, 1000.0, 62.0, 1000.0, 494.0, 1000.0, 14.0, 1000.0, 128.0]
2025-05-10 22:26:47,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 54 minutes, 15 seconds)
2025-05-10 22:29:51,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 22:30:06,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 236.21594 ± 309.913
2025-05-10 22:30:06,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [-100.13889, 24.5191, 650.557, 268.9319, 607.6503, 626.20935, -111.76999, 242.9758, 366.10938, -212.88457]
2025-05-10 22:30:06,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 26.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 233.0, 1000.0, 1000.0]
2025-05-10 22:30:06,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 57 minutes, 29 seconds)
2025-05-10 22:33:10,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 22:33:19,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 460.87735 ± 362.410
2025-05-10 22:33:19,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [55.207237, 21.641361, 34.846825, 900.07574, 113.627815, 823.4099, 964.7891, 598.75916, 691.92993, 404.48648]
2025-05-10 22:33:19,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [34.0, 19.0, 36.0, 1000.0, 146.0, 1000.0, 1000.0, 651.0, 1000.0, 365.0]
2025-05-10 22:33:19,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 53 minutes, 54 seconds)
2025-05-10 22:36:12,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 22:36:29,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 772.20660 ± 269.655
2025-05-10 22:36:29,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [690.76013, 727.31683, 730.97394, 1246.5707, 703.59174, 651.4592, 243.53888, 1063.875, 607.5239, 1056.4554]
2025-05-10 22:36:29,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 210.0, 911.0, 1000.0, 1000.0]
2025-05-10 22:36:29,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (772.21) for latency ExtremeClogL1U23
2025-05-10 22:36:29,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-10 22:36:29,304 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-10 22:36:29,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 50 minutes, 7 seconds)
2025-05-10 22:39:20,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 22:39:32,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 577.40515 ± 193.237
2025-05-10 22:39:32,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [546.41345, 890.70166, 158.63559, 561.37134, 586.25256, 645.51965, 540.6085, 542.68866, 450.21048, 851.64966]
2025-05-10 22:39:32,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [373.0, 1000.0, 94.0, 1000.0, 375.0, 492.0, 1000.0, 1000.0, 400.0, 711.0]
2025-05-10 22:39:32,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 40 minutes, 46 seconds)
2025-05-10 22:42:36,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 22:42:46,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 560.26398 ± 256.680
2025-05-10 22:42:46,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [655.6902, 641.3475, 284.045, 377.9262, 588.1766, 136.84096, 395.86957, 595.6639, 966.0553, 961.0243]
2025-05-10 22:42:46,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 502.0, 166.0, 256.0, 377.0, 107.0, 280.0, 1000.0, 1000.0, 1000.0]
2025-05-10 22:42:46,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 41 minutes, 17 seconds)
2025-05-10 22:45:32,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 22:45:44,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 605.88135 ± 249.184
2025-05-10 22:45:44,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [492.59198, 462.10352, 688.4616, 375.50967, 806.60645, 1032.5265, 707.1896, 651.5071, 86.02953, 756.2882]
2025-05-10 22:45:44,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [437.0, 339.0, 1000.0, 386.0, 1000.0, 783.0, 1000.0, 1000.0, 55.0, 1000.0]
2025-05-10 22:45:44,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 31 minutes, 59 seconds)
2025-05-10 22:48:44,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 22:48:51,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 397.99615 ± 327.976
2025-05-10 22:48:51,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [758.39215, 1019.52954, 448.2191, 59.216496, 118.51455, 652.25494, 86.56596, 176.15381, 62.095512, 599.0198]
2025-05-10 22:48:51,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 829.0, 286.0, 39.0, 67.0, 1000.0, 66.0, 111.0, 33.0, 452.0]
2025-05-10 22:48:51,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 27 minutes, 10 seconds)
2025-05-10 22:52:04,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 22:52:14,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 736.39899 ± 443.786
2025-05-10 22:52:14,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [835.4696, 1033.4042, 1058.2596, 712.2476, 288.96234, 1116.1635, 136.00262, 102.86236, 552.3176, 1528.3003]
2025-05-10 22:52:14,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 598.0, 1000.0, 431.0, 252.0, 1000.0, 128.0, 157.0, 334.0, 1000.0]
2025-05-10 22:52:14,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 27 minutes, 49 seconds)
2025-05-10 22:55:11,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 22:55:27,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1193.92603 ± 418.283
2025-05-10 22:55:27,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1562.988, 709.5448, 923.68854, 1482.2955, 1485.0261, 839.2923, 1717.1133, 396.59097, 1474.8414, 1347.8785]
2025-05-10 22:55:27,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 276.0, 1000.0, 1000.0]
2025-05-10 22:55:27,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (1193.93) for latency ExtremeClogL1U23
2025-05-10 22:55:27,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-10 22:55:27,836 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-10 22:55:27,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 27 minutes, 37 seconds)
2025-05-10 22:58:39,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 22:58:58,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1302.92346 ± 455.290
2025-05-10 22:58:58,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1251.2489, 1639.6602, 747.29895, 987.6666, 948.1057, 1772.4558, 622.8171, 1842.4688, 1247.0593, 1970.4524]
2025-05-10 22:58:58,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 787.0, 1000.0]
2025-05-10 22:58:58,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (1302.92) for latency ExtremeClogL1U23
2025-05-10 22:58:58,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-10 22:58:58,862 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-10 22:58:58,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 29 minutes, 6 seconds)
2025-05-10 23:01:48,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 23:01:58,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 511.92715 ± 282.394
2025-05-10 23:01:58,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [248.06587, 751.4442, 881.95746, 150.7294, 224.09207, 499.99045, 293.89844, 841.096, 346.25735, 881.7405]
2025-05-10 23:01:58,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [155.0, 1000.0, 1000.0, 127.0, 128.0, 1000.0, 168.0, 1000.0, 179.0, 1000.0]
2025-05-10 23:01:58,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 26 minutes, 10 seconds)
2025-05-10 23:05:08,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 23:05:19,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1020.54364 ± 571.350
2025-05-10 23:05:19,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1794.3978, 1698.2681, 785.69086, 337.7704, 1889.2786, 872.9253, 614.79944, 487.8713, 1314.9401, 409.49414]
2025-05-10 23:05:19,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 440.0, 174.0, 913.0, 1000.0, 391.0, 200.0, 703.0, 256.0]
2025-05-10 23:05:19,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 26 minutes, 43 seconds)
2025-05-10 23:08:12,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 23:08:19,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 364.17789 ± 359.978
2025-05-10 23:08:19,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [28.106592, 337.75537, 842.11975, 970.7662, 35.141407, 126.0487, 82.463356, 80.38626, 866.6824, 272.30847]
2025-05-10 23:08:19,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [23.0, 287.0, 1000.0, 1000.0, 33.0, 78.0, 111.0, 56.0, 1000.0, 191.0]
2025-05-10 23:08:19,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 21/100 (estimated time remaining: 4 hours, 17 minutes, 11 seconds)
2025-05-10 23:11:20,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 23:11:34,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1147.95386 ± 502.209
2025-05-10 23:11:34,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1492.2828, 1964.8823, 1014.7559, 210.80032, 596.28174, 842.20844, 1359.9443, 1775.9854, 1098.6027, 1123.7948]
2025-05-10 23:11:34,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 95.0, 321.0, 381.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 23:11:34,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 22/100 (estimated time remaining: 4 hours, 14 minutes, 28 seconds)
2025-05-10 23:14:48,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 23:15:04,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1416.65076 ± 424.853
2025-05-10 23:15:04,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1688.146, 1848.8198, 1535.8982, 1056.6873, 1934.4615, 1568.4814, 1261.4651, 1788.0354, 879.95245, 604.5612]
2025-05-10 23:15:04,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 876.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 469.0, 303.0]
2025-05-10 23:15:04,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (1416.65) for latency ExtremeClogL1U23
2025-05-10 23:15:04,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-10 23:15:04,979 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-10 23:15:04,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 23/100 (estimated time remaining: 4 hours, 11 minutes, 11 seconds)
2025-05-10 23:18:04,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 23:18:19,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1158.99243 ± 457.921
2025-05-10 23:18:19,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1292.4365, 638.665, 192.63998, 1225.5829, 1540.5303, 1452.8342, 1354.9943, 1266.0586, 1833.0973, 793.0852]
2025-05-10 23:18:19,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 387.0, 80.0, 1000.0, 1000.0, 1000.0, 692.0, 604.0, 1000.0, 1000.0]
2025-05-10 23:18:19,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 24/100 (estimated time remaining: 4 hours, 11 minutes, 42 seconds)
2025-05-10 23:21:16,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 23:21:31,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1085.13586 ± 553.455
2025-05-10 23:21:31,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1094.7455, 1007.6624, 842.0362, 1091.3713, 450.11826, 1538.4635, 2016.3724, 819.6706, 148.29134, 1842.6271]
2025-05-10 23:21:31,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 209.0, 1000.0, 1000.0, 579.0, 102.0, 1000.0]
2025-05-10 23:21:31,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 25/100 (estimated time remaining: 4 hours, 6 minutes, 11 seconds)
2025-05-10 23:24:24,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 23:24:39,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1290.59644 ± 607.029
2025-05-10 23:24:39,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1461.1514, 31.298857, 1009.45734, 1686.328, 1127.8193, 1474.7926, 812.9581, 2099.3088, 2192.3882, 1010.46246]
2025-05-10 23:24:39,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 31.0, 530.0, 855.0, 1000.0, 658.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 23:24:39,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 5 minutes)
2025-05-10 23:27:42,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 23:27:56,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1259.75806 ± 680.042
2025-05-10 23:27:56,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [796.2539, 1178.7524, 825.86444, 1879.9491, 2294.2314, 556.7533, 2142.4285, 217.93555, 1790.9685, 914.44434]
2025-05-10 23:27:56,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 614.0, 1000.0, 1000.0, 1000.0, 257.0, 1000.0, 107.0, 1000.0, 397.0]
2025-05-10 23:27:56,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 27/100 (estimated time remaining: 4 hours, 2 minutes, 12 seconds)
2025-05-10 23:30:54,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 23:31:09,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1667.55469 ± 602.583
2025-05-10 23:31:09,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2129.1956, 549.17285, 1903.581, 2060.5095, 1756.0553, 1994.2305, 2359.4827, 780.04755, 1043.5854, 2099.6848]
2025-05-10 23:31:09,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 260.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 317.0, 487.0, 1000.0]
2025-05-10 23:31:09,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (1667.55) for latency ExtremeClogL1U23
2025-05-10 23:31:09,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-10 23:31:09,050 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-10 23:31:09,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 54 minutes, 35 seconds)
2025-05-10 23:34:20,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 23:34:37,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1887.74866 ± 700.945
2025-05-10 23:34:37,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2452.7993, 2447.2102, 2350.067, 1091.2484, 975.6707, 597.02295, 2499.302, 2457.405, 2327.1858, 1679.5737]
2025-05-10 23:34:37,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 238.0, 1000.0, 925.0, 1000.0, 1000.0]
2025-05-10 23:34:37,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (1887.75) for latency ExtremeClogL1U23
2025-05-10 23:34:37,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-10 23:34:37,758 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-10 23:34:37,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 54 minutes, 49 seconds)
2025-05-10 23:37:27,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 23:37:38,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1129.21265 ± 898.056
2025-05-10 23:37:38,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1114.4646, 581.29346, 167.74548, 207.51395, 2276.8884, 1124.0043, 2583.9053, 2304.1067, 80.394356, 851.8105]
2025-05-10 23:37:38,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 225.0, 90.0, 124.0, 1000.0, 611.0, 1000.0, 1000.0, 44.0, 1000.0]
2025-05-10 23:37:38,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 48 minutes, 56 seconds)
2025-05-10 23:40:36,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 23:40:47,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1274.85706 ± 678.338
2025-05-10 23:40:47,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1352.3926, 866.8417, 1394.3461, 2327.6685, 98.40486, 2076.1492, 1866.1917, 433.76846, 850.5677, 1482.2388]
2025-05-10 23:40:47,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [633.0, 1000.0, 564.0, 1000.0, 96.0, 661.0, 775.0, 225.0, 313.0, 714.0]
2025-05-10 23:40:47,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 45 minutes, 55 seconds)
2025-05-10 23:43:43,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 23:43:57,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1497.94263 ± 871.629
2025-05-10 23:43:57,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [555.368, 644.1706, 1998.9456, 2353.9983, 2487.0254, 748.45044, 1454.9263, 2205.214, 2458.5918, 72.735725]
2025-05-10 23:43:57,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [255.0, 295.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 41.0]
2025-05-10 23:43:57,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 41 minutes, 7 seconds)
2025-05-10 23:46:59,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 23:47:13,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1669.17737 ± 898.996
2025-05-10 23:47:13,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2505.1072, 796.66284, 1849.3175, 2420.8418, 129.1287, 2143.0493, 196.71799, 2392.8235, 1733.2496, 2524.8757]
2025-05-10 23:47:13,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 61.0, 1000.0, 82.0, 1000.0, 695.0, 1000.0]
2025-05-10 23:47:13,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 38 minutes, 39 seconds)
2025-05-10 23:50:05,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 23:50:15,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1460.05396 ± 1014.651
2025-05-10 23:50:15,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2279.7542, 244.66801, 111.109474, 2315.4224, 1323.972, 2463.6155, 2596.0347, 84.37807, 2412.9893, 768.59644]
2025-05-10 23:50:15,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 121.0, 66.0, 1000.0, 487.0, 1000.0, 1000.0, 43.0, 1000.0, 276.0]
2025-05-10 23:50:15,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 29 minutes, 23 seconds)
2025-05-10 23:53:11,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 23:53:27,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1888.98828 ± 801.036
2025-05-10 23:53:27,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1450.0349, 2519.6575, 2101.9026, 958.7794, 179.0533, 2051.1577, 3058.6143, 2619.9336, 2130.5215, 1820.2284]
2025-05-10 23:53:27,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [628.0, 1000.0, 1000.0, 1000.0, 115.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 23:53:27,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (1888.99) for latency ExtremeClogL1U23
2025-05-10 23:53:27,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-10 23:53:27,434 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-10 23:53:27,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 28 minutes, 43 seconds)
2025-05-10 23:56:27,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 23:56:41,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2056.15967 ± 920.511
2025-05-10 23:56:41,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2268.3052, 2618.203, 2833.1128, 2444.8477, 245.67177, 2873.8496, 420.48117, 2759.8545, 1701.8356, 2395.433]
2025-05-10 23:56:41,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 95.0, 1000.0, 179.0, 1000.0, 588.0, 1000.0]
2025-05-10 23:56:41,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (2056.16) for latency ExtremeClogL1U23
2025-05-10 23:56:41,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-10 23:56:41,053 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-10 23:56:41,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 26 minutes, 37 seconds)
2025-05-10 23:59:34,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 23:59:49,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2130.46436 ± 919.213
2025-05-10 23:59:49,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [411.17493, 3088.6045, 2758.608, 2727.7861, 1170.4376, 2722.0588, 780.98755, 2185.8875, 2802.2527, 2656.848]
2025-05-10 23:59:49,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [130.0, 1000.0, 1000.0, 1000.0, 505.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 23:59:49,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (2130.46) for latency ExtremeClogL1U23
2025-05-10 23:59:49,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-10 23:59:49,716 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-10 23:59:49,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 23 minutes, 7 seconds)
2025-05-11 00:02:51,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 00:03:06,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2017.49646 ± 1052.051
2025-05-11 00:03:06,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1276.4362, 2691.0408, 2394.4521, 799.3458, 3106.7515, 967.69495, 2765.3125, 3090.7988, 129.74004, 2953.393]
2025-05-11 00:03:06,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [482.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 74.0, 1000.0]
2025-05-11 00:03:06,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 20 minutes, 8 seconds)
2025-05-11 00:06:14,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 00:06:30,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2070.43384 ± 613.979
2025-05-11 00:06:30,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [894.8011, 2546.1792, 2435.923, 2604.4187, 2612.3064, 1489.2408, 2472.365, 2306.4783, 1135.0004, 2207.6245]
2025-05-11 00:06:30,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 853.0, 1000.0, 1000.0, 590.0, 1000.0, 846.0, 1000.0, 894.0]
2025-05-11 00:06:30,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 21 minutes, 33 seconds)
2025-05-11 00:09:14,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 00:09:29,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1609.91235 ± 865.359
2025-05-11 00:09:29,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1292.5656, 2822.252, 2931.8962, 1073.4703, 201.52332, 2590.7483, 1778.5975, 1279.7489, 744.919, 1383.402]
2025-05-11 00:09:29,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 976.0, 1000.0, 618.0, 83.0, 1000.0, 783.0, 514.0, 1000.0, 1000.0]
2025-05-11 00:09:29,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 15 minutes, 32 seconds)
2025-05-11 00:12:32,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 00:12:45,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1698.60522 ± 939.077
2025-05-11 00:12:45,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [845.70337, 1540.549, 2913.2388, 2570.0527, 2071.523, 920.27136, 2733.1282, 203.07605, 2531.8, 656.7094]
2025-05-11 00:12:45,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 507.0, 1000.0, 1000.0, 1000.0, 375.0, 1000.0, 85.0, 1000.0, 200.0]
2025-05-11 00:12:45,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 12 minutes, 49 seconds)
2025-05-11 00:15:33,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 00:15:44,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1675.00159 ± 915.626
2025-05-11 00:15:44,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [171.55606, 1574.2115, 2635.173, 1827.6902, 2641.467, 2167.4233, 2922.0518, 1261.7788, 1341.3989, 207.2665]
2025-05-11 00:15:44,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [75.0, 602.0, 1000.0, 1000.0, 915.0, 816.0, 1000.0, 493.0, 469.0, 137.0]
2025-05-11 00:15:44,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 7 minutes, 47 seconds)
2025-05-11 00:18:51,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 00:19:09,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2082.51514 ± 652.186
2025-05-11 00:19:09,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2621.9346, 2568.6025, 835.12726, 1445.5115, 2764.6484, 2701.8623, 1243.6038, 1889.0182, 2363.1406, 2391.7012]
2025-05-11 00:19:09,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 519.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 00:19:09,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 6 minutes, 11 seconds)
2025-05-11 00:22:12,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 00:22:27,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2253.24609 ± 807.769
2025-05-11 00:22:27,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3333.763, 3216.9539, 2225.981, 3134.3096, 2747.138, 1754.0531, 1275.4774, 2203.6802, 840.8659, 1800.2412]
2025-05-11 00:22:27,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 550.0, 887.0, 323.0, 600.0]
2025-05-11 00:22:27,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (2253.25) for latency ExtremeClogL1U23
2025-05-11 00:22:27,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-11 00:22:27,827 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 00:22:27,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 1 minute, 51 seconds)
2025-05-11 00:25:28,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 00:25:46,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2260.81299 ± 619.162
2025-05-11 00:25:46,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2034.3772, 1363.9135, 2836.2632, 2616.4988, 1866.0369, 2686.053, 2726.6497, 2733.9077, 2720.9614, 1023.46985]
2025-05-11 00:25:46,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 633.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 00:25:46,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (2260.81) for latency ExtremeClogL1U23
2025-05-11 00:25:46,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-11 00:25:46,440 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 00:25:46,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 2 minutes, 26 seconds)
2025-05-11 00:28:47,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 00:29:04,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2555.89478 ± 657.033
2025-05-11 00:29:04,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3119.333, 2701.1091, 3080.465, 2949.5557, 1838.9579, 2753.5188, 2846.6167, 952.2242, 2259.9546, 3057.2117]
2025-05-11 00:29:04,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 626.0, 1000.0, 1000.0, 313.0, 1000.0, 1000.0]
2025-05-11 00:29:04,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (2555.89) for latency ExtremeClogL1U23
2025-05-11 00:29:04,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-11 00:29:04,387 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 00:29:04,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 59 minutes, 31 seconds)
2025-05-11 00:32:05,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 00:32:20,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1882.41663 ± 1107.473
2025-05-11 00:32:20,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [649.6008, 2665.8855, 3086.4033, 583.74243, 2875.7017, 3044.52, 821.0667, 1593.8618, 3058.2441, 445.13818]
2025-05-11 00:32:20,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 225.0, 1000.0, 1000.0, 391.0, 1000.0, 1000.0, 143.0]
2025-05-11 00:32:20,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 59 minutes, 12 seconds)
2025-05-11 00:35:14,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 00:35:28,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1929.71484 ± 1070.473
2025-05-11 00:35:28,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2864.5222, 930.5644, 2240.2827, 134.84903, 2919.3809, 1139.1898, 541.0876, 2449.8142, 2984.0215, 3093.4355]
2025-05-11 00:35:28,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 407.0, 1000.0, 57.0, 1000.0, 413.0, 354.0, 1000.0, 1000.0, 1000.0]
2025-05-11 00:35:28,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 52 minutes, 50 seconds)
2025-05-11 00:38:42,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 00:38:58,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2450.87305 ± 530.190
2025-05-11 00:38:58,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2859.6436, 1917.3712, 2893.5784, 2217.6238, 2319.2283, 1221.8633, 3065.269, 2475.1792, 2770.095, 2768.878]
2025-05-11 00:38:58,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 731.0, 1000.0, 720.0, 1000.0, 459.0, 1000.0, 899.0, 1000.0, 1000.0]
2025-05-11 00:38:58,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 51 minutes, 38 seconds)
2025-05-11 00:41:49,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 00:42:05,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1965.10840 ± 1097.276
2025-05-11 00:42:05,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2831.6904, 17.779049, 3239.3694, 2556.884, 1660.6338, 3290.751, 906.18854, 960.30023, 2962.384, 1225.1034]
2025-05-11 00:42:05,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [925.0, 15.0, 1000.0, 1000.0, 516.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 00:42:05,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 46 minutes, 27 seconds)
2025-05-11 00:45:00,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 00:45:18,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2665.40845 ± 826.895
2025-05-11 00:45:18,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3279.0706, 1596.3364, 3192.805, 3231.1912, 2408.6602, 3089.649, 3298.6658, 3461.8945, 2233.7886, 862.0212]
2025-05-11 00:45:18,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 691.0, 1000.0]
2025-05-11 00:45:18,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (2665.41) for latency ExtremeClogL1U23
2025-05-11 00:45:18,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-11 00:45:18,694 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 00:45:18,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 42 minutes, 22 seconds)
2025-05-11 00:48:36,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 00:48:53,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2097.12720 ± 840.637
2025-05-11 00:48:53,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2912.6262, 495.3738, 3156.9978, 2058.6353, 2100.8928, 2727.776, 2426.6858, 2752.956, 1334.6191, 1004.7089]
2025-05-11 00:48:53,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 213.0, 1000.0, 1000.0, 864.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 00:48:53,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 42 minutes, 15 seconds)
2025-05-11 00:51:47,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 00:52:01,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2146.27441 ± 1047.817
2025-05-11 00:52:01,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2653.4321, 2829.7224, 3175.4749, 3009.1365, 3124.4343, 1355.4644, 1164.9598, 156.76193, 2972.1357, 1021.22]
2025-05-11 00:52:01,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 432.0, 592.0, 68.0, 1000.0, 389.0]
2025-05-11 00:52:01,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 38 minutes, 56 seconds)
2025-05-11 00:54:52,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 00:55:09,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2479.02905 ± 894.170
2025-05-11 00:55:09,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2598.6394, 2910.2617, 2815.884, 3283.401, 2307.4443, 2711.2551, 3316.3892, 3249.1284, 875.38556, 722.5045]
2025-05-11 00:55:09,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 294.0]
2025-05-11 00:55:09,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 32 minutes, 13 seconds)
2025-05-11 00:58:19,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 00:58:33,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1989.98792 ± 1019.315
2025-05-11 00:58:33,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3080.321, 2680.964, 3157.2366, 665.59863, 2542.5828, 822.9463, 1471.2534, 2687.1243, 282.5893, 2509.2627]
2025-05-11 00:58:33,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 326.0, 1000.0, 394.0, 763.0, 1000.0, 93.0, 1000.0]
2025-05-11 00:58:33,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 31 minutes, 25 seconds)
2025-05-11 01:01:32,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 01:01:49,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2612.18823 ± 707.621
2025-05-11 01:01:49,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3010.702, 3119.2788, 2707.2678, 2909.1733, 2936.7937, 2430.5498, 602.3179, 3150.5452, 2741.2886, 2513.964]
2025-05-11 01:01:49,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 176.0, 1000.0, 1000.0, 1000.0]
2025-05-11 01:01:49,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 28 minutes, 39 seconds)
2025-05-11 01:04:45,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 01:05:01,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2282.47461 ± 866.501
2025-05-11 01:05:01,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3303.1099, 1315.4083, 3285.7776, 2090.0493, 2861.0515, 3140.4475, 1203.0988, 1685.1656, 1048.3376, 2892.298]
2025-05-11 01:05:01,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 978.0, 1000.0, 379.0, 503.0, 1000.0, 1000.0]
2025-05-11 01:05:01,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 21 minutes, 58 seconds)
2025-05-11 01:08:16,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 01:08:32,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2591.24658 ± 959.940
2025-05-11 01:08:32,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1613.1671, 2393.819, 2722.0598, 144.01572, 3026.4558, 2969.6843, 3384.161, 3031.2854, 3231.0684, 3396.7495]
2025-05-11 01:08:32,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 720.0, 1000.0, 54.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 01:08:32,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 22 minutes, 6 seconds)
2025-05-11 01:11:20,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 01:11:38,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2849.75806 ± 487.497
2025-05-11 01:11:38,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2950.8384, 3114.087, 3090.5317, 2906.9937, 2922.953, 3084.882, 1508.9999, 3213.057, 2502.6746, 3202.5623]
2025-05-11 01:11:38,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 955.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 824.0, 1000.0]
2025-05-11 01:11:38,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (2849.76) for latency ExtremeClogL1U23
2025-05-11 01:11:38,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-11 01:11:38,709 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 01:11:38,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 18 minutes, 27 seconds)
2025-05-11 01:14:47,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 01:15:04,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2643.80615 ± 395.932
2025-05-11 01:15:04,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2956.9004, 2753.8528, 2661.9102, 2626.0886, 2225.3796, 2539.247, 2673.5513, 3117.7957, 3135.772, 1747.5638]
2025-05-11 01:15:04,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 522.0]
2025-05-11 01:15:04,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 15 minutes, 32 seconds)
2025-05-11 01:18:05,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 01:18:22,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2411.27637 ± 959.781
2025-05-11 01:18:22,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3186.8635, 1043.4879, 2954.9402, 3059.3965, 2854.5916, 3102.597, 3045.871, 974.7278, 838.711, 3051.5754]
2025-05-11 01:18:22,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 271.0, 1000.0]
2025-05-11 01:18:22,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 12 minutes, 24 seconds)
2025-05-11 01:21:17,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 01:21:30,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2131.65381 ± 1131.656
2025-05-11 01:21:30,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2979.2185, 2585.9302, 406.21207, 2806.6973, 2886.4128, 2981.2136, 393.91327, 3095.1738, 2736.42, 445.34735]
2025-05-11 01:21:30,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 783.0, 200.0, 1000.0, 1000.0, 1000.0, 173.0, 1000.0, 1000.0, 202.0]
2025-05-11 01:21:30,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 8 minutes, 36 seconds)
2025-05-11 01:24:43,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 01:25:00,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2655.17725 ± 915.755
2025-05-11 01:25:00,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2774.7712, 3224.4385, 3250.18, 3275.5889, 656.66895, 3380.8535, 1361.4409, 3265.855, 2072.6667, 3289.3105]
2025-05-11 01:25:00,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 236.0, 1000.0, 1000.0, 1000.0, 695.0, 1000.0]
2025-05-11 01:25:00,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 5 minutes, 3 seconds)
2025-05-11 01:27:53,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 01:28:09,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2617.68701 ± 1063.271
2025-05-11 01:28:09,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [206.90486, 3053.0205, 3383.5635, 2886.7869, 3579.388, 3255.1416, 1124.714, 2141.7366, 3271.68, 3273.9358]
2025-05-11 01:28:09,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [83.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 745.0, 1000.0, 1000.0]
2025-05-11 01:28:09,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 2 minutes, 13 seconds)
2025-05-11 01:31:20,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 01:31:37,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2748.78394 ± 948.236
2025-05-11 01:31:37,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2915.7712, 1786.0153, 3087.224, 3637.2517, 3564.9385, 2813.8516, 3158.5552, 3018.0496, 297.2014, 3208.9814]
2025-05-11 01:31:37,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 111.0, 994.0]
2025-05-11 01:31:37,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 59 minutes, 9 seconds)
2025-05-11 01:34:24,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 01:34:42,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 3120.72998 ± 339.941
2025-05-11 01:34:42,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3428.9312, 3188.5298, 2467.5134, 3178.5361, 2891.578, 3276.296, 3514.562, 2591.9138, 3441.8018, 3227.6414]
2025-05-11 01:34:42,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 713.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 01:34:42,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (3120.73) for latency ExtremeClogL1U23
2025-05-11 01:34:42,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-11 01:34:42,385 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 01:34:42,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 54 minutes, 17 seconds)
2025-05-11 01:37:47,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 01:38:03,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2642.24976 ± 866.913
2025-05-11 01:38:03,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [206.71143, 3003.3484, 3089.2783, 3254.3171, 2638.8262, 2118.9578, 3007.5767, 2993.6926, 3115.3232, 2994.467]
2025-05-11 01:38:03,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [68.0, 1000.0, 1000.0, 1000.0, 1000.0, 615.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 01:38:03,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 52 minutes, 27 seconds)
2025-05-11 01:41:01,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 01:41:18,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2749.64893 ± 646.528
2025-05-11 01:41:18,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2558.0386, 2963.564, 2782.4563, 2667.6711, 3291.393, 3162.6309, 3070.9133, 2785.2134, 3268.3354, 946.2728]
2025-05-11 01:41:18,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 991.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 264.0]
2025-05-11 01:41:18,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 47 minutes, 36 seconds)
2025-05-11 01:44:22,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 01:44:39,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2500.08179 ± 708.684
2025-05-11 01:44:39,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3057.6821, 2982.537, 1175.4475, 1062.9279, 2666.1987, 2859.694, 2825.792, 2498.3477, 3023.723, 2848.4685]
2025-05-11 01:44:39,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 366.0, 1000.0, 1000.0, 891.0, 1000.0, 1000.0, 1000.0]
2025-05-11 01:44:39,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 45 minutes, 36 seconds)
2025-05-11 01:47:48,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 01:48:01,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2149.80884 ± 948.318
2025-05-11 01:48:01,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3385.072, 2814.4158, 776.66455, 2415.4216, 3334.8591, 2916.4229, 1590.1389, 1222.6014, 774.83466, 2267.6565]
2025-05-11 01:48:01,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 240.0, 818.0, 988.0, 1000.0, 587.0, 450.0, 387.0, 731.0]
2025-05-11 01:48:01,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 41 minutes, 40 seconds)
2025-05-11 01:50:49,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 01:51:07,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2865.10229 ± 585.353
2025-05-11 01:51:07,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3434.762, 2779.0215, 3100.3862, 3360.9011, 3006.3547, 2747.2876, 2759.183, 2988.0962, 3223.7664, 1251.2665]
2025-05-11 01:51:07,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 421.0]
2025-05-11 01:51:07,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 38 minutes, 28 seconds)
2025-05-11 01:54:12,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 01:54:28,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2264.85083 ± 1051.000
2025-05-11 01:54:28,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1112.4109, 1402.8004, 3669.248, 662.4111, 2385.8586, 2056.2366, 3401.9717, 1327.7788, 3322.3184, 3307.4722]
2025-05-11 01:54:28,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [342.0, 1000.0, 1000.0, 1000.0, 1000.0, 659.0, 1000.0, 450.0, 1000.0, 1000.0]
2025-05-11 01:54:28,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 35 minutes, 14 seconds)
2025-05-11 01:57:19,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 01:57:35,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2740.77490 ± 920.311
2025-05-11 01:57:35,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3077.272, 3453.2812, 3644.1624, 2774.4036, 3537.4402, 3204.493, 625.9191, 2974.255, 1456.9675, 2659.557]
2025-05-11 01:57:35,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 222.0, 1000.0, 508.0, 989.0]
2025-05-11 01:57:35,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 31 minutes, 12 seconds)
2025-05-11 02:00:45,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 02:01:01,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2631.21875 ± 612.930
2025-05-11 02:01:01,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2762.3135, 2800.218, 2877.0454, 3151.203, 1187.0271, 2555.2698, 2912.045, 3131.998, 1794.6351, 3140.4346]
2025-05-11 02:01:01,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [814.0, 1000.0, 1000.0, 1000.0, 616.0, 940.0, 1000.0, 1000.0, 578.0, 1000.0]
2025-05-11 02:01:01,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 28 minutes, 19 seconds)
2025-05-11 02:04:05,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 02:04:19,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2456.00830 ± 988.503
2025-05-11 02:04:19,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1757.3953, 3475.395, 2575.6963, 2909.8328, 2725.0662, 1044.9426, 3108.3796, 3353.455, 411.82336, 3198.0955]
2025-05-11 02:04:19,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [516.0, 1000.0, 1000.0, 1000.0, 1000.0, 498.0, 981.0, 1000.0, 139.0, 1000.0]
2025-05-11 02:04:19,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 24 minutes, 42 seconds)
2025-05-11 02:07:08,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 02:07:27,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 3359.64917 ± 288.237
2025-05-11 02:07:27,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2931.0747, 3551.6729, 3053.5781, 3511.8345, 3167.825, 3178.756, 3727.9402, 3660.7195, 3095.7253, 3717.3682]
2025-05-11 02:07:27,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [837.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 02:07:27,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (3359.65) for latency ExtremeClogL1U23
2025-05-11 02:07:27,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-11 02:07:27,046 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 02:07:27,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 21 minutes, 39 seconds)
2025-05-11 02:10:39,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 02:10:56,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2774.27417 ± 609.781
2025-05-11 02:10:56,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1457.6091, 3276.188, 3066.688, 3306.2634, 2709.3982, 1803.9744, 3286.6506, 3149.0305, 2898.447, 2788.4934]
2025-05-11 02:10:56,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [420.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 02:10:56,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 19 minutes)
2025-05-11 02:13:42,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 02:13:59,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2854.99634 ± 830.166
2025-05-11 02:13:59,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1393.8472, 2717.281, 3283.3848, 3687.636, 2850.9583, 1178.965, 3359.3105, 3417.0742, 3210.3755, 3451.1313]
2025-05-11 02:13:59,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 406.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 02:13:59,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 15 minutes, 24 seconds)
2025-05-11 02:17:04,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 02:17:20,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2920.34277 ± 614.646
2025-05-11 02:17:20,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2525.4377, 3442.8804, 3318.6794, 3055.0786, 2747.053, 3415.5303, 3353.866, 1283.3416, 3079.3748, 2982.186]
2025-05-11 02:17:20,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 451.0, 1000.0, 1000.0]
2025-05-11 02:17:20,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 11 minutes, 50 seconds)
2025-05-11 02:20:28,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 02:20:43,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2347.29810 ± 872.786
2025-05-11 02:20:43,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3668.9722, 1512.657, 2820.327, 577.2264, 1636.0953, 3066.126, 1951.4835, 2815.4915, 2436.301, 2988.2998]
2025-05-11 02:20:43,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 309.0, 501.0, 1000.0, 552.0, 843.0, 1000.0, 1000.0]
2025-05-11 02:20:43,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 8 minutes, 55 seconds)
2025-05-11 02:23:47,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 02:24:03,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2605.86963 ± 901.808
2025-05-11 02:24:03,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3298.37, 3099.2056, 1056.2853, 3114.902, 1060.1226, 3487.389, 1747.3835, 2745.7092, 3398.6506, 3050.6792]
2025-05-11 02:24:03,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 333.0, 1000.0, 396.0, 1000.0, 601.0, 1000.0, 1000.0, 1000.0]
2025-05-11 02:24:03,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 6 minutes, 26 seconds)
2025-05-11 02:27:07,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 02:27:25,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2889.08594 ± 961.999
2025-05-11 02:27:25,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3452.0388, 3095.0305, 3825.3237, 3153.5989, 176.37311, 2970.5562, 2551.3093, 2958.5884, 3227.7473, 3480.2935]
2025-05-11 02:27:25,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 57.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 02:27:25,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 2 minutes, 38 seconds)
2025-05-11 02:30:36,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 02:30:54,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2899.17065 ± 310.011
2025-05-11 02:30:54,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2113.1953, 3184.5015, 2830.3435, 2834.2947, 2756.619, 3031.039, 3094.9502, 3318.1873, 2935.2454, 2893.3308]
2025-05-11 02:30:54,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [770.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 02:30:54,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 55 seconds)
2025-05-11 02:34:02,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 02:34:16,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2701.73486 ± 824.727
2025-05-11 02:34:16,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1233.6343, 3004.355, 3060.5442, 1708.7314, 3267.6038, 3474.239, 3498.4207, 3290.2244, 1493.0442, 2986.5503]
2025-05-11 02:34:16,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [391.0, 1000.0, 1000.0, 532.0, 1000.0, 1000.0, 1000.0, 1000.0, 467.0, 1000.0]
2025-05-11 02:34:16,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 84/100 (estimated time remaining: 57 minutes, 34 seconds)
2025-05-11 02:37:10,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 02:37:23,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2150.14990 ± 1111.911
2025-05-11 02:37:23,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1047.2144, 2807.9453, 3108.2522, 3145.438, 2178.5625, 621.31866, 1458.3573, 3372.4998, 3355.7378, 406.17136]
2025-05-11 02:37:23,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [468.0, 1000.0, 1000.0, 1000.0, 674.0, 321.0, 491.0, 1000.0, 1000.0, 234.0]
2025-05-11 02:37:23,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 85/100 (estimated time remaining: 53 minutes, 18 seconds)
2025-05-11 02:40:15,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 02:40:30,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2662.85498 ± 1011.417
2025-05-11 02:40:30,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3412.173, 2296.8, 3235.581, 3321.2734, 3406.6284, 3032.7341, 1321.5549, 2868.2424, 3443.7847, 289.7758]
2025-05-11 02:40:30,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 412.0, 847.0, 1000.0, 94.0]
2025-05-11 02:40:30,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 86/100 (estimated time remaining: 49 minutes, 20 seconds)
2025-05-11 02:43:29,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 02:43:45,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2664.90918 ± 977.322
2025-05-11 02:43:45,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [877.4765, 3684.2605, 858.1642, 3267.2288, 3287.2783, 2795.208, 3313.8604, 2246.5642, 2842.737, 3476.3152]
2025-05-11 02:43:45,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 293.0, 1000.0, 1000.0, 835.0, 1000.0, 769.0, 824.0, 1000.0]
2025-05-11 02:43:45,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 87/100 (estimated time remaining: 45 minutes, 45 seconds)
2025-05-11 02:46:48,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 02:47:02,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2553.65405 ± 1033.178
2025-05-11 02:47:02,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3114.058, 3320.2607, 3139.7847, 1477.5969, 1836.4315, 3285.2761, 191.28572, 3168.3218, 2345.448, 3658.078]
2025-05-11 02:47:02,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 492.0, 555.0, 1000.0, 72.0, 1000.0, 808.0, 1000.0]
2025-05-11 02:47:02,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 88/100 (estimated time remaining: 41 minutes, 57 seconds)
2025-05-11 02:49:59,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 02:50:16,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2551.26440 ± 1019.349
2025-05-11 02:50:16,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3439.7664, 3185.3408, 3471.6743, 2988.8022, 3123.9854, 2872.6523, 3351.5132, 1083.222, 1229.7456, 765.9415]
2025-05-11 02:50:16,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 326.0, 1000.0]
2025-05-11 02:50:16,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 89/100 (estimated time remaining: 38 minutes, 23 seconds)
2025-05-11 02:53:11,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 02:53:27,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2730.37817 ± 909.025
2025-05-11 02:53:27,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3543.7676, 1197.3148, 1075.61, 2977.247, 3236.5693, 3816.3787, 2997.2793, 2938.1584, 2089.7969, 3431.6592]
2025-05-11 02:53:27,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 360.0, 334.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 02:53:27,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 90/100 (estimated time remaining: 35 minutes, 20 seconds)
2025-05-11 02:56:32,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 02:56:45,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2197.29346 ± 1128.574
2025-05-11 02:56:45,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3120.6294, 3120.0837, 996.293, 578.3358, 2751.205, 2619.9927, 3435.336, 420.0209, 1557.6532, 3373.3867]
2025-05-11 02:56:45,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 281.0, 205.0, 1000.0, 1000.0, 1000.0, 122.0, 543.0, 1000.0]
2025-05-11 02:56:45,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 91/100 (estimated time remaining: 32 minutes, 30 seconds)
2025-05-11 02:59:39,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 02:59:57,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2725.52563 ± 799.589
2025-05-11 02:59:57,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2824.8079, 2257.6775, 3250.453, 3206.8079, 3324.489, 708.2391, 3410.0579, 2999.0457, 3215.7085, 2057.969]
2025-05-11 02:59:57,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 787.0, 1000.0, 1000.0]
2025-05-11 02:59:57,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 92/100 (estimated time remaining: 29 minutes, 10 seconds)
2025-05-11 03:02:56,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 03:03:12,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2622.86621 ± 1077.211
2025-05-11 03:03:12,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3385.5063, 2822.096, 1208.1973, 3495.818, 165.57684, 3304.5247, 3344.6663, 2007.958, 3109.9302, 3384.388]
2025-05-11 03:03:12,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 328.0, 1000.0, 63.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 03:03:12,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 93/100 (estimated time remaining: 25 minutes, 50 seconds)
2025-05-11 03:06:04,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 03:06:22,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2958.78760 ± 268.230
2025-05-11 03:06:22,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3024.9976, 3203.2998, 3133.7847, 3068.311, 2740.3694, 2310.369, 2918.5334, 2926.5293, 3334.1445, 2927.5361]
2025-05-11 03:06:22,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 03:06:22,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 94/100 (estimated time remaining: 22 minutes, 32 seconds)
2025-05-11 03:09:29,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 03:09:45,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2579.27661 ± 962.441
2025-05-11 03:09:45,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3013.504, 3540.5837, 2198.0598, 2760.7683, 3410.3943, 3241.0918, 3258.3438, 1289.0306, 409.2901, 2671.697]
2025-05-11 03:09:45,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 905.0, 1000.0, 1000.0, 938.0, 1000.0, 117.0, 1000.0]
2025-05-11 03:09:45,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 95/100 (estimated time remaining: 19 minutes, 34 seconds)
2025-05-11 03:12:48,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 03:13:03,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2624.94092 ± 822.166
2025-05-11 03:13:03,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3025.4514, 3056.4368, 3153.0603, 2706.1694, 3453.384, 2785.455, 834.34076, 2698.702, 1280.9944, 3255.4146]
2025-05-11 03:13:03,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [956.0, 1000.0, 1000.0, 789.0, 1000.0, 1000.0, 294.0, 1000.0, 460.0, 1000.0]
2025-05-11 03:13:03,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 96/100 (estimated time remaining: 16 minutes, 18 seconds)
2025-05-11 03:16:03,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 03:16:18,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2575.21191 ± 1140.198
2025-05-11 03:16:18,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [505.35886, 3519.9307, 3131.7844, 3144.239, 2195.049, 3201.5913, 3012.5977, 3361.0344, 3383.6682, 296.8644]
2025-05-11 03:16:18,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [163.0, 1000.0, 920.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 93.0]
2025-05-11 03:16:18,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 97/100 (estimated time remaining: 13 minutes, 4 seconds)
2025-05-11 03:19:10,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 03:19:26,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2673.06445 ± 901.872
2025-05-11 03:19:26,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3151.555, 3322.331, 3365.352, 858.21375, 3071.4954, 2807.5728, 939.842, 3249.2268, 3036.7961, 2928.2598]
2025-05-11 03:19:26,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 227.0, 1000.0, 1000.0, 326.0, 1000.0, 1000.0, 1000.0]
2025-05-11 03:19:26,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 98/100 (estimated time remaining: 9 minutes, 44 seconds)
2025-05-11 03:22:39,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 03:22:57,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2849.27661 ± 762.153
2025-05-11 03:22:57,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3000.9663, 2791.3933, 3229.6858, 2967.208, 2261.0469, 3588.9802, 3073.9048, 3448.2634, 3309.294, 822.023]
2025-05-11 03:22:57,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [869.0, 1000.0, 1000.0, 978.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 03:22:57,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 99/100 (estimated time remaining: 6 minutes, 37 seconds)
2025-05-11 03:25:58,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 03:26:16,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2360.78662 ± 1111.534
2025-05-11 03:26:16,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2814.416, 3397.7988, 2651.4922, 901.80304, 3119.5078, 3203.49, 3530.1096, 505.2542, 2741.7073, 742.28815]
2025-05-11 03:26:16,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 840.0, 1000.0, 1000.0, 1000.0, 1000.0, 146.0, 1000.0, 1000.0]
2025-05-11 03:26:16,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 18 seconds)
2025-05-11 03:29:23,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 03:29:40,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2707.08325 ± 491.109
2025-05-11 03:29:40,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1950.2252, 2940.8179, 2551.3457, 1651.8774, 2759.9634, 2841.631, 3214.7356, 3108.2505, 2982.704, 3069.2822]
2025-05-11 03:29:40,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [568.0, 1000.0, 1000.0, 651.0, 1000.0, 1000.0, 1000.0, 1000.0, 845.0, 1000.0]
2025-05-11 03:29:40,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1251 [DEBUG]: Training session finished
