2025-05-07 11:23:09,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1006 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-bpql-mem4
2025-05-07 11:23:09,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1007 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-bpql-mem4
2025-05-07 11:23:09,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1008 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x7af079dcea90>}
2025-05-07 11:23:09,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1009 [DEBUG]: using device: cpu
2025-05-07 11:23:09,986 baseline-bpql-noisy-ant:77 [WARNING]: args.assumed_delay != args.horizon: 4 != 32
2025-05-07 11:23:09,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1031 [INFO]: Creating new trainer
2025-05-07 11:23:09,992 baseline-bpql-noisy-ant:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=59, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-05-07 11:23:09,992 baseline-bpql-noisy-ant:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-07 11:23:10,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1092 [DEBUG]: Starting training session...
2025-05-07 11:23:10,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 1/100
2025-05-07 11:25:19,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:25:26,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: -817.87476 ± 761.896
2025-05-07 11:25:26,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [-151.87497, -1596.195, -1535.2511, -42.868004, -1594.976, -1572.5942, -29.864532, -1595.1858, -27.949713, -31.988209]
2025-05-07 11:25:26,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [116.0, 1000.0, 1000.0, 45.0, 1000.0, 1000.0, 25.0, 1000.0, 25.0, 25.0]
2025-05-07 11:25:26,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (-817.87) for latency ExtremeSparseL4U32
2025-05-07 11:25:26,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-07 11:25:26,588 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 11:25:26,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 45 minutes, 3 seconds)
2025-05-07 11:27:26,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:27:28,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: -71.96792 ± 103.727
2025-05-07 11:27:28,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [-29.190836, 28.273058, -51.56184, 5.4722157, -354.30652, -103.215614, -32.51747, -86.92963, -101.07556, 5.3730145]
2025-05-07 11:27:28,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [69.0, 120.0, 118.0, 56.0, 1000.0, 183.0, 99.0, 250.0, 229.0, 33.0]
2025-05-07 11:27:28,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (-71.97) for latency ExtremeSparseL4U32
2025-05-07 11:27:28,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-07 11:27:28,736 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 11:27:28,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 31 minutes, 8 seconds)
2025-05-07 11:30:25,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:30:29,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 3.90106 ± 25.295
2025-05-07 11:30:29,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [6.8123093, 49.314964, 24.31574, 14.617805, 19.146051, -5.8097568, -2.3691292, -46.79835, 5.1663795, -25.385374]
2025-05-07 11:30:29,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [73.0, 146.0, 159.0, 79.0, 105.0, 354.0, 137.0, 967.0, 29.0, 199.0]
2025-05-07 11:30:29,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (3.90) for latency ExtremeSparseL4U32
2025-05-07 11:30:29,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-07 11:30:29,651 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 11:30:29,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 56 minutes, 49 seconds)
2025-05-07 11:33:37,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:33:52,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 394.42139 ± 147.763
2025-05-07 11:33:52,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [423.02057, 499.36884, 157.11778, 410.31488, 487.41293, 384.27734, 411.20074, 563.74677, 522.7867, 84.96723]
2025-05-07 11:33:52,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 232.0, 1000.0, 1000.0, 761.0, 1000.0, 1000.0, 1000.0, 226.0]
2025-05-07 11:33:52,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (394.42) for latency ExtremeSparseL4U32
2025-05-07 11:33:52,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-07 11:33:52,310 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 11:33:52,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 16 minutes, 50 seconds)
2025-05-07 11:36:49,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:37:00,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 329.79953 ± 227.892
2025-05-07 11:37:00,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [33.239723, 556.9108, 89.612526, -2.067321, 467.16986, 213.47702, 650.67017, 493.46295, 540.26245, 255.25722]
2025-05-07 11:37:00,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [47.0, 1000.0, 101.0, 15.0, 1000.0, 313.0, 1000.0, 1000.0, 1000.0, 605.0]
2025-05-07 11:37:00,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 23 minutes, 4 seconds)
2025-05-07 11:39:43,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:39:53,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 411.91904 ± 272.358
2025-05-07 11:39:53,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [523.75946, 646.19775, 732.8213, 771.7579, 459.09085, 590.8268, 76.756035, 172.44156, 135.90157, 9.637313]
2025-05-07 11:39:53,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [625.0, 1000.0, 1000.0, 1000.0, 661.0, 709.0, 89.0, 273.0, 185.0, 15.0]
2025-05-07 11:39:53,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (411.92) for latency ExtremeSparseL4U32
2025-05-07 11:39:53,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-07 11:39:53,260 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 11:39:53,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 31 minutes, 33 seconds)
2025-05-07 11:43:03,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:43:12,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 365.99408 ± 204.226
2025-05-07 11:43:12,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [68.881454, 333.0995, 46.572964, 252.2818, 557.9649, 720.57715, 355.73413, 474.6554, 550.1562, 300.01706]
2025-05-07 11:43:12,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [42.0, 363.0, 73.0, 345.0, 1000.0, 1000.0, 553.0, 583.0, 706.0, 343.0]
2025-05-07 11:43:12,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 52 minutes, 31 seconds)
2025-05-07 11:46:00,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:46:15,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 613.18494 ± 202.680
2025-05-07 11:46:15,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [174.22061, 498.0956, 806.08856, 833.92645, 428.2892, 667.1253, 803.9428, 771.60126, 481.24527, 667.3139]
2025-05-07 11:46:15,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [166.0, 499.0, 1000.0, 1000.0, 653.0, 1000.0, 1000.0, 1000.0, 516.0, 1000.0]
2025-05-07 11:46:15,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (613.18) for latency ExtremeSparseL4U32
2025-05-07 11:46:15,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-07 11:46:15,472 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 11:46:15,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 50 minutes, 3 seconds)
2025-05-07 11:49:16,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:49:27,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 493.07245 ± 261.848
2025-05-07 11:49:27,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [677.9001, 825.8188, 653.4878, 643.0124, 422.2807, 178.98355, 65.16346, 117.973854, 685.14435, 660.9594]
2025-05-07 11:49:27,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 596.0, 412.0, 163.0, 73.0, 96.0, 647.0, 1000.0]
2025-05-07 11:49:27,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 43 minutes, 34 seconds)
2025-05-07 11:52:18,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:52:26,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 459.69302 ± 275.046
2025-05-07 11:52:26,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [850.4545, 588.8273, 77.74837, 404.54926, 857.1551, 194.74486, 373.3378, 307.3649, 770.6241, 172.12418]
2025-05-07 11:52:26,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 559.0, 74.0, 257.0, 1000.0, 139.0, 275.0, 282.0, 1000.0, 151.0]
2025-05-07 11:52:26,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 37 minutes, 48 seconds)
2025-05-07 11:55:29,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:55:32,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 215.79118 ± 263.914
2025-05-07 11:55:32,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [84.91662, 871.188, 44.44859, 143.93987, 479.63123, 79.35223, 49.688034, 3.1796806, 36.71297, 364.85468]
2025-05-07 11:55:32,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [79.0, 1000.0, 39.0, 166.0, 301.0, 59.0, 55.0, 27.0, 24.0, 303.0]
2025-05-07 11:55:32,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 38 minutes, 39 seconds)
2025-05-07 11:58:24,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:58:32,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 477.65103 ± 338.495
2025-05-07 11:58:32,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [226.77373, 744.70197, 672.483, 955.00946, 850.35156, 76.524574, 93.23958, 806.8086, 116.34186, 234.27573]
2025-05-07 11:58:32,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [178.0, 1000.0, 549.0, 668.0, 592.0, 53.0, 158.0, 1000.0, 107.0, 141.0]
2025-05-07 11:58:32,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 30 minutes)
2025-05-07 12:01:35,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:01:40,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 421.22379 ± 493.417
2025-05-07 12:01:40,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [80.054886, 1176.7257, 1497.9049, 136.81468, 322.32938, 86.859024, 640.5522, 154.15477, 97.35025, 19.492386]
2025-05-07 12:01:40,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [95.0, 912.0, 1000.0, 104.0, 232.0, 61.0, 398.0, 91.0, 67.0, 17.0]
2025-05-07 12:01:40,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 28 minutes, 21 seconds)
2025-05-07 12:04:43,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:04:52,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 582.57996 ± 352.195
2025-05-07 12:04:52,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [1070.1478, 102.138054, 17.647438, 291.26935, 768.7899, 763.2305, 926.78217, 894.86334, 309.22543, 681.70557]
2025-05-07 12:04:52,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 83.0, 17.0, 205.0, 618.0, 481.0, 1000.0, 562.0, 212.0, 1000.0]
2025-05-07 12:04:52,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 25 minutes, 21 seconds)
2025-05-07 12:07:39,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:07:44,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 370.98383 ± 258.971
2025-05-07 12:07:44,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [224.72542, 486.51123, 580.7397, 699.64343, 580.4041, 740.1411, 36.200268, 168.33633, 93.383446, 99.753494]
2025-05-07 12:07:44,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [148.0, 353.0, 382.0, 411.0, 404.0, 1000.0, 28.0, 130.0, 51.0, 54.0]
2025-05-07 12:07:44,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 20 minutes)
2025-05-07 12:10:40,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:10:51,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 579.90930 ± 400.206
2025-05-07 12:10:51,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [49.08106, 802.0235, 68.99188, 868.2299, 1098.0416, 677.5291, 190.92442, 205.64548, 667.7099, 1170.9159]
2025-05-07 12:10:51,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [27.0, 1000.0, 56.0, 1000.0, 1000.0, 1000.0, 131.0, 166.0, 550.0, 801.0]
2025-05-07 12:10:51,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 17 minutes, 23 seconds)
2025-05-07 12:13:52,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:14:05,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 799.81775 ± 439.467
2025-05-07 12:14:05,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [188.48909, 1314.226, 1188.0797, 1583.935, 345.64822, 566.5921, 352.2771, 783.81305, 1015.4976, 659.61975]
2025-05-07 12:14:05,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [95.0, 1000.0, 1000.0, 1000.0, 243.0, 344.0, 308.0, 1000.0, 1000.0, 443.0]
2025-05-07 12:14:05,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (799.82) for latency ExtremeSparseL4U32
2025-05-07 12:14:05,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-07 12:14:05,066 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 12:14:05,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 17 minutes, 54 seconds)
2025-05-07 12:16:56,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:17:13,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1007.80450 ± 390.018
2025-05-07 12:17:13,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [699.8437, 921.69476, 1535.3496, 678.6622, 1825.3203, 617.9814, 950.1077, 803.4267, 734.38214, 1311.2766]
2025-05-07 12:17:13,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [424.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 490.0, 1000.0, 1000.0, 1000.0]
2025-05-07 12:17:13,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (1007.80) for latency ExtremeSparseL4U32
2025-05-07 12:17:13,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-07 12:17:13,792 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 12:17:13,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 15 minutes)
2025-05-07 12:20:12,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:20:31,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1506.37451 ± 357.491
2025-05-07 12:20:31,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [1528.6394, 1788.6309, 1733.5084, 1344.9994, 1606.8705, 1757.1843, 1806.2435, 690.45233, 1035.8965, 1771.319]
2025-05-07 12:20:31,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 12:20:31,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (1506.37) for latency ExtremeSparseL4U32
2025-05-07 12:20:31,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-07 12:20:31,621 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 12:20:31,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 13 minutes, 28 seconds)
2025-05-07 12:23:19,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:23:33,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1178.15503 ± 482.688
2025-05-07 12:23:33,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [1643.3722, 1339.2382, 1603.4523, 754.3408, 796.92365, 754.7457, 1432.2852, 1730.1849, 1509.6725, 217.33513]
2025-05-07 12:23:33,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 512.0, 517.0, 447.0, 893.0, 1000.0, 1000.0, 98.0]
2025-05-07 12:23:33,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 21/100 (estimated time remaining: 4 hours, 13 minutes, 1 second)
2025-05-07 12:26:33,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:26:49,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1336.11987 ± 532.682
2025-05-07 12:26:49,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [144.93661, 1699.3119, 1570.4568, 862.4955, 1939.695, 1979.1907, 919.61566, 1498.227, 1489.8845, 1257.3834]
2025-05-07 12:26:49,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [108.0, 1000.0, 1000.0, 570.0, 1000.0, 1000.0, 1000.0, 828.0, 1000.0, 782.0]
2025-05-07 12:26:49,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 22/100 (estimated time remaining: 4 hours, 12 minutes, 4 seconds)
2025-05-07 12:29:39,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:29:54,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1343.34863 ± 556.735
2025-05-07 12:29:54,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [612.8487, 1682.4921, 1457.2249, 1932.6973, 1473.4069, 905.38354, 1799.0659, 158.25653, 1648.4808, 1763.6302]
2025-05-07 12:29:54,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [376.0, 1000.0, 1000.0, 1000.0, 1000.0, 527.0, 1000.0, 108.0, 1000.0, 1000.0]
2025-05-07 12:29:54,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 23/100 (estimated time remaining: 4 hours, 6 minutes, 44 seconds)
2025-05-07 12:32:42,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:32:53,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 804.13757 ± 564.829
2025-05-07 12:32:53,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [1467.1066, 52.018585, 86.989075, 153.76958, 1743.8947, 780.63416, 1386.3767, 669.61554, 813.5276, 887.4429]
2025-05-07 12:32:53,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [793.0, 24.0, 53.0, 80.0, 1000.0, 393.0, 1000.0, 1000.0, 456.0, 468.0]
2025-05-07 12:32:53,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 24/100 (estimated time remaining: 4 hours, 1 minute, 3 seconds)
2025-05-07 12:35:53,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:36:03,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 989.56952 ± 422.713
2025-05-07 12:36:03,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [748.24506, 696.36957, 215.85735, 1690.2527, 1345.9469, 721.0617, 658.0337, 1309.5134, 1271.9314, 1238.4839]
2025-05-07 12:36:03,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [394.0, 341.0, 136.0, 886.0, 1000.0, 300.0, 355.0, 734.0, 573.0, 729.0]
2025-05-07 12:36:03,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 56 minutes, 1 second)
2025-05-07 12:39:06,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:39:21,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1290.26819 ± 559.468
2025-05-07 12:39:21,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [807.7521, 629.7076, 1797.2279, 2003.7305, 1572.8334, 953.0639, 1841.961, 1935.8926, 814.1846, 546.3284]
2025-05-07 12:39:21,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 550.0, 1000.0, 1000.0, 373.0, 332.0]
2025-05-07 12:39:21,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 57 minutes, 2 seconds)
2025-05-07 12:42:13,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:42:23,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1033.02246 ± 606.974
2025-05-07 12:42:23,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [1659.4352, 910.4269, 349.53482, 750.61475, 1630.1398, 909.34674, 1632.483, 494.3436, 73.40595, 1920.4934]
2025-05-07 12:42:23,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 451.0, 203.0, 434.0, 1000.0, 437.0, 1000.0, 309.0, 47.0, 954.0]
2025-05-07 12:42:23,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 50 minutes, 22 seconds)
2025-05-07 12:45:15,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:45:28,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1061.01282 ± 671.457
2025-05-07 12:45:28,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [1113.1805, 1870.6619, 1707.3965, 1710.3906, 844.11786, 415.55014, 794.0788, 1905.8726, 186.6679, 62.211308]
2025-05-07 12:45:28,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 599.0, 211.0, 1000.0, 1000.0, 77.0, 58.0]
2025-05-07 12:45:28,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 47 minutes, 16 seconds)
2025-05-07 12:48:18,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:48:35,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1791.03772 ± 552.349
2025-05-07 12:48:35,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2232.5474, 1285.213, 1973.2993, 1485.7412, 1993.6304, 1937.4337, 438.96533, 2317.7744, 2329.5366, 1916.2367]
2025-05-07 12:48:35,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 734.0, 1000.0, 808.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 12:48:35,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (1791.04) for latency ExtremeSparseL4U32
2025-05-07 12:48:35,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-07 12:48:35,980 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 12:48:35,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 46 minutes, 18 seconds)
2025-05-07 12:51:35,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:51:45,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1308.00403 ± 875.083
2025-05-07 12:51:45,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [54.04783, 1888.5813, 2102.3792, 2384.4316, 2172.805, 897.035, 2201.1748, 226.78337, 529.248, 623.5549]
2025-05-07 12:51:45,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [31.0, 1000.0, 1000.0, 1000.0, 1000.0, 461.0, 1000.0, 135.0, 244.0, 337.0]
2025-05-07 12:51:45,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 42 minutes, 59 seconds)
2025-05-07 12:54:44,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:55:00,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1580.42004 ± 712.011
2025-05-07 12:55:00,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [798.0638, 669.8954, 241.1266, 2249.9075, 2304.4487, 1803.1588, 1674.0259, 1742.9154, 1931.2573, 2389.4016]
2025-05-07 12:55:00,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [430.0, 1000.0, 120.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 12:55:00,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 39 minutes)
2025-05-07 12:57:46,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:58:03,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1786.03052 ± 517.435
2025-05-07 12:58:03,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [375.62042, 2104.3953, 1735.4508, 1680.4475, 1635.6721, 2279.9934, 2026.5992, 2263.3477, 1920.1927, 1838.5859]
2025-05-07 12:58:03,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [290.0, 1000.0, 897.0, 957.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 12:58:03,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 36 minutes, 14 seconds)
2025-05-07 13:00:48,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:01:02,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1660.95374 ± 624.614
2025-05-07 13:01:02,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [1157.4326, 2348.9514, 869.30756, 2126.274, 1685.4973, 2382.8865, 983.9978, 2252.8767, 743.1632, 2059.1487]
2025-05-07 13:01:02,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [542.0, 1000.0, 424.0, 1000.0, 730.0, 1000.0, 422.0, 1000.0, 1000.0, 860.0]
2025-05-07 13:01:02,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 31 minutes, 51 seconds)
2025-05-07 13:04:10,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:04:23,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1473.40405 ± 520.651
2025-05-07 13:04:23,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [1668.8826, 1309.6326, 2189.1973, 2164.37, 1190.7878, 1013.8498, 1968.7733, 959.59845, 1693.8297, 575.1192]
2025-05-07 13:04:23,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [698.0, 668.0, 1000.0, 944.0, 488.0, 1000.0, 1000.0, 451.0, 1000.0, 250.0]
2025-05-07 13:04:23,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 31 minutes, 42 seconds)
2025-05-07 13:07:11,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:07:30,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1870.62659 ± 466.166
2025-05-07 13:07:30,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2200.9138, 2089.1619, 2285.2869, 1903.2959, 1682.0675, 711.88837, 2171.7798, 1389.293, 2163.5874, 2108.992]
2025-05-07 13:07:30,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 777.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 13:07:30,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (1870.63) for latency ExtremeSparseL4U32
2025-05-07 13:07:30,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-07 13:07:30,598 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 13:07:30,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 27 minutes, 55 seconds)
2025-05-07 13:10:19,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:10:35,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1898.83325 ± 609.035
2025-05-07 13:10:35,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2309.5166, 548.7178, 2248.526, 2235.9888, 2315.8354, 909.01733, 2342.2295, 2242.444, 1838.1108, 1997.9453]
2025-05-07 13:10:35,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 256.0, 949.0, 1000.0, 1000.0, 509.0, 1000.0, 1000.0, 1000.0, 919.0]
2025-05-07 13:10:35,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (1898.83) for latency ExtremeSparseL4U32
2025-05-07 13:10:35,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-07 13:10:35,925 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 13:10:35,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 22 minutes, 43 seconds)
2025-05-07 13:13:30,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:13:47,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2090.30127 ± 529.255
2025-05-07 13:13:47,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [1518.8976, 2448.0034, 2553.3794, 2140.3882, 2079.1287, 2267.7046, 1919.187, 2625.7585, 827.45325, 2523.1138]
2025-05-07 13:13:47,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [551.0, 1000.0, 1000.0, 1000.0, 825.0, 1000.0, 785.0, 1000.0, 1000.0, 1000.0]
2025-05-07 13:13:47,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (2090.30) for latency ExtremeSparseL4U32
2025-05-07 13:13:47,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-07 13:13:47,761 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 13:13:47,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 21 minutes, 30 seconds)
2025-05-07 13:16:38,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:16:56,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2069.79810 ± 307.807
2025-05-07 13:16:56,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2451.4238, 2198.8481, 1705.4553, 1611.1938, 1854.2094, 2597.512, 2189.3289, 2132.074, 2182.0286, 1775.9059]
2025-05-07 13:16:56,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 734.0, 774.0, 1000.0, 1000.0, 1000.0, 1000.0, 738.0]
2025-05-07 13:16:56,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 20 minutes, 11 seconds)
2025-05-07 13:19:57,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:20:12,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1926.30396 ± 621.257
2025-05-07 13:20:12,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2437.1025, 1440.395, 2293.5786, 973.9016, 680.8678, 2480.9392, 2300.9177, 2202.9915, 2045.691, 2406.655]
2025-05-07 13:20:12,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 626.0, 1000.0, 401.0, 298.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 13:20:12,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 16 minutes, 5 seconds)
2025-05-07 13:23:01,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:23:15,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1825.46655 ± 695.900
2025-05-07 13:23:15,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [1915.0436, 2449.377, 1868.6407, 489.50214, 2230.4067, 2398.157, 2201.6545, 1135.9376, 2662.2588, 903.68964]
2025-05-07 13:23:15,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [810.0, 1000.0, 1000.0, 274.0, 1000.0, 1000.0, 1000.0, 405.0, 1000.0, 408.0]
2025-05-07 13:23:15,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 12 minutes, 9 seconds)
2025-05-07 13:26:09,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:26:25,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1602.05249 ± 873.763
2025-05-07 13:26:25,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2227.3098, 161.361, 2395.1677, 2046.9136, 2279.93, 915.5489, 2576.3674, 1426.9249, 75.02019, 1915.9803]
2025-05-07 13:26:25,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 73.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 45.0, 1000.0]
2025-05-07 13:26:25,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 9 minutes, 50 seconds)
2025-05-07 13:29:11,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:29:28,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1917.97791 ± 878.924
2025-05-07 13:29:28,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2499.452, 2538.3394, 2418.059, 2429.6606, 2844.8098, 2211.0964, 2312.056, 928.58636, 61.089333, 936.63025]
2025-05-07 13:29:28,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 878.0, 1000.0, 32.0, 1000.0]
2025-05-07 13:29:28,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 4 minutes, 59 seconds)
2025-05-07 13:32:34,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:32:47,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1493.86499 ± 748.388
2025-05-07 13:32:47,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2217.3245, 2715.5825, 1787.486, 461.69977, 2233.5486, 1852.2148, 1133.6373, 630.3851, 567.28937, 1339.4823]
2025-05-07 13:32:47,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 189.0, 1000.0, 820.0, 495.0, 1000.0, 198.0, 555.0]
2025-05-07 13:32:47,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 3 minutes, 56 seconds)
2025-05-07 13:35:35,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:35:53,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2364.29297 ± 366.960
2025-05-07 13:35:53,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2528.4392, 2497.5264, 2278.247, 2301.8328, 1411.3381, 2619.7922, 2593.285, 2078.8198, 2656.526, 2677.122]
2025-05-07 13:35:53,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 602.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 13:35:53,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (2364.29) for latency ExtremeSparseL4U32
2025-05-07 13:35:53,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-07 13:35:53,022 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 13:35:53,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 58 minutes, 39 seconds)
2025-05-07 13:38:51,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:39:08,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2359.11670 ± 380.259
2025-05-07 13:39:08,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [1839.6208, 2440.7166, 1696.6295, 1841.1305, 2597.556, 2656.055, 2497.1382, 2671.156, 2659.9001, 2691.2644]
2025-05-07 13:39:08,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [776.0, 1000.0, 667.0, 708.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 13:39:08,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 57 minutes, 46 seconds)
2025-05-07 13:42:01,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:42:19,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2052.08887 ± 590.595
2025-05-07 13:42:19,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2169.5027, 2205.602, 2300.6797, 1893.1805, 2527.122, 2733.6294, 814.8884, 2499.8206, 1106.1848, 2270.278]
2025-05-07 13:42:19,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 509.0, 1000.0]
2025-05-07 13:42:19,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 54 minutes, 55 seconds)
2025-05-07 13:45:05,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:45:25,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2082.41870 ± 687.423
2025-05-07 13:45:25,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2338.7178, 2516.7002, 703.1731, 2841.6377, 785.59985, 2407.5361, 2303.703, 2277.8223, 2341.021, 2308.2778]
2025-05-07 13:45:25,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 13:45:25,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 52 minutes, 11 seconds)
2025-05-07 13:48:15,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:48:32,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2133.03296 ± 664.657
2025-05-07 13:48:32,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [863.8281, 2755.0774, 2682.8086, 2490.6333, 1640.3733, 2352.7288, 2305.9827, 2553.6384, 2656.0393, 1029.2206]
2025-05-07 13:48:32,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [314.0, 1000.0, 1000.0, 1000.0, 668.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 13:48:32,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 46 minutes, 52 seconds)
2025-05-07 13:51:30,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:51:39,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1368.20471 ± 1043.228
2025-05-07 13:51:39,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [619.71967, 2707.597, 715.0603, 117.119125, 391.23834, 2445.939, 2667.2427, 2318.2424, 45.582375, 1654.3062]
2025-05-07 13:51:39,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [221.0, 1000.0, 296.0, 50.0, 137.0, 1000.0, 1000.0, 1000.0, 34.0, 644.0]
2025-05-07 13:51:39,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 44 minutes, 7 seconds)
2025-05-07 13:54:40,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:54:57,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2198.53394 ± 838.544
2025-05-07 13:54:57,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2292.6802, 2634.148, 2425.1892, 188.1304, 2843.913, 1070.237, 2843.1157, 2261.8923, 2938.179, 2487.8564]
2025-05-07 13:54:57,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 87.0, 1000.0, 1000.0, 1000.0, 813.0, 1000.0, 912.0]
2025-05-07 13:54:57,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 41 minutes, 22 seconds)
2025-05-07 13:57:43,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:57:59,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1753.38184 ± 790.245
2025-05-07 13:57:59,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [1546.0201, 2918.9238, 2014.3629, 1345.6638, 2288.5706, 975.5178, 643.2069, 2715.2937, 685.4078, 2400.8513]
2025-05-07 13:57:59,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [571.0, 1000.0, 756.0, 599.0, 1000.0, 433.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 13:57:59,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 36 minutes, 41 seconds)
2025-05-07 14:00:46,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:01:02,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2302.04834 ± 721.822
2025-05-07 14:01:02,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2830.745, 461.92395, 2789.8496, 2642.4922, 1499.7734, 2897.2769, 2703.1865, 2349.6487, 2439.5935, 2405.995]
2025-05-07 14:01:02,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 157.0, 1000.0, 1000.0, 543.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 14:01:02,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 33 minutes, 9 seconds)
2025-05-07 14:04:00,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:04:17,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2203.76465 ± 758.698
2025-05-07 14:04:17,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [689.5728, 2578.5735, 2279.42, 2955.656, 2732.7158, 2763.336, 2646.3162, 2675.9778, 964.55457, 1751.5236]
2025-05-07 14:04:17,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [258.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 705.0]
2025-05-07 14:04:17,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 31 minutes, 15 seconds)
2025-05-07 14:07:09,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:07:25,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2102.09570 ± 792.972
2025-05-07 14:07:25,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2360.774, 2649.2593, 2503.9465, 2662.4846, 2573.8662, 2501.9927, 133.58669, 2338.3306, 2248.8367, 1047.8782]
2025-05-07 14:07:25,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 86.0, 1000.0, 1000.0, 433.0]
2025-05-07 14:07:25,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 28 minutes, 6 seconds)
2025-05-07 14:10:33,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:10:50,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2371.85107 ± 400.684
2025-05-07 14:10:50,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2042.8484, 1664.7031, 2962.2568, 2764.7654, 1830.5244, 2499.3555, 2626.4905, 2213.9531, 2441.1934, 2672.4187]
2025-05-07 14:10:50,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 589.0, 1000.0, 1000.0, 739.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 14:10:50,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (2371.85) for latency ExtremeSparseL4U32
2025-05-07 14:10:50,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-07 14:10:50,557 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 14:10:50,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 26 minutes, 9 seconds)
2025-05-07 14:13:31,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:13:50,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2171.31299 ± 587.195
2025-05-07 14:13:50,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2011.0656, 1781.565, 2443.9006, 2246.1118, 2828.9395, 2116.162, 663.5, 2339.1685, 2565.812, 2716.907]
2025-05-07 14:13:50,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 800.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 14:13:50,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 22 minutes, 36 seconds)
2025-05-07 14:16:53,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:17:07,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1806.30981 ± 955.554
2025-05-07 14:17:07,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2598.0107, 2617.944, 2016.312, 2269.0476, 1165.3442, 975.509, 2703.6956, 652.0948, 83.67029, 2981.47]
2025-05-07 14:17:07,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 839.0, 1000.0, 341.0, 1000.0, 227.0, 46.0, 1000.0]
2025-05-07 14:17:07,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 21 minutes, 32 seconds)
2025-05-07 14:19:55,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:20:14,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2414.67212 ± 781.147
2025-05-07 14:20:14,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [1172.1184, 3222.7603, 2450.0378, 739.8593, 2949.0007, 2577.816, 2377.4514, 3149.2075, 2667.4792, 2840.9897]
2025-05-07 14:20:14,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 14:20:14,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (2414.67) for latency ExtremeSparseL4U32
2025-05-07 14:20:14,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-07 14:20:14,842 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 14:20:14,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 17 minutes, 14 seconds)
2025-05-07 14:23:17,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:23:31,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1878.25256 ± 790.175
2025-05-07 14:23:31,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2310.641, 1061.1627, 2727.7732, 1096.9011, 718.1042, 1282.9297, 2644.0515, 1454.0988, 2527.5693, 2959.2932]
2025-05-07 14:23:31,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 451.0, 1000.0, 411.0, 312.0, 446.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 14:23:31,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 15 minutes, 16 seconds)
2025-05-07 14:26:12,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:26:27,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2074.84204 ± 820.175
2025-05-07 14:26:27,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [909.2353, 2305.9824, 1410.967, 2725.5444, 2681.6184, 1125.8251, 2976.8896, 991.47644, 3047.3647, 2573.516]
2025-05-07 14:26:27,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [289.0, 1000.0, 545.0, 1000.0, 1000.0, 1000.0, 1000.0, 337.0, 1000.0, 1000.0]
2025-05-07 14:26:27,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 8 minutes, 2 seconds)
2025-05-07 14:29:39,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:29:56,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2096.60327 ± 798.049
2025-05-07 14:29:56,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [940.4321, 2653.7244, 2733.336, 2747.4443, 2432.4214, 291.41058, 2303.8264, 2351.82, 2681.4973, 1830.1206]
2025-05-07 14:29:56,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 127.0, 867.0, 997.0, 1000.0, 1000.0]
2025-05-07 14:29:56,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 8 minutes, 49 seconds)
2025-05-07 14:32:36,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:32:52,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1902.54260 ± 857.579
2025-05-07 14:32:52,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [1529.2375, 2166.6646, 1776.2242, 2783.7998, 718.1325, 332.58823, 1580.8417, 2258.6438, 3167.3228, 2711.9707]
2025-05-07 14:32:52,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 125.0, 648.0, 920.0, 1000.0, 1000.0]
2025-05-07 14:32:52,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 2 minutes, 48 seconds)
2025-05-07 14:36:05,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:36:23,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2571.52686 ± 355.080
2025-05-07 14:36:23,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2361.0247, 2885.6475, 2560.1423, 1637.5017, 2780.1768, 2787.9563, 2753.7769, 2668.4749, 2867.065, 2413.5037]
2025-05-07 14:36:23,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 972.0, 595.0, 1000.0, 1000.0, 1000.0, 955.0, 1000.0, 1000.0]
2025-05-07 14:36:23,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (2571.53) for latency ExtremeSparseL4U32
2025-05-07 14:36:23,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-07 14:36:23,173 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 14:36:23,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 2 minutes, 39 seconds)
2025-05-07 14:39:16,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:39:33,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2378.48315 ± 729.847
2025-05-07 14:39:33,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2241.544, 3043.876, 2220.7517, 393.79367, 2700.38, 2970.824, 2899.7327, 2602.1726, 2106.7292, 2605.029]
2025-05-07 14:39:33,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 171.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 14:39:33,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 58 minutes, 39 seconds)
2025-05-07 14:42:28,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:42:41,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2055.85156 ± 1119.013
2025-05-07 14:42:41,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [1061.5212, 25.3462, 2975.1218, 2911.767, 1049.0238, 764.1909, 2913.0461, 2988.755, 2940.683, 2929.0598]
2025-05-07 14:42:41,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [355.0, 22.0, 1000.0, 1000.0, 337.0, 263.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 14:42:41,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 56 minutes, 50 seconds)
2025-05-07 14:45:33,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:45:49,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2542.40625 ± 780.169
2025-05-07 14:45:49,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2623.44, 2630.7192, 460.64932, 3084.73, 3093.3162, 2947.4792, 2165.1614, 2985.0051, 2179.2717, 3254.2903]
2025-05-07 14:45:49,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 147.0, 1000.0, 1000.0, 1000.0, 693.0, 1000.0, 1000.0, 1000.0]
2025-05-07 14:45:49,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 51 minutes, 14 seconds)
2025-05-07 14:48:48,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:49:01,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1754.90100 ± 950.283
2025-05-07 14:49:01,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2522.325, 2153.0923, 1316.8474, 1217.6537, 2468.8818, 2551.2476, 1416.2816, 61.13597, 3266.1133, 575.432]
2025-05-07 14:49:01,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 462.0, 452.0, 1000.0, 1000.0, 1000.0, 35.0, 1000.0, 207.0]
2025-05-07 14:49:01,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 49 minutes, 48 seconds)
2025-05-07 14:51:52,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:52:08,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2252.05322 ± 757.418
2025-05-07 14:52:08,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [1132.8866, 1229.7749, 2205.5186, 2596.9585, 3137.8215, 1654.1702, 1650.593, 2568.2961, 3274.1301, 3070.3806]
2025-05-07 14:52:08,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 427.0, 1000.0, 1000.0, 1000.0, 1000.0, 565.0, 1000.0, 1000.0, 1000.0]
2025-05-07 14:52:08,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 43 minutes, 59 seconds)
2025-05-07 14:55:01,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:55:12,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1589.52344 ± 946.892
2025-05-07 14:55:12,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2282.8228, 1406.0862, 960.076, 1097.3486, 660.4421, 490.82083, 2982.396, 2399.8457, 3042.8179, 572.57855]
2025-05-07 14:55:12,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 308.0, 383.0, 214.0, 205.0, 1000.0, 831.0, 1000.0, 365.0]
2025-05-07 14:55:12,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 40 minutes, 7 seconds)
2025-05-07 14:58:07,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:58:22,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2257.95825 ± 1045.628
2025-05-07 14:58:22,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [236.5383, 802.70306, 3179.52, 2600.3767, 1124.3801, 3090.7002, 2626.6768, 2889.6797, 2785.6604, 3243.3486]
2025-05-07 14:58:22,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [105.0, 1000.0, 1000.0, 1000.0, 428.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 14:58:22,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 37 minutes, 13 seconds)
2025-05-07 15:01:31,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:01:45,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2410.22827 ± 1027.627
2025-05-07 15:01:45,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [1415.6672, 2991.876, 3371.2903, 1882.3259, 2972.7988, 1315.1693, 355.08575, 3016.855, 3369.98, 3411.2341]
2025-05-07 15:01:45,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 614.0, 1000.0, 524.0, 110.0, 1000.0, 1000.0, 1000.0]
2025-05-07 15:01:45,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 35 minutes, 34 seconds)
2025-05-07 15:04:44,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:04:57,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1896.61987 ± 876.122
2025-05-07 15:04:57,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2767.4038, 1126.1266, 2986.7883, 2815.9343, 1006.1235, 2601.5723, 2452.3071, 439.7922, 1536.6233, 1233.5278]
2025-05-07 15:04:57,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 559.0, 1000.0, 1000.0, 365.0, 1000.0, 1000.0, 144.0, 1000.0, 474.0]
2025-05-07 15:04:57,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 32 minutes, 24 seconds)
2025-05-07 15:07:46,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:08:04,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2624.93530 ± 639.001
2025-05-07 15:08:04,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2817.9636, 1378.0721, 2756.3906, 3052.619, 3118.982, 2856.6409, 2921.5361, 2946.0056, 1351.4248, 3049.7168]
2025-05-07 15:08:04,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 625.0, 1000.0]
2025-05-07 15:08:04,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (2624.94) for latency ExtremeSparseL4U32
2025-05-07 15:08:04,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-07 15:08:04,388 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 15:08:04,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 29 minutes, 12 seconds)
2025-05-07 15:10:59,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:11:17,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2455.43311 ± 513.559
2025-05-07 15:11:17,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2651.217, 2816.113, 2273.0461, 1205.1975, 2319.4592, 2811.023, 2461.4463, 3173.2725, 2745.908, 2097.6467]
2025-05-07 15:11:17,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 473.0, 957.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 15:11:17,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 26 minutes, 51 seconds)
2025-05-07 15:14:10,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:14:26,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2410.13770 ± 905.486
2025-05-07 15:14:26,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2611.899, 3366.0686, 984.5528, 2773.7827, 3457.7417, 2772.8127, 2995.5945, 2872.2136, 1264.4819, 1002.2289]
2025-05-07 15:14:26,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 305.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 317.0]
2025-05-07 15:14:26,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 23 minutes, 35 seconds)
2025-05-07 15:17:26,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:17:42,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2212.17773 ± 891.849
2025-05-07 15:17:42,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2572.7166, 945.2742, 938.3285, 1441.0834, 3314.3545, 3067.2383, 2634.4355, 2725.5112, 1343.5908, 3139.246]
2025-05-07 15:17:42,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 318.0, 284.0, 1000.0, 1000.0, 1000.0, 1000.0, 964.0, 1000.0, 1000.0]
2025-05-07 15:17:42,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 19 minutes, 42 seconds)
2025-05-07 15:20:41,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:20:57,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2232.98584 ± 1010.114
2025-05-07 15:20:57,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [3204.4382, 2970.9414, 906.20435, 989.3839, 2846.0286, 3294.3423, 2091.5234, 464.49786, 3061.4773, 2501.0212]
2025-05-07 15:20:57,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 334.0, 1000.0, 1000.0, 1000.0, 169.0, 1000.0, 1000.0]
2025-05-07 15:20:57,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 16 minutes, 47 seconds)
2025-05-07 15:23:52,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:24:09,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2677.34839 ± 823.345
2025-05-07 15:24:09,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2889.2817, 2455.4788, 2872.0037, 3104.3452, 3263.3972, 2853.5828, 3090.6245, 291.94916, 2817.8254, 3134.9946]
2025-05-07 15:24:09,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 137.0, 1000.0, 1000.0]
2025-05-07 15:24:09,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (2677.35) for latency ExtremeSparseL4U32
2025-05-07 15:24:09,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-07 15:24:09,862 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 15:24:09,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 14 minutes, 1 second)
2025-05-07 15:27:08,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:27:21,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1728.02808 ± 1152.900
2025-05-07 15:27:21,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [281.91797, 3142.73, 2784.0398, 2995.186, 2715.2983, 749.9479, 322.2275, 745.3706, 877.00616, 2666.5579]
2025-05-07 15:27:21,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [161.0, 1000.0, 1000.0, 1000.0, 1000.0, 294.0, 95.0, 273.0, 1000.0, 1000.0]
2025-05-07 15:27:21,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 10 minutes, 41 seconds)
2025-05-07 15:30:26,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:30:41,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2235.29541 ± 896.148
2025-05-07 15:30:41,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [3144.668, 720.2978, 385.35138, 2288.4028, 3113.512, 2320.725, 2856.4163, 2374.296, 2388.1968, 2761.0898]
2025-05-07 15:30:41,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 318.0, 112.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 15:30:41,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 8 minutes, 14 seconds)
2025-05-07 15:33:21,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:33:42,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2392.14600 ± 855.460
2025-05-07 15:33:42,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [3050.696, 2897.898, 1859.9545, 1231.9558, 3241.5635, 3045.8328, 882.078, 1590.2014, 3189.8076, 2931.471]
2025-05-07 15:33:42,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 335.0, 551.0, 1000.0, 1000.0]
2025-05-07 15:33:42,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 4 minutes)
2025-05-07 15:37:28,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:37:45,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 3000.00366 ± 754.089
2025-05-07 15:37:45,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2806.2795, 3279.1646, 3039.758, 3011.4119, 3457.7776, 3318.3174, 3521.9568, 3513.3184, 3214.566, 837.4856]
2025-05-07 15:37:45,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 320.0]
2025-05-07 15:37:45,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (3000.00) for latency ExtremeSparseL4U32
2025-05-07 15:37:45,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-07 15:37:45,062 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 15:37:45,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 3 minutes, 49 seconds)
2025-05-07 15:40:32,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:40:50,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2739.09619 ± 457.691
2025-05-07 15:40:50,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [3270.7256, 2708.004, 2319.868, 3145.9338, 2718.8909, 3315.687, 2585.8467, 1738.2124, 2563.8767, 3023.9158]
2025-05-07 15:40:50,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 15:40:50,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 3 seconds)
2025-05-07 15:43:37,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:43:53,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2600.50073 ± 771.869
2025-05-07 15:43:53,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2736.8376, 2490.8726, 3352.772, 2595.9004, 3319.4678, 2852.7615, 3077.843, 1127.3502, 1200.5176, 3250.6865]
2025-05-07 15:43:53,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 324.0, 444.0, 1000.0]
2025-05-07 15:43:53,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 84/100 (estimated time remaining: 56 minutes, 13 seconds)
2025-05-07 15:46:49,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:47:06,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2555.97021 ± 907.224
2025-05-07 15:47:06,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [1976.5935, 2979.7917, 3157.8037, 2942.599, 2890.924, 2927.1086, 3471.5784, 1018.5911, 806.7955, 3387.917]
2025-05-07 15:47:06,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [706.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 361.0, 1000.0, 1000.0]
2025-05-07 15:47:06,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 85/100 (estimated time remaining: 52 minutes, 32 seconds)
2025-05-07 15:50:03,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:50:20,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2499.49268 ± 895.272
2025-05-07 15:50:20,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [1497.3053, 3105.5735, 2534.63, 3025.357, 624.78107, 2926.3916, 3227.6167, 1534.356, 3420.8682, 3098.0479]
2025-05-07 15:50:20,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 175.0, 875.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 15:50:20,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 86/100 (estimated time remaining: 49 minutes, 53 seconds)
2025-05-07 15:53:19,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:53:34,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2485.97998 ± 867.332
2025-05-07 15:53:34,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2899.1643, 538.27985, 3230.5347, 2325.733, 1902.5457, 3267.0422, 3559.2505, 2447.2231, 2991.0544, 1698.9728]
2025-05-07 15:53:34,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 158.0, 1000.0, 767.0, 554.0, 1000.0, 1000.0, 1000.0, 1000.0, 600.0]
2025-05-07 15:53:34,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 87/100 (estimated time remaining: 44 minutes, 19 seconds)
2025-05-07 15:56:27,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:56:43,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2938.77173 ± 792.340
2025-05-07 15:56:43,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [3549.7974, 2892.309, 3234.0317, 700.13745, 3207.5874, 3195.1177, 3244.8823, 3561.5017, 3202.467, 2599.8872]
2025-05-07 15:56:43,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 223.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 15:56:43,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 88/100 (estimated time remaining: 41 minutes, 17 seconds)
2025-05-07 15:59:48,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:00:02,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2268.95703 ± 1052.728
2025-05-07 16:00:02,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [3252.321, 489.2403, 909.11334, 2981.1094, 2853.965, 3028.7007, 2778.0713, 703.61096, 3230.8152, 2462.624]
2025-05-07 16:00:02,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 217.0, 242.0, 1000.0, 948.0, 1000.0, 1000.0, 227.0, 1000.0, 1000.0]
2025-05-07 16:00:02,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 89/100 (estimated time remaining: 38 minutes, 45 seconds)
2025-05-07 16:02:53,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:03:10,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2865.93799 ± 517.636
2025-05-07 16:03:10,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2721.3384, 2163.688, 2775.2314, 3050.827, 3166.9805, 3486.2556, 3383.0676, 1748.3079, 3219.3564, 2944.3303]
2025-05-07 16:03:10,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 517.0, 1000.0, 1000.0]
2025-05-07 16:03:10,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 90/100 (estimated time remaining: 35 minutes, 20 seconds)
2025-05-07 16:06:02,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:06:16,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2173.85986 ± 1056.726
2025-05-07 16:06:16,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [14.645658, 1893.36, 3347.1523, 2468.8796, 1062.6155, 3300.5864, 2630.2146, 1140.3877, 2884.0024, 2996.752]
2025-05-07 16:06:16,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [16.0, 732.0, 1000.0, 1000.0, 507.0, 1000.0, 1000.0, 448.0, 1000.0, 1000.0]
2025-05-07 16:06:16,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 91/100 (estimated time remaining: 31 minutes, 52 seconds)
2025-05-07 16:09:12,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:09:30,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2996.27856 ± 277.533
2025-05-07 16:09:30,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [3255.9436, 2946.4575, 3020.049, 2522.9905, 2585.8315, 3458.2932, 2859.0146, 3251.222, 3092.851, 2970.1304]
2025-05-07 16:09:30,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 840.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 16:09:30,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 92/100 (estimated time remaining: 28 minutes, 39 seconds)
2025-05-07 16:12:28,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:12:45,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2904.38550 ± 324.742
2025-05-07 16:12:45,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [3273.1853, 3197.6267, 3121.8083, 2610.8972, 3071.0945, 2776.214, 2823.4417, 2897.5566, 3138.1611, 2133.8716]
2025-05-07 16:12:45,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 962.0, 1000.0, 1000.0, 632.0]
2025-05-07 16:12:45,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 93/100 (estimated time remaining: 25 minutes, 39 seconds)
2025-05-07 16:15:43,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:15:58,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2399.95947 ± 651.273
2025-05-07 16:15:58,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [3451.5708, 1224.6791, 1570.3691, 2920.7554, 2685.2463, 1765.5161, 2332.5935, 2490.7874, 2852.6855, 2705.3877]
2025-05-07 16:15:58,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 528.0, 461.0, 1000.0, 1000.0, 588.0, 1000.0, 852.0, 1000.0, 933.0]
2025-05-07 16:15:58,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 94/100 (estimated time remaining: 22 minutes, 18 seconds)
2025-05-07 16:18:46,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:19:03,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2942.11597 ± 813.115
2025-05-07 16:19:03,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [3413.9753, 3521.5696, 3026.4634, 637.2603, 2563.1187, 3291.9348, 3448.6902, 3015.0786, 3167.1514, 3335.9172]
2025-05-07 16:19:03,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 227.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 16:19:03,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 95/100 (estimated time remaining: 19 minutes, 3 seconds)
2025-05-07 16:22:07,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:22:22,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2650.26636 ± 717.420
2025-05-07 16:22:22,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [1334.8668, 2609.7158, 2839.2302, 2827.4644, 3145.9102, 3475.5793, 3103.3105, 3047.3464, 2884.3704, 1234.8682]
2025-05-07 16:22:22,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [434.0, 860.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 432.0]
2025-05-07 16:22:22,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 96/100 (estimated time remaining: 16 minutes, 6 seconds)
2025-05-07 16:25:23,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:25:39,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2637.82031 ± 733.227
2025-05-07 16:25:39,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [549.9753, 2690.7837, 2945.237, 2391.541, 3214.6787, 3103.0574, 2794.137, 2738.9, 3129.8474, 2820.0447]
2025-05-07 16:25:39,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [182.0, 1000.0, 1000.0, 711.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 16:25:39,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 97/100 (estimated time remaining: 12 minutes, 55 seconds)
2025-05-07 16:28:21,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:28:39,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2980.48975 ± 349.101
2025-05-07 16:28:39,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2983.742, 3213.3115, 2620.7483, 3088.2449, 3307.031, 2316.2827, 2599.103, 3375.7217, 3391.4963, 2909.2178]
2025-05-07 16:28:39,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 902.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 16:28:39,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 98/100 (estimated time remaining: 9 minutes, 32 seconds)
2025-05-07 16:31:34,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:31:47,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2081.99536 ± 1213.922
2025-05-07 16:31:47,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2400.5315, 657.1578, 3192.0598, 3455.8694, 1156.5494, 345.59375, 2912.6511, 435.08826, 3181.5898, 3082.8633]
2025-05-07 16:31:47,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 222.0, 1000.0, 1000.0, 1000.0, 101.0, 1000.0, 142.0, 1000.0, 1000.0]
2025-05-07 16:31:47,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 99/100 (estimated time remaining: 6 minutes, 19 seconds)
2025-05-07 16:34:56,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:35:11,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2256.02930 ± 1045.781
2025-05-07 16:35:11,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [3218.6946, 679.63635, 3333.89, 2494.3235, 2818.548, 2108.8193, 3089.9543, 829.1837, 3238.4375, 748.8049]
2025-05-07 16:35:11,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 996.0, 354.0, 1000.0, 259.0]
2025-05-07 16:35:11,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 13 seconds)
2025-05-07 16:37:57,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:38:12,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2468.87769 ± 1122.737
2025-05-07 16:38:12,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [1980.0579, 3227.8157, 2887.7502, 3315.9597, 2.361492, 3112.8005, 3278.5298, 738.8378, 2966.9294, 3177.735]
2025-05-07 16:38:12,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 992.0, 1000.0, 1000.0, 17.0, 1000.0, 1000.0, 225.0, 1000.0, 1000.0]
2025-05-07 16:38:12,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1149 [DEBUG]: Training session finished
