2025-08-07 00:48:04,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc0-ant/ExtremeSparseL4U32-bpql-mem32
2025-08-07 00:48:04,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc0-ant/ExtremeSparseL4U32-bpql-mem32
2025-08-07 00:48:04,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x1489bf2f7e50>}
2025-08-07 00:48:04,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1111 [DEBUG]: using device: cuda
2025-08-07 00:48:04,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1133 [INFO]: Creating new trainer
2025-08-07 00:48:04,054 baseline-bpql-noiseperc0-ant:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=283, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 00:48:04,054 baseline-bpql-noiseperc0-ant:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 00:48:06,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1194 [DEBUG]: Starting training session...
2025-08-07 00:48:06,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 1/100
2025-08-07 00:49:49,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:49:50,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: -12.68624 ± 35.295
2025-08-07 00:49:50,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [-7.0976214, -30.628908, 22.788996, -74.08191, 3.725779, 8.151959, -73.260956, 15.950187, 29.682814, -22.09278]
2025-08-07 00:49:50,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [58.0, 80.0, 37.0, 107.0, 70.0, 48.0, 118.0, 40.0, 39.0, 65.0]
2025-08-07 00:49:50,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (-12.69) for latency ExtremeSparseL4U32
2025-08-07 00:49:50,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 52 minutes, 20 seconds)
2025-08-07 00:51:37,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:51:39,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: -13.98179 ± 43.191
2025-08-07 00:51:39,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [-9.815594, -119.954865, 14.99202, -4.2229967, -39.113174, 28.224192, -21.259638, -40.632153, 18.830633, 33.133728]
2025-08-07 00:51:39,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [117.0, 179.0, 122.0, 88.0, 139.0, 68.0, 112.0, 87.0, 67.0, 53.0]
2025-08-07 00:51:39,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 53 minutes, 52 seconds)
2025-08-07 00:53:26,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:53:30,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: -80.04768 ± 238.422
2025-08-07 00:53:30,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [21.42944, -79.89365, 31.946072, -18.334925, 29.442978, 26.556028, 16.337711, -8.419849, -788.31506, -31.22558]
2025-08-07 00:53:30,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [54.0, 181.0, 58.0, 174.0, 68.0, 61.0, 62.0, 104.0, 1000.0, 98.0]
2025-08-07 00:53:30,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 54 minutes, 23 seconds)
2025-08-07 00:55:19,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:55:21,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: -24.22358 ± 46.197
2025-08-07 00:55:21,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [-17.228653, -6.31789, 66.060745, -88.680336, -41.088173, -92.31753, -4.301993, -21.913742, -58.66382, 22.215542]
2025-08-07 00:55:21,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [110.0, 96.0, 60.0, 153.0, 147.0, 138.0, 110.0, 143.0, 120.0, 90.0]
2025-08-07 00:55:21,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 53 minutes, 51 seconds)
2025-08-07 00:57:06,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:57:09,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: -37.89905 ± 138.530
2025-08-07 00:57:09,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [41.451057, -10.793942, 28.682829, -446.02835, 7.187, 38.147873, 6.1422586, 16.432564, -52.932976, -7.278787]
2025-08-07 00:57:09,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [97.0, 171.0, 120.0, 1000.0, 87.0, 76.0, 136.0, 68.0, 187.0, 115.0]
2025-08-07 00:57:09,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 52 minutes)
2025-08-07 00:59:05,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:59:15,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 183.83868 ± 88.085
2025-08-07 00:59:15,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [316.7394, 302.97565, 104.97546, 107.79631, 181.38553, 134.9749, 319.58316, 125.71835, 91.173775, 153.06436]
2025-08-07 00:59:15,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 273.0, 759.0, 488.0, 241.0, 1000.0, 279.0, 311.0, 374.0]
2025-08-07 00:59:15,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (183.84) for latency ExtremeSparseL4U32
2025-08-07 00:59:15,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 57 minutes, 2 seconds)
2025-08-07 01:00:53,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:01:11,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 659.69574 ± 138.170
2025-08-07 01:01:11,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [743.4008, 699.43207, 754.01776, 694.57623, 692.0303, 256.10782, 693.86206, 647.0435, 741.7106, 674.77686]
2025-08-07 01:01:11,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 476.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:01:11,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (659.70) for latency ExtremeSparseL4U32
2025-08-07 01:01:11,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 57 minutes, 24 seconds)
2025-08-07 01:03:08,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:03:24,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 691.69855 ± 283.680
2025-08-07 01:03:24,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [802.07074, 756.3437, 178.32756, 82.27455, 818.0996, 866.4123, 894.84503, 846.6214, 839.1937, 832.7962]
2025-08-07 01:03:24,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 224.0, 105.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:03:24,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (691.70) for latency ExtremeSparseL4U32
2025-08-07 01:03:24,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 2 minutes, 10 seconds)
2025-08-07 01:05:05,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:05:14,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 434.35190 ± 367.593
2025-08-07 01:05:14,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [854.15546, 59.980663, 829.59235, 67.79475, 862.289, 124.104866, 521.2243, 81.49706, 877.69836, 65.18195]
2025-08-07 01:05:14,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 75.0, 1000.0, 56.0, 1000.0, 127.0, 450.0, 78.0, 1000.0, 58.0]
2025-08-07 01:05:14,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 59 minutes, 56 seconds)
2025-08-07 01:07:03,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:07:17,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 678.50330 ± 351.796
2025-08-07 01:07:17,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [947.93396, 864.5719, 42.45996, 971.9107, 346.59625, 908.6523, 869.7021, 839.7667, 79.60839, 913.8305]
2025-08-07 01:07:17,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 38.0, 1000.0, 278.0, 1000.0, 1000.0, 1000.0, 68.0, 1000.0]
2025-08-07 01:07:17,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 11/100 (estimated time remaining: 3 hours, 2 minutes, 13 seconds)
2025-08-07 01:09:05,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:09:15,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 372.71454 ± 284.067
2025-08-07 01:09:15,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [50.462406, 823.98315, 115.56315, 195.22377, 489.71616, 116.04985, 306.43408, 842.85895, 174.37396, 612.4798]
2025-08-07 01:09:15,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [46.0, 1000.0, 109.0, 227.0, 1000.0, 126.0, 443.0, 1000.0, 266.0, 1000.0]
2025-08-07 01:09:15,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 57 minutes, 48 seconds)
2025-08-07 01:11:00,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:11:11,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 504.01202 ± 364.728
2025-08-07 01:11:11,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [161.30196, 109.98094, 942.6229, 184.66806, 103.07555, 909.43585, 763.15015, 169.54596, 742.886, 953.4525]
2025-08-07 01:11:11,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [120.0, 83.0, 1000.0, 171.0, 93.0, 1000.0, 1000.0, 177.0, 1000.0, 1000.0]
2025-08-07 01:11:11,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 55 minutes, 51 seconds)
2025-08-07 01:13:06,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:13:23,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 819.63007 ± 243.103
2025-08-07 01:13:23,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [991.1587, 916.77435, 936.09686, 877.2546, 821.08765, 930.018, 919.233, 104.59868, 846.9835, 853.0955]
2025-08-07 01:13:23,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 73.0, 1000.0, 1000.0]
2025-08-07 01:13:23,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (819.63) for latency ExtremeSparseL4U32
2025-08-07 01:13:23,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 53 minutes, 52 seconds)
2025-08-07 01:15:05,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:15:22,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 800.32849 ± 261.373
2025-08-07 01:15:22,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [925.283, 819.3272, 698.24243, 894.99243, 959.2851, 45.569386, 906.95276, 921.0588, 912.2384, 920.33545]
2025-08-07 01:15:22,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 55.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:15:22,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 54 minutes, 23 seconds)
2025-08-07 01:17:13,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:17:25,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 628.19312 ± 353.907
2025-08-07 01:17:25,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [918.7729, 927.3867, 130.97598, 230.235, 125.11183, 904.373, 942.94934, 949.0327, 315.54575, 837.548]
2025-08-07 01:17:25,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 127.0, 241.0, 91.0, 1000.0, 1000.0, 1000.0, 236.0, 1000.0]
2025-08-07 01:17:25,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 52 minutes, 22 seconds)
2025-08-07 01:19:15,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:19:25,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 528.66418 ± 403.709
2025-08-07 01:19:25,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [880.81476, 925.0154, 112.33434, 57.093754, 195.97073, 1049.5613, 859.09906, 134.42784, 144.84871, 927.47644]
2025-08-07 01:19:25,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 167.0, 54.0, 195.0, 1000.0, 1000.0, 112.0, 126.0, 1000.0]
2025-08-07 01:19:25,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 50 minutes, 58 seconds)
2025-08-07 01:21:17,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:21:28,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 524.28882 ± 382.368
2025-08-07 01:21:28,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [820.877, 212.09628, 988.57513, 72.57387, 818.9479, 104.87092, 954.2665, 263.3161, 91.10102, 916.2636]
2025-08-07 01:21:28,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 286.0, 1000.0, 73.0, 1000.0, 98.0, 1000.0, 209.0, 112.0, 1000.0]
2025-08-07 01:21:28,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 50 minutes, 46 seconds)
2025-08-07 01:23:10,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:23:21,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 545.10095 ± 344.523
2025-08-07 01:23:21,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [75.852974, 239.0959, 133.79755, 523.8053, 156.57637, 953.1687, 710.4365, 937.65314, 806.05585, 914.56726]
2025-08-07 01:23:21,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [61.0, 213.0, 134.0, 428.0, 198.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:23:21,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 43 minutes, 26 seconds)
2025-08-07 01:25:06,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:25:15,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 556.37274 ± 275.009
2025-08-07 01:25:15,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [282.73172, 176.76141, 258.66245, 1086.4521, 551.6054, 487.17252, 931.5294, 556.5392, 529.5317, 702.7412]
2025-08-07 01:25:15,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [214.0, 148.0, 224.0, 1000.0, 411.0, 397.0, 1000.0, 427.0, 448.0, 1000.0]
2025-08-07 01:25:15,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 40 minutes, 5 seconds)
2025-08-07 01:27:06,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:27:17,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 629.93274 ± 357.054
2025-08-07 01:27:17,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [652.40594, 935.2518, 976.36096, 228.31506, 960.3352, 293.35767, 103.60805, 1013.73303, 918.2608, 217.69922]
2025-08-07 01:27:17,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [620.0, 1000.0, 1000.0, 210.0, 1000.0, 309.0, 104.0, 1000.0, 1000.0, 214.0]
2025-08-07 01:27:17,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 38 minutes)
2025-08-07 01:29:05,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:29:16,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 612.38550 ± 417.452
2025-08-07 01:29:16,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [138.71527, 1049.9579, 1019.53656, 989.12744, 128.52579, 267.0915, 984.3949, 391.3306, 89.687225, 1065.4884]
2025-08-07 01:29:16,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [96.0, 898.0, 1000.0, 1000.0, 138.0, 211.0, 1000.0, 300.0, 70.0, 1000.0]
2025-08-07 01:29:16,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 35 minutes, 27 seconds)
2025-08-07 01:30:57,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:31:11,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 841.10010 ± 254.424
2025-08-07 01:31:11,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [1024.632, 919.9309, 681.3752, 773.7428, 1189.0316, 982.4015, 612.7597, 1216.3309, 410.48047, 600.3163]
2025-08-07 01:31:11,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 645.0, 599.0, 916.0, 1000.0, 499.0, 1000.0, 314.0, 492.0]
2025-08-07 01:31:11,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (841.10) for latency ExtremeSparseL4U32
2025-08-07 01:31:11,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 31 minutes, 34 seconds)
2025-08-07 01:33:07,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:33:22,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 940.94812 ± 247.438
2025-08-07 01:33:22,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [907.4671, 554.1685, 883.537, 1173.7567, 492.91708, 1122.217, 883.4858, 1261.9899, 1194.0367, 935.9052]
2025-08-07 01:33:22,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [662.0, 529.0, 1000.0, 1000.0, 331.0, 1000.0, 672.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:33:22,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (940.95) for latency ExtremeSparseL4U32
2025-08-07 01:33:22,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 34 minutes, 6 seconds)
2025-08-07 01:35:03,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:35:16,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 972.30194 ± 393.316
2025-08-07 01:35:16,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [400.69333, 1342.5942, 1190.9546, 247.30637, 972.3109, 1464.1987, 1292.0538, 1030.1917, 1164.338, 618.378]
2025-08-07 01:35:16,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [298.0, 1000.0, 1000.0, 147.0, 707.0, 1000.0, 1000.0, 1000.0, 831.0, 509.0]
2025-08-07 01:35:16,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (972.30) for latency ExtremeSparseL4U32
2025-08-07 01:35:16,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 32 minutes, 11 seconds)
2025-08-07 01:37:02,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:37:15,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1016.57581 ± 426.188
2025-08-07 01:37:15,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [1268.9476, 1393.8468, 277.1717, 657.1052, 1261.3096, 1186.1388, 250.98494, 1139.272, 1342.2335, 1388.7483]
2025-08-07 01:37:15,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [960.0, 1000.0, 164.0, 458.0, 1000.0, 1000.0, 170.0, 741.0, 993.0, 1000.0]
2025-08-07 01:37:15,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (1016.58) for latency ExtremeSparseL4U32
2025-08-07 01:37:15,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 29 minutes, 22 seconds)
2025-08-07 01:39:01,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:39:12,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 875.14337 ± 382.983
2025-08-07 01:39:12,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [645.0756, 1206.53, 321.36203, 1081.6078, 580.96783, 1469.6091, 970.8091, 1023.0845, 1194.8981, 257.489]
2025-08-07 01:39:12,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [424.0, 1000.0, 246.0, 1000.0, 355.0, 1000.0, 611.0, 624.0, 1000.0, 231.0]
2025-08-07 01:39:12,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 27 minutes, 3 seconds)
2025-08-07 01:40:52,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:41:04,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1062.52649 ± 603.276
2025-08-07 01:41:04,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [1255.7618, 694.32886, 1660.8314, 1656.5697, 1539.2238, 393.14694, 1499.3196, 1592.1703, 216.65324, 117.26016]
2025-08-07 01:41:04,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [706.0, 414.0, 1000.0, 1000.0, 1000.0, 226.0, 1000.0, 1000.0, 162.0, 94.0]
2025-08-07 01:41:04,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (1062.53) for latency ExtremeSparseL4U32
2025-08-07 01:41:04,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 24 minutes, 20 seconds)
2025-08-07 01:42:58,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:43:11,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1092.39136 ± 575.277
2025-08-07 01:43:11,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [1159.0773, 1602.241, 61.096867, 1526.6317, 1676.9653, 354.62708, 1462.439, 362.026, 1142.6882, 1576.1202]
2025-08-07 01:43:11,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [674.0, 1000.0, 45.0, 1000.0, 1000.0, 218.0, 1000.0, 223.0, 1000.0, 1000.0]
2025-08-07 01:43:11,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (1092.39) for latency ExtremeSparseL4U32
2025-08-07 01:43:11,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 21 minutes, 26 seconds)
2025-08-07 01:44:54,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:45:05,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1055.76013 ± 610.363
2025-08-07 01:45:05,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [519.56793, 138.48578, 738.6061, 1133.4966, 1566.7413, 1776.9504, 1780.8661, 1046.7976, 169.68797, 1686.4019]
2025-08-07 01:45:05,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [312.0, 73.0, 490.0, 651.0, 925.0, 1000.0, 1000.0, 633.0, 100.0, 1000.0]
2025-08-07 01:45:05,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 19 minutes, 23 seconds)
2025-08-07 01:46:49,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:47:05,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1540.16467 ± 421.336
2025-08-07 01:47:05,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [1732.9025, 1691.8076, 1778.7572, 1759.5367, 1782.6531, 1683.5493, 1036.558, 1655.4562, 453.24484, 1827.1803]
2025-08-07 01:47:05,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [999.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 637.0, 1000.0, 291.0, 1000.0]
2025-08-07 01:47:05,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (1540.16) for latency ExtremeSparseL4U32
2025-08-07 01:47:05,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 17 minutes, 39 seconds)
2025-08-07 01:48:50,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:49:07,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1709.15039 ± 254.126
2025-08-07 01:49:07,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [1006.2679, 1569.5875, 1732.829, 1815.8279, 1831.1099, 1698.8768, 1945.2219, 1877.5835, 1774.931, 1839.2695]
2025-08-07 01:49:07,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:49:07,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (1709.15) for latency ExtremeSparseL4U32
2025-08-07 01:49:07,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 16 minutes, 57 seconds)
2025-08-07 01:50:58,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:51:15,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1456.53638 ± 463.392
2025-08-07 01:51:15,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [1545.4089, 1587.9183, 1578.5518, 1782.0571, 1770.7972, 1664.6564, 1557.4292, 123.98637, 1286.8792, 1667.6809]
2025-08-07 01:51:15,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 72.0, 1000.0, 1000.0]
2025-08-07 01:51:15,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 18 minutes, 23 seconds)
2025-08-07 01:52:54,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:53:10,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1566.07153 ± 357.690
2025-08-07 01:53:10,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2084.042, 1170.7261, 1654.0177, 705.8319, 1747.4119, 1520.1123, 1703.593, 1707.0992, 1631.5137, 1736.3677]
2025-08-07 01:53:10,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 669.0, 1000.0, 435.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 843.0]
2025-08-07 01:53:10,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 13 minutes, 48 seconds)
2025-08-07 01:54:56,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:55:10,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1493.10229 ± 621.513
2025-08-07 01:55:10,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [1101.7201, 2238.3804, 2143.74, 1935.6355, 1004.63947, 344.98734, 1302.9949, 899.68353, 2221.0166, 1738.2244]
2025-08-07 01:55:10,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [494.0, 1000.0, 1000.0, 968.0, 1000.0, 203.0, 638.0, 441.0, 1000.0, 925.0]
2025-08-07 01:55:10,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 13 minutes, 4 seconds)
2025-08-07 01:56:55,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:57:07,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 870.17053 ± 421.268
2025-08-07 01:57:07,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [1234.8845, 542.13495, 1292.9937, 127.98018, 1470.7288, 754.9612, 1183.4392, 1100.4749, 517.83795, 476.2691]
2025-08-07 01:57:07,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [522.0, 252.0, 1000.0, 69.0, 1000.0, 1000.0, 1000.0, 1000.0, 217.0, 296.0]
2025-08-07 01:57:07,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 10 minutes, 21 seconds)
2025-08-07 01:58:52,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:59:04,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1648.68921 ± 1098.238
2025-08-07 01:59:04,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [196.59401, 2392.8298, 2523.5276, 2626.7024, 2408.6355, 967.78864, 86.01568, 97.3234, 2571.0886, 2616.3875]
2025-08-07 01:59:04,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [132.0, 1000.0, 1000.0, 1000.0, 1000.0, 406.0, 63.0, 62.0, 1000.0, 1000.0]
2025-08-07 01:59:04,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 7 minutes, 14 seconds)
2025-08-07 02:00:49,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:01:07,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2245.77173 ± 229.719
2025-08-07 02:01:07,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2162.1084, 2709.5535, 1824.2483, 2297.7903, 2318.6226, 2384.2878, 2210.7776, 2271.7732, 2334.4045, 1944.1498]
2025-08-07 02:01:07,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 973.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:01:07,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (2245.77) for latency ExtremeSparseL4U32
2025-08-07 02:01:07,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 4 minutes, 29 seconds)
2025-08-07 02:02:58,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:03:16,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2290.27686 ± 116.820
2025-08-07 02:03:16,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2310.9272, 2208.4128, 2287.6458, 2354.4697, 2017.3192, 2418.3918, 2188.464, 2410.5422, 2380.882, 2325.7122]
2025-08-07 02:03:16,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:03:16,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (2290.28) for latency ExtremeSparseL4U32
2025-08-07 02:03:16,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 5 minutes, 12 seconds)
2025-08-07 02:05:02,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:05:19,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2468.93091 ± 387.808
2025-08-07 02:05:19,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2634.8398, 2498.3794, 2630.5613, 1339.679, 2665.8997, 2603.0115, 2768.7866, 2414.9712, 2619.262, 2513.9187]
2025-08-07 02:05:19,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 547.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:05:19,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (2468.93) for latency ExtremeSparseL4U32
2025-08-07 02:05:19,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 3 minutes, 48 seconds)
2025-08-07 02:07:06,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:07:23,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2284.36279 ± 710.051
2025-08-07 02:07:23,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2745.06, 2622.6091, 2604.8777, 2696.6226, 2551.267, 2233.289, 2093.0498, 2414.0522, 2644.302, 238.49696]
2025-08-07 02:07:23,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 108.0]
2025-08-07 02:07:23,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 3 minutes, 16 seconds)
2025-08-07 02:09:07,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:09:25,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2697.05884 ± 73.996
2025-08-07 02:09:25,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2585.1973, 2653.1628, 2772.495, 2666.5095, 2589.9712, 2801.8796, 2682.1604, 2704.8032, 2800.3215, 2714.0854]
2025-08-07 02:09:25,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:09:25,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (2697.06) for latency ExtremeSparseL4U32
2025-08-07 02:09:25,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 2 minutes, 14 seconds)
2025-08-07 02:11:14,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:11:31,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2449.30786 ± 264.504
2025-08-07 02:11:31,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [1754.5891, 2331.7588, 2637.7996, 2574.9927, 2338.715, 2367.4885, 2682.3606, 2529.1548, 2616.4436, 2659.776]
2025-08-07 02:11:31,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [693.0, 1000.0, 1000.0, 1000.0, 979.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:11:31,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 30 seconds)
2025-08-07 02:13:17,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:13:35,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2826.16919 ± 61.017
2025-08-07 02:13:35,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2743.824, 2770.787, 2772.8035, 2815.6882, 2810.2805, 2937.7239, 2839.4404, 2861.1348, 2920.8296, 2789.1772]
2025-08-07 02:13:35,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:13:35,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (2826.17) for latency ExtremeSparseL4U32
2025-08-07 02:13:35,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 57 minutes, 34 seconds)
2025-08-07 02:15:23,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:15:41,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2560.12378 ± 119.507
2025-08-07 02:15:41,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2626.352, 2676.1519, 2433.2075, 2579.7441, 2730.9297, 2669.4695, 2446.5784, 2418.133, 2631.4443, 2389.2263]
2025-08-07 02:15:41,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:15:41,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 56 minutes, 5 seconds)
2025-08-07 02:17:25,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:17:42,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2422.35645 ± 621.752
2025-08-07 02:17:42,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [1150.4463, 2373.6025, 2664.2922, 2770.7673, 2811.96, 2663.5127, 2976.4436, 1279.2068, 2815.913, 2717.4207]
2025-08-07 02:17:42,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 846.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 464.0, 1000.0, 1000.0]
2025-08-07 02:17:42,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 53 minutes, 27 seconds)
2025-08-07 02:19:24,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:19:42,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2773.73071 ± 143.564
2025-08-07 02:19:42,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2601.1013, 2849.831, 2491.8098, 2735.0232, 2740.4094, 2750.0579, 2961.9211, 2818.3328, 2791.1313, 2997.6895]
2025-08-07 02:19:42,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:19:42,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 51 minutes, 2 seconds)
2025-08-07 02:21:30,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:21:46,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2596.75439 ± 701.471
2025-08-07 02:21:46,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2954.0566, 2683.944, 2677.046, 2796.5361, 2816.6658, 2947.3677, 2857.0347, 2854.6748, 509.12985, 2871.0896]
2025-08-07 02:21:46,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 286.0, 1000.0]
2025-08-07 02:21:46,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 48 minutes, 45 seconds)
2025-08-07 02:23:38,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:23:56,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2671.43848 ± 339.029
2025-08-07 02:23:56,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3136.0193, 2787.6152, 2506.6316, 2579.7998, 2821.697, 2338.1375, 3025.8445, 1954.2993, 2990.0537, 2574.2876]
2025-08-07 02:23:56,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:23:56,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 47 minutes, 43 seconds)
2025-08-07 02:25:43,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:26:00,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2730.19897 ± 724.098
2025-08-07 02:26:00,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3092.6765, 2894.2444, 2749.6487, 3031.041, 3010.8752, 2848.8228, 3075.7893, 3034.2676, 579.7579, 2984.8672]
2025-08-07 02:26:00,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 291.0, 1000.0]
2025-08-07 02:26:00,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 45 minutes, 18 seconds)
2025-08-07 02:27:42,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:27:59,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2602.61182 ± 640.155
2025-08-07 02:27:59,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2985.7832, 2782.7185, 2784.1433, 2943.9487, 2873.702, 751.57275, 2333.6213, 2882.1863, 2877.0447, 2811.397]
2025-08-07 02:27:59,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 258.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:27:59,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 42 minutes, 50 seconds)
2025-08-07 02:29:47,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:30:05,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2761.36694 ± 89.100
2025-08-07 02:30:05,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2876.504, 2772.2493, 2679.29, 2809.1895, 2771.4048, 2789.9739, 2831.9888, 2807.79, 2733.3252, 2541.9558]
2025-08-07 02:30:05,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:30:05,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 41 minutes, 40 seconds)
2025-08-07 02:31:52,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:32:09,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2795.85083 ± 404.494
2025-08-07 02:32:09,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2809.3215, 3165.3567, 1643.514, 3034.5986, 2991.5227, 3042.1128, 2763.051, 2886.6577, 2865.1958, 2757.1748]
2025-08-07 02:32:09,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 603.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:32:09,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 39 minutes, 36 seconds)
2025-08-07 02:33:59,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:34:16,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2868.73608 ± 198.946
2025-08-07 02:34:16,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2769.8142, 2928.1624, 2856.0151, 3056.6667, 2630.2615, 2727.1272, 3097.762, 2495.7314, 3023.656, 3102.1655]
2025-08-07 02:34:16,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 840.0, 1000.0, 1000.0]
2025-08-07 02:34:16,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (2868.74) for latency ExtremeSparseL4U32
2025-08-07 02:34:16,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 37 minutes, 9 seconds)
2025-08-07 02:36:07,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:36:24,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2643.04956 ± 571.748
2025-08-07 02:36:24,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2680.9026, 2901.7827, 2771.9539, 1011.732, 3100.8252, 2476.9954, 2948.1667, 3058.3203, 2714.1255, 2765.691]
2025-08-07 02:36:24,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 371.0, 1000.0, 799.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:36:24,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 35 minutes, 36 seconds)
2025-08-07 02:38:09,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:38:27,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2779.49316 ± 126.420
2025-08-07 02:38:27,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2825.651, 3039.9612, 2882.3901, 2653.781, 2560.4526, 2768.0222, 2671.6238, 2759.4866, 2800.5383, 2833.0261]
2025-08-07 02:38:27,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:38:27,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 34 minutes, 13 seconds)
2025-08-07 02:40:14,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:40:32,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2993.72192 ± 92.246
2025-08-07 02:40:32,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3083.6902, 2992.6277, 2978.7876, 3033.8398, 2793.3486, 3110.1328, 3048.5076, 2864.597, 2995.709, 3035.9795]
2025-08-07 02:40:32,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:40:32,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (2993.72) for latency ExtremeSparseL4U32
2025-08-07 02:40:32,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 31 minutes, 59 seconds)
2025-08-07 02:42:19,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:42:35,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2539.86597 ± 880.457
2025-08-07 02:42:35,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2988.6785, 2945.8247, 3109.546, 2962.5967, 2743.9155, 1635.0831, 3022.486, 2991.6833, 191.71631, 2807.13]
2025-08-07 02:42:35,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 99.0, 1000.0]
2025-08-07 02:42:35,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 29 minutes, 46 seconds)
2025-08-07 02:44:18,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:44:30,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1931.89673 ± 1015.189
2025-08-07 02:44:30,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [113.20241, 2703.3005, 3119.5732, 1271.4613, 2922.2734, 1684.9375, 595.6954, 2776.4402, 1365.9974, 2766.0835]
2025-08-07 02:44:30,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [74.0, 1000.0, 1000.0, 433.0, 1000.0, 581.0, 212.0, 1000.0, 482.0, 1000.0]
2025-08-07 02:44:30,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 25 minutes, 55 seconds)
2025-08-07 02:46:15,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:46:30,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2615.71313 ± 908.955
2025-08-07 02:46:30,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3139.0247, 2056.2322, 2880.6863, 2931.231, 3151.039, 3111.368, 2864.062, 2782.4485, 3186.216, 54.82256]
2025-08-07 02:46:30,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 717.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 50.0]
2025-08-07 02:46:30,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 22 minutes, 54 seconds)
2025-08-07 02:48:22,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:48:37,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2445.21948 ± 819.875
2025-08-07 02:48:37,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [552.9697, 2830.0017, 2799.1814, 2759.0046, 2722.0544, 3010.1223, 1131.3695, 2689.8862, 3074.438, 2883.1675]
2025-08-07 02:48:37,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [206.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 429.0, 1000.0, 1000.0, 929.0]
2025-08-07 02:48:37,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 21 minutes, 23 seconds)
2025-08-07 02:50:22,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:50:40,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2943.77490 ± 518.550
2025-08-07 02:50:40,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3022.0647, 2960.911, 3068.3552, 3354.0967, 2975.9216, 2969.1907, 2989.3499, 1476.21, 3121.5076, 3500.143]
2025-08-07 02:50:40,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:50:40,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 19 minutes, 3 seconds)
2025-08-07 02:52:30,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:52:46,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2423.16260 ± 789.728
2025-08-07 02:52:46,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [1355.9546, 3039.4404, 2678.018, 2607.792, 2903.953, 2661.3376, 2857.9375, 2726.784, 2925.869, 474.54156]
2025-08-07 02:52:46,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 883.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 166.0]
2025-08-07 02:52:46,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 17 minutes, 25 seconds)
2025-08-07 02:54:36,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:54:54,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3046.71729 ± 180.344
2025-08-07 02:54:54,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3092.0364, 3226.6067, 2997.6655, 3048.7593, 2578.6936, 3108.7795, 3244.4883, 3178.8752, 3038.7605, 2952.5103]
2025-08-07 02:54:54,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:54:54,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (3046.72) for latency ExtremeSparseL4U32
2025-08-07 02:54:54,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 16 minutes, 55 seconds)
2025-08-07 02:56:39,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:56:57,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2795.19897 ± 237.926
2025-08-07 02:56:57,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2474.8372, 2973.6143, 2955.631, 2902.467, 2744.514, 2469.4592, 2784.758, 2773.1008, 3286.807, 2586.802]
2025-08-07 02:56:57,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:56:57,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 15 minutes, 11 seconds)
2025-08-07 02:58:46,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:59:05,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3141.41748 ± 277.688
2025-08-07 02:59:05,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2708.8665, 3382.5542, 3288.9773, 3073.2588, 2666.424, 2953.1965, 3517.4353, 3419.6304, 3123.5576, 3280.2708]
2025-08-07 02:59:05,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:59:05,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (3141.42) for latency ExtremeSparseL4U32
2025-08-07 02:59:05,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 13 minutes, 10 seconds)
2025-08-07 03:01:00,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:01:18,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3214.31494 ± 183.405
2025-08-07 03:01:18,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3413.1855, 3288.8435, 3220.8975, 2976.9114, 3056.18, 3463.2239, 3308.4375, 3123.145, 2899.7432, 3392.5813]
2025-08-07 03:01:18,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:01:18,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (3214.31) for latency ExtremeSparseL4U32
2025-08-07 03:01:18,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 12 minutes, 15 seconds)
2025-08-07 03:02:57,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:03:15,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2862.57104 ± 262.722
2025-08-07 03:03:15,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3152.5625, 2649.323, 3070.3193, 3276.8125, 2606.3281, 2739.3398, 3163.2117, 2683.8284, 2491.3035, 2792.6821]
2025-08-07 03:03:15,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:03:15,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 9 minutes, 11 seconds)
2025-08-07 03:05:04,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:05:23,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3100.39795 ± 221.226
2025-08-07 03:05:23,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2770.2874, 3076.8467, 3138.315, 3484.329, 3332.746, 3280.956, 3021.2024, 3019.3447, 2739.7979, 3140.155]
2025-08-07 03:05:23,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:05:23,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 7 minutes, 3 seconds)
2025-08-07 03:07:12,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:07:27,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2606.65601 ± 509.324
2025-08-07 03:07:27,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3205.4438, 2201.1829, 2771.468, 2622.2256, 3072.4531, 1658.8842, 3226.2048, 2346.6775, 2952.9785, 2009.0433]
2025-08-07 03:07:27,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 806.0, 1000.0, 1000.0, 1000.0, 527.0, 1000.0, 718.0, 1000.0, 701.0]
2025-08-07 03:07:27,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 5 minutes, 9 seconds)
2025-08-07 03:09:18,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:09:36,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2458.39502 ± 757.924
2025-08-07 03:09:36,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2866.6013, 1274.8118, 804.9052, 2919.6094, 3000.725, 2107.2754, 2886.8604, 3019.8672, 2900.1567, 2803.1372]
2025-08-07 03:09:36,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:09:36,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 3 minutes, 9 seconds)
2025-08-07 03:11:31,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:11:49,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3231.93164 ± 94.793
2025-08-07 03:11:49,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3222.3245, 3289.274, 2997.528, 3217.0637, 3226.0308, 3233.35, 3317.052, 3159.2407, 3326.8752, 3330.5771]
2025-08-07 03:11:49,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:11:49,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (3231.93) for latency ExtremeSparseL4U32
2025-08-07 03:11:49,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 1 minute, 3 seconds)
2025-08-07 03:13:37,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:13:55,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3210.51367 ± 77.089
2025-08-07 03:13:55,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3304.1748, 3300.8748, 3139.8772, 3281.045, 3164.8176, 3212.0623, 3259.6672, 3150.4944, 3235.8086, 3056.3171]
2025-08-07 03:13:55,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:13:55,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 73/100 (estimated time remaining: 59 minutes, 40 seconds)
2025-08-07 03:15:39,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:15:57,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2923.24683 ± 173.234
2025-08-07 03:15:57,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3172.658, 2665.4788, 2698.0872, 2935.934, 3073.4473, 2755.4807, 2894.5325, 3175.99, 2980.833, 2880.027]
2025-08-07 03:15:57,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:15:57,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 74/100 (estimated time remaining: 57 minutes, 5 seconds)
2025-08-07 03:17:52,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:18:10,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3011.11133 ± 514.882
2025-08-07 03:18:10,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3300.553, 3247.7227, 3094.6746, 3072.696, 3217.6472, 3302.2747, 3013.7288, 3008.2266, 3346.0881, 1507.5011]
2025-08-07 03:18:10,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 520.0]
2025-08-07 03:18:10,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 75/100 (estimated time remaining: 55 minutes, 39 seconds)
2025-08-07 03:19:59,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:20:17,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3177.06445 ± 50.194
2025-08-07 03:20:17,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3115.8054, 3187.092, 3198.3584, 3222.0264, 3062.6582, 3197.819, 3202.8943, 3244.3335, 3174.8125, 3164.844]
2025-08-07 03:20:17,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:20:17,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 76/100 (estimated time remaining: 53 minutes, 24 seconds)
2025-08-07 03:22:09,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:22:25,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2926.84326 ± 935.673
2025-08-07 03:22:25,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3185.095, 3380.431, 3146.9053, 132.8159, 3177.9014, 3116.4382, 3378.6067, 3324.9697, 3247.7666, 3177.5032]
2025-08-07 03:22:25,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 77.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:22:25,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 77/100 (estimated time remaining: 50 minutes, 50 seconds)
2025-08-07 03:24:14,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:24:31,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2634.94409 ± 661.090
2025-08-07 03:24:31,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3171.7126, 1446.8412, 2710.061, 2877.3157, 2981.216, 2967.9202, 2743.461, 3130.5356, 1247.1288, 3073.25]
2025-08-07 03:24:31,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 500.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 493.0, 1000.0]
2025-08-07 03:24:31,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 78/100 (estimated time remaining: 48 minutes, 44 seconds)
2025-08-07 03:26:19,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:26:37,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3077.31519 ± 248.683
2025-08-07 03:26:37,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3249.9404, 3322.4668, 3125.8037, 3228.0, 3105.8489, 2842.3604, 3427.8328, 2898.3157, 2538.1343, 3034.4495]
2025-08-07 03:26:37,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:26:37,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 79/100 (estimated time remaining: 46 minutes, 57 seconds)
2025-08-07 03:28:17,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:28:35,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3261.94141 ± 154.093
2025-08-07 03:28:35,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3302.1577, 3306.4314, 2864.5247, 3232.8545, 3439.1404, 3295.3674, 3171.8503, 3275.7766, 3285.8855, 3445.427]
2025-08-07 03:28:35,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:28:35,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (3261.94) for latency ExtremeSparseL4U32
2025-08-07 03:28:35,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 80/100 (estimated time remaining: 43 minutes, 48 seconds)
2025-08-07 03:30:33,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:30:51,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3344.07275 ± 152.334
2025-08-07 03:30:51,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3361.5383, 3625.0144, 3066.4614, 3301.4019, 3521.6003, 3203.5303, 3359.9849, 3216.5874, 3364.8572, 3419.7502]
2025-08-07 03:30:51,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:30:51,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (3344.07) for latency ExtremeSparseL4U32
2025-08-07 03:30:51,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 81/100 (estimated time remaining: 42 minutes, 16 seconds)
2025-08-07 03:32:40,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:32:58,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3237.96777 ± 347.871
2025-08-07 03:32:58,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3187.916, 3395.0706, 3427.8188, 3193.291, 3112.3154, 3508.0537, 3284.4937, 2282.067, 3560.7842, 3427.8696]
2025-08-07 03:32:58,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:32:58,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 82/100 (estimated time remaining: 40 minutes, 6 seconds)
2025-08-07 03:34:48,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:35:05,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2833.45850 ± 863.344
2025-08-07 03:35:05,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3351.643, 3041.4885, 3305.8472, 2750.215, 2347.9082, 411.96326, 3250.4795, 3319.71, 3338.408, 3216.9238]
2025-08-07 03:35:05,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 797.0, 247.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:35:05,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 83/100 (estimated time remaining: 38 minutes, 2 seconds)
2025-08-07 03:36:54,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:37:11,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3188.16992 ± 506.646
2025-08-07 03:37:11,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3220.5813, 3317.5178, 3137.2102, 3500.5554, 3533.8777, 3583.5217, 1777.6405, 3154.9424, 3595.6582, 3060.194]
2025-08-07 03:37:11,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 573.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:37:11,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 84/100 (estimated time remaining: 35 minutes, 54 seconds)
2025-08-07 03:38:55,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:39:12,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3238.20508 ± 359.126
2025-08-07 03:39:12,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3543.4873, 3371.1016, 3152.0447, 3006.1587, 3195.1067, 3643.43, 3504.0918, 2321.7607, 3206.7505, 3438.1218]
2025-08-07 03:39:12,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 793.0, 1000.0, 1000.0]
2025-08-07 03:39:12,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 85/100 (estimated time remaining: 33 minutes, 58 seconds)
2025-08-07 03:40:58,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:41:16,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3203.20166 ± 149.050
2025-08-07 03:41:16,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3234.9192, 3294.1125, 3117.9624, 2959.843, 3302.5776, 3268.2341, 3481.0571, 3119.2773, 2992.0374, 3261.9956]
2025-08-07 03:41:16,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:41:16,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 86/100 (estimated time remaining: 31 minutes, 15 seconds)
2025-08-07 03:43:05,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:43:22,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3084.37720 ± 484.442
2025-08-07 03:43:22,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3164.272, 3282.6838, 3423.4326, 3461.9226, 3371.8335, 3097.6091, 3189.2336, 3134.7722, 1690.0637, 3027.9478]
2025-08-07 03:43:22,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 626.0, 1000.0]
2025-08-07 03:43:22,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 87/100 (estimated time remaining: 29 minutes, 8 seconds)
2025-08-07 03:45:12,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:45:30,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3304.15698 ± 122.959
2025-08-07 03:45:30,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3447.3894, 3338.3027, 3020.783, 3226.554, 3386.8271, 3321.047, 3425.125, 3385.1018, 3308.1948, 3182.2449]
2025-08-07 03:45:30,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:45:30,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 88/100 (estimated time remaining: 27 minutes, 6 seconds)
2025-08-07 03:47:28,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:47:46,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3318.09814 ± 204.771
2025-08-07 03:47:46,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3092.022, 3568.28, 3290.022, 2987.0496, 3093.852, 3410.6677, 3569.0464, 3509.1375, 3195.1807, 3465.724]
2025-08-07 03:47:46,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:47:46,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 89/100 (estimated time remaining: 25 minutes, 23 seconds)
2025-08-07 03:49:25,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:49:42,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2812.68457 ± 928.955
2025-08-07 03:49:42,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2503.336, 3327.7683, 3094.0964, 3015.1199, 120.777695, 3031.631, 3382.5203, 3070.4673, 3346.1877, 3234.942]
2025-08-07 03:49:42,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 66.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:49:42,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 90/100 (estimated time remaining: 23 minutes, 4 seconds)
2025-08-07 03:51:31,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:51:49,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3338.45850 ± 166.203
2025-08-07 03:51:49,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3383.4517, 3310.057, 3347.3347, 3170.1716, 3410.911, 3593.8118, 3260.9846, 2971.6313, 3454.6423, 3481.5867]
2025-08-07 03:51:49,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:51:49,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 91/100 (estimated time remaining: 21 minutes, 5 seconds)
2025-08-07 03:53:42,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:54:00,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3165.51025 ± 214.183
2025-08-07 03:54:00,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2924.7913, 3367.081, 3309.1562, 3052.0234, 3416.332, 3385.366, 3155.2297, 3229.3027, 3106.0488, 2709.7732]
2025-08-07 03:54:00,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:54:00,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 92/100 (estimated time remaining: 19 minutes, 8 seconds)
2025-08-07 03:55:50,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:56:08,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3426.25439 ± 160.300
2025-08-07 03:56:08,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3439.1548, 3114.982, 3558.712, 3560.001, 3594.726, 3255.279, 3414.0674, 3527.3289, 3233.2832, 3565.0103]
2025-08-07 03:56:08,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:56:08,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (3426.25) for latency ExtremeSparseL4U32
2025-08-07 03:56:08,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 93/100 (estimated time remaining: 17 minutes)
2025-08-07 03:57:57,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:58:13,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3184.28418 ± 847.379
2025-08-07 03:58:13,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3631.801, 3445.6274, 730.4267, 3494.6997, 2886.7402, 3544.5164, 3439.151, 3581.135, 3327.472, 3761.2715]
2025-08-07 03:58:13,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 315.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:58:13,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 94/100 (estimated time remaining: 14 minutes, 38 seconds)
2025-08-07 04:00:03,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:00:21,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3363.98169 ± 152.261
2025-08-07 04:00:21,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3298.3875, 3659.1235, 3384.7903, 3409.7136, 3533.489, 3263.5144, 3452.0767, 3115.3704, 3327.143, 3196.2104]
2025-08-07 04:00:21,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:00:21,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 95/100 (estimated time remaining: 12 minutes, 46 seconds)
2025-08-07 04:02:10,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:02:28,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3435.72974 ± 184.448
2025-08-07 04:02:28,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3298.7817, 3323.3289, 3594.4316, 3447.769, 3510.946, 3079.5596, 3473.6768, 3716.527, 3643.0273, 3269.2517]
2025-08-07 04:02:28,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:02:28,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (3435.73) for latency ExtremeSparseL4U32
2025-08-07 04:02:28,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 96/100 (estimated time remaining: 10 minutes, 38 seconds)
2025-08-07 04:04:17,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:04:35,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3174.64380 ± 394.357
2025-08-07 04:04:35,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3627.3486, 2124.129, 3407.1934, 2896.1384, 3229.537, 3430.4712, 3202.6208, 3330.0964, 3321.5833, 3177.318]
2025-08-07 04:04:35,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 679.0, 1000.0, 944.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:04:35,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 27 seconds)
2025-08-07 04:06:24,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:06:42,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3523.32080 ± 146.781
2025-08-07 04:06:42,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3552.9258, 3412.3145, 3478.4587, 3569.0137, 3618.6914, 3268.4492, 3660.713, 3732.0571, 3302.9275, 3637.6572]
2025-08-07 04:06:42,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:06:42,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (3523.32) for latency ExtremeSparseL4U32
2025-08-07 04:06:42,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 20 seconds)
2025-08-07 04:08:28,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:08:45,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3092.47705 ± 781.681
2025-08-07 04:08:45,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3344.441, 3192.4307, 3328.786, 3555.1528, 3441.15, 3050.0635, 3362.333, 3433.3691, 3434.8198, 782.2248]
2025-08-07 04:08:45,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 972.0, 1000.0, 1000.0, 304.0]
2025-08-07 04:08:45,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 12 seconds)
2025-08-07 04:10:35,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:10:52,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3039.76831 ± 825.136
2025-08-07 04:10:52,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3517.1375, 3573.423, 3481.7124, 3244.072, 1547.5668, 3305.32, 3332.4822, 1270.8091, 3558.6243, 3566.5352]
2025-08-07 04:10:52,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 527.0, 1000.0, 1000.0, 504.0, 1000.0, 1000.0]
2025-08-07 04:10:52,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 6 seconds)
2025-08-07 04:12:41,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:12:58,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3083.53052 ± 721.440
2025-08-07 04:12:58,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3616.5413, 1060.0754, 2744.8455, 3167.6113, 3176.3867, 3500.3667, 3101.086, 3354.292, 3550.9756, 3563.1252]
2025-08-07 04:12:58,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 359.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:12:58,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1251 [DEBUG]: Training session finished
