2025-08-07 00:48:04,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc10-ant/ExtremeSparseL4U32-bpql-mem32
2025-08-07 00:48:04,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc10-ant/ExtremeSparseL4U32-bpql-mem32
2025-08-07 00:48:04,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x1459ea91c550>}
2025-08-07 00:48:04,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1111 [DEBUG]: using device: cuda
2025-08-07 00:48:04,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1133 [INFO]: Creating new trainer
2025-08-07 00:48:04,068 baseline-bpql-noiseperc10-ant:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=283, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 00:48:04,068 baseline-bpql-noiseperc10-ant:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 00:48:06,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1194 [DEBUG]: Starting training session...
2025-08-07 00:48:06,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 1/100
2025-08-07 00:49:47,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:49:48,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: -9.80810 ± 18.471
2025-08-07 00:49:48,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [-2.2478633, -0.17872131, -5.024481, -48.081333, 0.7028611, 27.25349, -16.349066, -23.223352, -14.147782, -16.784733]
2025-08-07 00:49:48,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [59.0, 58.0, 57.0, 76.0, 53.0, 37.0, 62.0, 70.0, 58.0, 69.0]
2025-08-07 00:49:48,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1226 [INFO]: New best (-9.81) for latency ExtremeSparseL4U32
2025-08-07 00:49:48,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 48 minutes, 59 seconds)
2025-08-07 00:51:34,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:51:35,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: -26.35749 ± 60.899
2025-08-07 00:51:35,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [-39.70574, -13.87936, -23.347805, 22.671133, -171.68323, 14.628872, -14.115326, -96.98085, 38.952, 19.885418]
2025-08-07 00:51:35,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [74.0, 65.0, 78.0, 81.0, 179.0, 44.0, 93.0, 140.0, 43.0, 74.0]
2025-08-07 00:51:35,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 51 minutes, 7 seconds)
2025-08-07 00:53:20,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:53:22,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: -1.88134 ± 65.406
2025-08-07 00:53:22,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [-4.8470693, 36.78153, 15.044035, 23.121397, -195.2578, 12.883598, 18.220648, 26.516731, 15.299471, 33.42407]
2025-08-07 00:53:22,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [77.0, 62.0, 70.0, 68.0, 175.0, 47.0, 60.0, 53.0, 53.0, 41.0]
2025-08-07 00:53:22,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1226 [INFO]: New best (-1.88) for latency ExtremeSparseL4U32
2025-08-07 00:53:22,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 50 minutes, 4 seconds)
2025-08-07 00:55:06,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:55:08,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: -28.21971 ± 57.099
2025-08-07 00:55:08,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [-23.368523, 20.706553, -26.418491, 16.64752, -0.12427019, -2.4842212, -70.87617, -129.18529, -119.12593, 52.03173]
2025-08-07 00:55:08,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [111.0, 93.0, 115.0, 75.0, 101.0, 91.0, 132.0, 186.0, 142.0, 89.0]
2025-08-07 00:55:08,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 48 minutes, 44 seconds)
2025-08-07 00:56:55,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:56:58,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: -77.83917 ± 189.680
2025-08-07 00:56:58,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [-41.952442, -636.07007, -43.839756, 36.60898, -61.868126, 38.50111, -5.708714, 4.312338, -73.35831, 4.9832644]
2025-08-07 00:56:58,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [104.0, 1000.0, 119.0, 60.0, 263.0, 70.0, 94.0, 82.0, 251.0, 87.0]
2025-08-07 00:56:58,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 48 minutes, 35 seconds)
2025-08-07 00:58:49,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:58:55,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: -10.28305 ± 66.306
2025-08-07 00:58:55,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [-122.566025, 61.338364, 79.292915, 35.55831, -17.736723, 11.512321, -0.1412755, -111.70806, 30.92891, -69.30924]
2025-08-07 00:58:55,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 159.0, 233.0, 197.0, 132.0, 230.0, 96.0, 1000.0, 71.0, 264.0]
2025-08-07 00:58:55,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 51 minutes, 21 seconds)
2025-08-07 01:00:33,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:00:39,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 114.43724 ± 90.550
2025-08-07 01:00:39,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [59.429432, 114.01293, 97.40691, 88.11958, 28.41072, 53.165955, 87.92051, 360.98087, 77.445076, 177.4805]
2025-08-07 01:00:39,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [112.0, 273.0, 278.0, 262.0, 167.0, 161.0, 183.0, 1000.0, 185.0, 697.0]
2025-08-07 01:00:39,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1226 [INFO]: New best (114.44) for latency ExtremeSparseL4U32
2025-08-07 01:00:39,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 48 minutes, 32 seconds)
2025-08-07 01:02:25,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:02:36,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 280.37253 ± 157.902
2025-08-07 01:02:36,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [141.75479, 541.0589, 318.38315, 147.38144, 481.22723, 274.0609, 147.7615, 483.2535, 159.85721, 108.98655]
2025-08-07 01:02:36,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [271.0, 1000.0, 1000.0, 224.0, 1000.0, 1000.0, 283.0, 1000.0, 225.0, 202.0]
2025-08-07 01:02:36,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1226 [INFO]: New best (280.37) for latency ExtremeSparseL4U32
2025-08-07 01:02:36,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 49 minutes, 58 seconds)
2025-08-07 01:04:23,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:04:33,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 297.05109 ± 166.377
2025-08-07 01:04:33,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [165.94958, 165.14742, 290.6583, 328.20724, 306.36142, 691.20776, 360.13892, 96.17677, 426.61618, 140.04741]
2025-08-07 01:04:33,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [276.0, 246.0, 523.0, 708.0, 782.0, 1000.0, 1000.0, 226.0, 683.0, 295.0]
2025-08-07 01:04:33,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1226 [INFO]: New best (297.05) for latency ExtremeSparseL4U32
2025-08-07 01:04:33,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 51 minutes, 31 seconds)
2025-08-07 01:06:24,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:06:32,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 283.32507 ± 179.985
2025-08-07 01:06:32,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [101.95521, 639.38184, 379.34177, 374.84048, 169.16182, 535.2633, 119.19861, 192.23698, 111.03002, 210.8409]
2025-08-07 01:06:32,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [251.0, 1000.0, 630.0, 609.0, 271.0, 957.0, 196.0, 220.0, 143.0, 284.0]
2025-08-07 01:06:32,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 52 minutes, 5 seconds)
2025-08-07 01:08:14,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:08:28,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 412.81763 ± 154.267
2025-08-07 01:08:28,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [371.58472, 213.18631, 496.14847, 596.91736, 549.9572, 190.48763, 499.62735, 537.1004, 174.1814, 498.98538]
2025-08-07 01:08:28,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 276.0, 841.0, 1000.0, 1000.0, 301.0, 1000.0, 969.0, 169.0, 1000.0]
2025-08-07 01:08:28,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1226 [INFO]: New best (412.82) for latency ExtremeSparseL4U32
2025-08-07 01:08:28,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 49 minutes, 50 seconds)
2025-08-07 01:10:11,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:10:22,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 401.40451 ± 266.912
2025-08-07 01:10:22,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [659.4248, 192.55855, 718.2104, 768.44916, 42.161724, 465.05884, 110.645004, 137.41069, 651.0037, 269.12265]
2025-08-07 01:10:22,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 170.0, 1000.0, 1000.0, 72.0, 706.0, 179.0, 212.0, 1000.0, 632.0]
2025-08-07 01:10:22,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 51 minutes)
2025-08-07 01:12:06,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:12:12,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 244.91011 ± 265.411
2025-08-07 01:12:12,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [440.65335, 721.5213, 53.00459, 95.45207, 734.6202, 56.348972, 72.66854, 103.22534, 130.80176, 40.804993]
2025-08-07 01:12:12,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [619.0, 1000.0, 81.0, 292.0, 1000.0, 77.0, 74.0, 84.0, 134.0, 57.0]
2025-08-07 01:12:12,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 46 minutes, 59 seconds)
2025-08-07 01:14:06,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:14:13,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 244.77304 ± 228.121
2025-08-07 01:14:13,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [57.180496, 59.877663, 148.47478, 605.264, 303.7006, 187.39027, 85.43623, 746.3079, 140.79576, 113.30298]
2025-08-07 01:14:13,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [79.0, 103.0, 193.0, 1000.0, 698.0, 212.0, 76.0, 1000.0, 234.0, 169.0]
2025-08-07 01:14:13,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 46 minutes, 10 seconds)
2025-08-07 01:15:48,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:16:00,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 498.19751 ± 231.538
2025-08-07 01:16:00,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [589.5221, 661.1892, 405.50156, 705.44244, 793.92365, 666.58417, 273.09988, 159.95424, 620.6822, 106.07555]
2025-08-07 01:16:00,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [762.0, 1000.0, 471.0, 1000.0, 1000.0, 1000.0, 354.0, 257.0, 1000.0, 111.0]
2025-08-07 01:16:00,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1226 [INFO]: New best (498.20) for latency ExtremeSparseL4U32
2025-08-07 01:16:00,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 41 minutes, 3 seconds)
2025-08-07 01:17:47,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:17:53,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 242.62581 ± 224.294
2025-08-07 01:17:53,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [720.8251, 197.34969, 174.6895, 169.0815, 223.67946, 84.420906, 75.913185, 631.5157, 48.8039, 99.97894]
2025-08-07 01:17:53,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 181.0, 235.0, 190.0, 266.0, 125.0, 165.0, 1000.0, 72.0, 171.0]
2025-08-07 01:17:53,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 38 minutes, 24 seconds)
2025-08-07 01:19:37,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:19:44,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 251.42618 ± 218.883
2025-08-07 01:19:44,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [286.43243, 45.008278, 522.44684, 240.78491, 66.402565, 73.78626, 71.17208, 118.37454, 353.3153, 736.5384]
2025-08-07 01:19:44,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [602.0, 69.0, 1000.0, 456.0, 69.0, 72.0, 71.0, 193.0, 473.0, 1000.0]
2025-08-07 01:19:44,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 35 minutes, 23 seconds)
2025-08-07 01:21:28,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:21:33,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 155.69467 ± 164.236
2025-08-07 01:21:33,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [65.459, 66.88548, 146.04716, 624.2248, 145.86102, 135.64708, 184.23338, 25.67541, 32.21328, 130.70003]
2025-08-07 01:21:33,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [64.0, 301.0, 176.0, 1000.0, 250.0, 263.0, 160.0, 45.0, 68.0, 174.0]
2025-08-07 01:21:33,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 33 minutes, 19 seconds)
2025-08-07 01:23:20,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:23:26,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 219.14334 ± 248.795
2025-08-07 01:23:26,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [132.53543, 40.44258, 221.75157, 125.09366, 61.74983, 730.8892, 105.18834, 32.109467, 679.1053, 62.568016]
2025-08-07 01:23:26,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [163.0, 54.0, 340.0, 214.0, 64.0, 1000.0, 114.0, 49.0, 1000.0, 101.0]
2025-08-07 01:23:26,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 29 minutes, 14 seconds)
2025-08-07 01:25:10,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:25:17,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 287.79617 ± 250.275
2025-08-07 01:25:17,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [671.7, 71.75062, 145.94604, 593.36755, 116.33012, 709.31335, 256.4538, 55.29272, 184.63257, 73.17485]
2025-08-07 01:25:17,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 81.0, 148.0, 1000.0, 160.0, 1000.0, 290.0, 57.0, 338.0, 55.0]
2025-08-07 01:25:17,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 28 minutes, 32 seconds)
2025-08-07 01:26:59,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:27:06,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 256.90085 ± 230.762
2025-08-07 01:27:06,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [151.22592, 110.31098, 655.96155, 628.35443, 135.48383, 15.605536, 99.27021, 84.798935, 517.23975, 170.75752]
2025-08-07 01:27:06,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [151.0, 115.0, 1000.0, 1000.0, 143.0, 28.0, 176.0, 93.0, 1000.0, 205.0]
2025-08-07 01:27:06,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 25 minutes, 30 seconds)
2025-08-07 01:28:50,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:28:58,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 352.83383 ± 195.198
2025-08-07 01:28:58,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [257.31805, 295.7094, 273.50925, 122.13862, 292.0517, 693.2365, 134.67749, 657.2817, 251.88707, 550.52893]
2025-08-07 01:28:58,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [228.0, 338.0, 286.0, 144.0, 354.0, 1000.0, 139.0, 1000.0, 228.0, 815.0]
2025-08-07 01:28:58,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 24 minutes, 3 seconds)
2025-08-07 01:30:49,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:30:54,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 202.78207 ± 188.955
2025-08-07 01:30:54,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [361.61044, 127.7196, 121.051155, 260.23074, 702.8295, 82.401024, 61.910137, 66.734535, 113.072464, 130.26122]
2025-08-07 01:30:54,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [600.0, 162.0, 121.0, 376.0, 1000.0, 140.0, 82.0, 147.0, 149.0, 162.0]
2025-08-07 01:30:54,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 24 minutes, 1 second)
2025-08-07 01:32:37,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:32:46,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 346.04242 ± 192.893
2025-08-07 01:32:46,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [531.4802, 604.2884, 351.76663, 256.6093, 118.72464, 125.4799, 238.07591, 107.79625, 562.65204, 563.5507]
2025-08-07 01:32:46,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [638.0, 1000.0, 441.0, 629.0, 152.0, 160.0, 333.0, 156.0, 1000.0, 1000.0]
2025-08-07 01:32:46,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 22 minutes, 4 seconds)
2025-08-07 01:34:24,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:34:28,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 159.73047 ± 125.159
2025-08-07 01:34:28,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [93.65759, 56.49593, 97.013626, 475.04144, 136.53455, 313.45926, 93.69167, 123.12191, 137.45341, 70.835396]
2025-08-07 01:34:28,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [115.0, 78.0, 148.0, 1000.0, 283.0, 466.0, 122.0, 116.0, 123.0, 65.0]
2025-08-07 01:34:28,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 17 minutes, 43 seconds)
2025-08-07 01:36:18,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:36:27,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 394.71240 ± 210.394
2025-08-07 01:36:27,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [594.7796, 244.58295, 399.71988, 207.0028, 457.0187, 615.5449, 633.44073, 132.76393, 49.632076, 612.63855]
2025-08-07 01:36:27,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [798.0, 361.0, 406.0, 194.0, 615.0, 1000.0, 766.0, 190.0, 74.0, 1000.0]
2025-08-07 01:36:27,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 18 minutes, 22 seconds)
2025-08-07 01:38:05,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:38:12,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 336.83505 ± 227.712
2025-08-07 01:38:12,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [214.41411, 126.83959, 678.98346, 205.0215, 460.49902, 320.89832, 455.08203, 55.71492, 742.2595, 108.637825]
2025-08-07 01:38:12,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [276.0, 165.0, 1000.0, 242.0, 430.0, 375.0, 521.0, 81.0, 1000.0, 116.0]
2025-08-07 01:38:12,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 14 minutes, 47 seconds)
2025-08-07 01:39:55,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:40:02,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 269.88599 ± 201.238
2025-08-07 01:40:02,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [357.2794, 110.89208, 206.43987, 136.53441, 68.84491, 260.32697, 139.0793, 623.6372, 138.39366, 657.432]
2025-08-07 01:40:02,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [450.0, 98.0, 140.0, 158.0, 60.0, 362.0, 128.0, 1000.0, 137.0, 1000.0]
2025-08-07 01:40:02,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 11 minutes, 28 seconds)
2025-08-07 01:41:49,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:42:00,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 449.16513 ± 276.414
2025-08-07 01:42:00,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [922.0434, 89.66881, 110.944885, 571.2896, 619.3353, 351.68924, 746.41644, 406.62997, 83.82069, 589.81274]
2025-08-07 01:42:00,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 103.0, 133.0, 655.0, 1000.0, 363.0, 1000.0, 482.0, 72.0, 1000.0]
2025-08-07 01:42:00,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 10 minutes, 54 seconds)
2025-08-07 01:43:40,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:43:46,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 321.29071 ± 314.293
2025-08-07 01:43:46,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [272.47363, 156.68462, 82.33508, 130.77919, 68.15875, 712.6873, 609.66394, 996.82214, 54.960762, 128.34203]
2025-08-07 01:43:46,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [266.0, 123.0, 94.0, 167.0, 62.0, 1000.0, 595.0, 965.0, 46.0, 131.0]
2025-08-07 01:43:46,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 10 minutes, 6 seconds)
2025-08-07 01:45:34,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:45:41,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 344.61874 ± 247.012
2025-08-07 01:45:41,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [565.53, 246.90681, 246.56049, 652.78326, 102.27221, 659.88745, 170.22685, 49.579994, 667.45636, 84.98402]
2025-08-07 01:45:41,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [610.0, 269.0, 251.0, 740.0, 94.0, 1000.0, 196.0, 65.0, 1000.0, 98.0]
2025-08-07 01:45:41,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 7 minutes, 26 seconds)
2025-08-07 01:47:29,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:47:36,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 320.72076 ± 222.267
2025-08-07 01:47:36,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [341.69205, 691.0673, 50.871582, 246.0299, 151.64793, 700.2777, 231.56042, 506.83926, 148.98917, 138.23192]
2025-08-07 01:47:36,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [274.0, 1000.0, 67.0, 270.0, 124.0, 1000.0, 242.0, 618.0, 151.0, 161.0]
2025-08-07 01:47:36,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 7 minutes, 48 seconds)
2025-08-07 01:49:17,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:49:26,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 443.19598 ± 322.353
2025-08-07 01:49:26,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [170.48404, 585.77765, 946.03796, 215.91882, 202.81749, 942.41406, 250.42976, 163.32425, 149.44128, 805.3144]
2025-08-07 01:49:26,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [176.0, 702.0, 1000.0, 285.0, 199.0, 983.0, 451.0, 122.0, 164.0, 1000.0]
2025-08-07 01:49:26,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 6 minutes, 6 seconds)
2025-08-07 01:51:06,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:51:12,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 282.55112 ± 217.715
2025-08-07 01:51:12,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [382.01, 108.52257, 713.48267, 621.123, 139.97212, 158.68854, 361.3517, 103.647934, 65.069984, 171.64256]
2025-08-07 01:51:12,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [395.0, 75.0, 1000.0, 1000.0, 125.0, 148.0, 361.0, 90.0, 54.0, 220.0]
2025-08-07 01:51:12,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 1 minute, 25 seconds)
2025-08-07 01:53:03,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:53:13,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 401.79172 ± 275.952
2025-08-07 01:53:13,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [223.88823, 160.17139, 549.20746, 169.36911, 605.90137, 60.587486, 860.9288, 624.6841, 678.5125, 84.66688]
2025-08-07 01:53:13,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [220.0, 128.0, 692.0, 131.0, 1000.0, 62.0, 1000.0, 1000.0, 1000.0, 88.0]
2025-08-07 01:53:13,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 2 minutes, 47 seconds)
2025-08-07 01:54:54,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:55:03,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 461.03305 ± 273.911
2025-08-07 01:55:03,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [158.44809, 534.66833, 205.26433, 483.90744, 386.18857, 636.10706, 665.60974, 1066.3708, 91.40738, 382.35883]
2025-08-07 01:55:03,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [107.0, 528.0, 196.0, 530.0, 406.0, 1000.0, 1000.0, 1000.0, 104.0, 410.0]
2025-08-07 01:55:03,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 59 minutes, 53 seconds)
2025-08-07 01:56:51,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:56:57,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 304.97723 ± 271.260
2025-08-07 01:56:57,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [127.89254, 892.90137, 328.02032, 169.96843, 65.53689, 121.80379, 169.24968, 350.3116, 91.29386, 732.7937]
2025-08-07 01:56:57,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [130.0, 1000.0, 309.0, 191.0, 71.0, 127.0, 186.0, 331.0, 68.0, 1000.0]
2025-08-07 01:56:57,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 57 minutes, 51 seconds)
2025-08-07 01:58:36,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:58:44,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 369.10492 ± 303.666
2025-08-07 01:58:44,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [181.55147, 409.5372, 460.19275, 535.65436, 122.13537, 956.8338, 37.97271, 80.142296, 110.39207, 796.637]
2025-08-07 01:58:44,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [209.0, 397.0, 550.0, 1000.0, 103.0, 1000.0, 43.0, 78.0, 130.0, 1000.0]
2025-08-07 01:58:44,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 55 minutes, 20 seconds)
2025-08-07 02:00:27,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:00:34,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 342.35849 ± 267.133
2025-08-07 02:00:34,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [770.3958, 36.920425, 372.28262, 655.97784, 226.21544, 177.41197, 83.71708, 63.3228, 731.05414, 306.2868]
2025-08-07 02:00:34,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [869.0, 52.0, 341.0, 1000.0, 266.0, 211.0, 85.0, 73.0, 1000.0, 384.0]
2025-08-07 02:00:34,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 54 minutes, 23 seconds)
2025-08-07 02:02:17,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:02:24,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 359.41565 ± 277.077
2025-08-07 02:02:24,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [620.8869, 197.65294, 391.61542, 1009.6529, 182.7775, 509.89377, 129.06258, 367.3342, 88.962166, 96.3181]
2025-08-07 02:02:24,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 165.0, 355.0, 1000.0, 131.0, 486.0, 103.0, 360.0, 66.0, 109.0]
2025-08-07 02:02:24,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 50 minutes, 11 seconds)
2025-08-07 02:04:12,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:04:21,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 540.03955 ± 433.384
2025-08-07 02:04:21,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [332.2214, 1006.4126, 464.4624, 196.25374, 67.2657, 1171.3416, 93.994484, 1031.1003, 982.65356, 54.68955]
2025-08-07 02:04:21,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [332.0, 1000.0, 447.0, 199.0, 73.0, 1000.0, 87.0, 1000.0, 1000.0, 66.0]
2025-08-07 02:04:21,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1226 [INFO]: New best (540.04) for latency ExtremeSparseL4U32
2025-08-07 02:04:21,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 49 minutes, 48 seconds)
2025-08-07 02:06:06,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:06:15,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 435.83920 ± 340.256
2025-08-07 02:06:15,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [289.6724, 644.55914, 952.0434, 96.86731, 626.7612, 51.363216, 528.9305, 63.994156, 131.85886, 972.3419]
2025-08-07 02:06:15,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [247.0, 1000.0, 1000.0, 123.0, 1000.0, 59.0, 593.0, 117.0, 146.0, 1000.0]
2025-08-07 02:06:15,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 47 minutes, 53 seconds)
2025-08-07 02:07:54,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:08:00,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 270.80902 ± 262.441
2025-08-07 02:08:00,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [102.80563, 800.10614, 38.70455, 64.5689, 679.3498, 364.83325, 185.14275, 36.93122, 75.40479, 360.24286]
2025-08-07 02:08:00,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [75.0, 782.0, 42.0, 93.0, 1000.0, 402.0, 162.0, 54.0, 81.0, 299.0]
2025-08-07 02:08:00,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 45 minutes, 29 seconds)
2025-08-07 02:09:53,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:10:03,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 555.85754 ± 293.259
2025-08-07 02:10:03,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [945.84393, 355.65576, 762.49945, 852.75006, 697.34644, 164.43704, 134.09967, 856.52716, 542.26373, 247.15257]
2025-08-07 02:10:03,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 358.0, 1000.0, 821.0, 653.0, 194.0, 187.0, 1000.0, 449.0, 286.0]
2025-08-07 02:10:03,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1226 [INFO]: New best (555.86) for latency ExtremeSparseL4U32
2025-08-07 02:10:03,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 46 minutes, 14 seconds)
2025-08-07 02:11:49,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:11:59,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 571.40668 ± 371.917
2025-08-07 02:11:59,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [377.89554, 49.145355, 129.93608, 996.15247, 1102.2976, 334.9884, 943.7116, 227.4167, 873.98755, 678.53564]
2025-08-07 02:11:59,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [380.0, 43.0, 119.0, 1000.0, 1000.0, 308.0, 851.0, 227.0, 699.0, 1000.0]
2025-08-07 02:11:59,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1226 [INFO]: New best (571.41) for latency ExtremeSparseL4U32
2025-08-07 02:11:59,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 45 minutes, 23 seconds)
2025-08-07 02:13:40,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:13:47,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 403.40265 ± 300.512
2025-08-07 02:13:47,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [189.44635, 608.94977, 546.8401, 373.38345, 1084.666, 623.0487, 276.9711, 112.73611, 169.75592, 48.229183]
2025-08-07 02:13:47,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [198.0, 658.0, 565.0, 322.0, 988.0, 578.0, 263.0, 127.0, 114.0, 53.0]
2025-08-07 02:13:47,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 41 minutes, 44 seconds)
2025-08-07 02:15:24,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:15:38,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 813.39929 ± 241.924
2025-08-07 02:15:38,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [501.42755, 1190.5945, 618.3286, 1109.8364, 764.40607, 1124.8987, 596.69135, 811.3371, 548.1952, 868.2776]
2025-08-07 02:15:38,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [470.0, 1000.0, 488.0, 1000.0, 1000.0, 1000.0, 496.0, 729.0, 1000.0, 1000.0]
2025-08-07 02:15:38,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1226 [INFO]: New best (813.40) for latency ExtremeSparseL4U32
2025-08-07 02:15:38,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 39 minutes, 31 seconds)
2025-08-07 02:17:29,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:17:43,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 914.08350 ± 178.885
2025-08-07 02:17:43,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1170.5164, 1077.0453, 814.5965, 800.5894, 808.0578, 1129.9839, 689.97736, 722.2753, 797.9544, 1129.8385]
2025-08-07 02:17:43,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 744.0, 644.0, 1000.0, 1000.0, 606.0, 765.0, 616.0, 1000.0]
2025-08-07 02:17:43,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1226 [INFO]: New best (914.08) for latency ExtremeSparseL4U32
2025-08-07 02:17:43,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 41 minutes, 11 seconds)
2025-08-07 02:19:24,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:19:35,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 654.10449 ± 350.161
2025-08-07 02:19:35,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [764.0692, 149.31773, 146.82292, 1313.9149, 523.2427, 422.19147, 988.23206, 862.1333, 546.3848, 824.73596]
2025-08-07 02:19:35,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 178.0, 142.0, 1000.0, 404.0, 430.0, 1000.0, 697.0, 454.0, 1000.0]
2025-08-07 02:19:35,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 37 minutes, 7 seconds)
2025-08-07 02:21:18,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:21:27,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 583.46436 ± 437.162
2025-08-07 02:21:27,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [549.9781, 195.23972, 78.26916, 1210.3949, 779.9819, 1306.4347, 282.31924, 1009.3423, 192.81978, 229.86389]
2025-08-07 02:21:27,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [524.0, 171.0, 88.0, 977.0, 1000.0, 1000.0, 219.0, 961.0, 177.0, 156.0]
2025-08-07 02:21:27,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 34 minutes, 42 seconds)
2025-08-07 02:23:11,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:23:22,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 722.82867 ± 487.594
2025-08-07 02:23:22,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [66.80592, 1229.6932, 335.15497, 949.0393, 1163.7249, 1185.6393, 1187.2509, 80.85996, 942.84595, 87.27214]
2025-08-07 02:23:22,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [57.0, 1000.0, 328.0, 1000.0, 1000.0, 1000.0, 986.0, 95.0, 1000.0, 80.0]
2025-08-07 02:23:22,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 34 minutes, 1 second)
2025-08-07 02:25:11,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:25:16,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 343.45203 ± 319.502
2025-08-07 02:25:16,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [220.2635, 148.20592, 45.50478, 403.80887, 177.6069, 751.0862, 29.284096, 297.33008, 1100.9005, 260.52954]
2025-08-07 02:25:16,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [208.0, 124.0, 61.0, 316.0, 251.0, 597.0, 55.0, 251.0, 1000.0, 216.0]
2025-08-07 02:25:16,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 32 minutes, 25 seconds)
2025-08-07 02:27:01,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:27:12,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 666.64893 ± 331.938
2025-08-07 02:27:12,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [786.1597, 608.29443, 418.7286, 191.82518, 746.50757, 785.06775, 79.58642, 869.12866, 1216.5197, 964.6712]
2025-08-07 02:27:12,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [632.0, 478.0, 344.0, 195.0, 1000.0, 1000.0, 66.0, 1000.0, 1000.0, 822.0]
2025-08-07 02:27:12,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 29 minutes, 5 seconds)
2025-08-07 02:28:53,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:29:03,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 681.67029 ± 441.328
2025-08-07 02:29:03,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [397.14288, 893.76105, 891.04785, 242.40819, 154.8682, 390.80814, 1285.3075, 1214.2421, 1209.6161, 137.5009]
2025-08-07 02:29:03,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [341.0, 817.0, 1000.0, 225.0, 145.0, 283.0, 1000.0, 1000.0, 983.0, 123.0]
2025-08-07 02:29:03,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 27 minutes, 10 seconds)
2025-08-07 02:30:51,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:30:59,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 545.92664 ± 415.560
2025-08-07 02:30:59,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1231.768, 793.19336, 700.0702, 86.36185, 178.28235, 114.021164, 1132.6317, 339.96753, 102.17989, 780.79047]
2025-08-07 02:30:59,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 643.0, 632.0, 60.0, 176.0, 96.0, 912.0, 305.0, 105.0, 604.0]
2025-08-07 02:30:59,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 25 minutes, 49 seconds)
2025-08-07 02:32:40,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:32:48,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 509.43423 ± 352.900
2025-08-07 02:32:48,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [194.36589, 485.0554, 770.9512, 132.1851, 1017.36487, 643.0414, 33.069565, 153.11336, 1051.9296, 613.2661]
2025-08-07 02:32:48,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [150.0, 405.0, 1000.0, 122.0, 875.0, 552.0, 42.0, 138.0, 1000.0, 510.0]
2025-08-07 02:32:48,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 23 minutes, 1 second)
2025-08-07 02:34:36,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:34:47,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 790.75623 ± 448.845
2025-08-07 02:34:47,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1279.4149, 1259.3461, 204.0015, 79.31444, 442.69012, 935.1798, 1372.204, 604.5522, 565.5054, 1165.3542]
2025-08-07 02:34:47,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 159.0, 83.0, 366.0, 701.0, 1000.0, 507.0, 464.0, 845.0]
2025-08-07 02:34:47,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 21 minutes, 51 seconds)
2025-08-07 02:36:34,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:36:44,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 747.02087 ± 371.733
2025-08-07 02:36:44,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [425.09915, 446.52487, 844.4702, 1325.0161, 369.8774, 1244.2291, 240.48643, 658.9493, 1183.3046, 732.2516]
2025-08-07 02:36:44,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [280.0, 299.0, 577.0, 1000.0, 268.0, 1000.0, 188.0, 525.0, 1000.0, 541.0]
2025-08-07 02:36:44,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 20 minutes, 3 seconds)
2025-08-07 02:38:27,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:38:45,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1324.64771 ± 39.811
2025-08-07 02:38:45,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1333.3983, 1256.015, 1353.8038, 1366.9381, 1379.7314, 1320.7394, 1323.7214, 1257.0598, 1307.6844, 1347.3851]
2025-08-07 02:38:45,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 976.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:38:45,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1226 [INFO]: New best (1324.65) for latency ExtremeSparseL4U32
2025-08-07 02:38:45,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 19 minutes, 33 seconds)
2025-08-07 02:40:29,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:40:42,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 932.67908 ± 491.160
2025-08-07 02:40:42,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [172.72977, 1335.0669, 1268.8291, 1333.6582, 1243.686, 191.70662, 1385.5571, 800.6646, 1294.5857, 300.3065]
2025-08-07 02:40:42,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [147.0, 1000.0, 1000.0, 1000.0, 1000.0, 179.0, 1000.0, 1000.0, 970.0, 238.0]
2025-08-07 02:40:42,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 17 minutes, 47 seconds)
2025-08-07 02:42:31,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:42:43,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 958.66669 ± 459.844
2025-08-07 02:42:43,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1179.3712, 415.32004, 553.49255, 1475.191, 482.60025, 1344.0093, 210.62273, 1181.3723, 1453.3154, 1291.3717]
2025-08-07 02:42:43,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 294.0, 400.0, 1000.0, 376.0, 1000.0, 187.0, 768.0, 1000.0, 1000.0]
2025-08-07 02:42:43,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 17 minutes, 18 seconds)
2025-08-07 02:44:20,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:44:32,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 850.69891 ± 458.977
2025-08-07 02:44:32,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1388.6654, 978.80615, 335.11276, 768.3498, 140.77866, 1440.6312, 1168.9143, 403.3605, 517.8593, 1364.5111]
2025-08-07 02:44:32,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 640.0, 306.0, 543.0, 127.0, 1000.0, 1000.0, 282.0, 430.0, 1000.0]
2025-08-07 02:44:32,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 14 minutes, 2 seconds)
2025-08-07 02:46:18,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:46:33,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1172.99780 ± 412.500
2025-08-07 02:46:33,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1461.1002, 1327.1229, 1268.5985, 123.15626, 1048.4978, 1421.7327, 1563.432, 1408.2848, 1345.5923, 762.46094]
2025-08-07 02:46:33,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 108.0, 842.0, 1000.0, 1000.0, 1000.0, 1000.0, 542.0]
2025-08-07 02:46:33,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 12 minutes, 38 seconds)
2025-08-07 02:48:16,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:48:27,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 910.37872 ± 501.417
2025-08-07 02:48:27,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1375.0847, 1380.5896, 1390.4807, 173.94717, 926.48926, 314.6695, 609.73157, 1344.6692, 1361.1271, 226.99841]
2025-08-07 02:48:27,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 988.0, 121.0, 669.0, 261.0, 498.0, 1000.0, 1000.0, 158.0]
2025-08-07 02:48:27,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 9 minutes, 51 seconds)
2025-08-07 02:50:12,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:50:25,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1069.82288 ± 394.302
2025-08-07 02:50:25,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1121.9733, 613.9669, 1370.0671, 364.69455, 1342.6405, 1186.1744, 1433.65, 491.30038, 1406.7703, 1366.9906]
2025-08-07 02:50:25,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [745.0, 429.0, 1000.0, 305.0, 1000.0, 876.0, 1000.0, 448.0, 1000.0, 1000.0]
2025-08-07 02:50:25,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 8 minutes, 1 second)
2025-08-07 02:52:17,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:52:26,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 682.89172 ± 517.227
2025-08-07 02:52:26,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [336.22903, 33.932037, 1200.5251, 1356.8568, 350.10315, 1440.7675, 626.62067, 222.27937, 1133.7233, 127.880295]
2025-08-07 02:52:26,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [325.0, 41.0, 899.0, 1000.0, 228.0, 1000.0, 450.0, 190.0, 937.0, 153.0]
2025-08-07 02:52:26,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 6 minutes, 1 second)
2025-08-07 02:54:10,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:54:24,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1019.65393 ± 503.161
2025-08-07 02:54:24,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [95.94977, 194.19841, 741.00726, 1472.3876, 1352.4615, 826.2841, 1464.4698, 1513.9496, 1288.8717, 1246.9595]
2025-08-07 02:54:24,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [73.0, 159.0, 573.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:54:24,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 5 minutes, 6 seconds)
2025-08-07 02:56:06,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:56:17,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 841.61682 ± 526.943
2025-08-07 02:56:17,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [569.3733, 1454.6777, 996.79395, 126.6505, 1214.1554, 227.3086, 1521.7354, 1423.598, 717.4854, 164.3893]
2025-08-07 02:56:17,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [496.0, 1000.0, 663.0, 119.0, 885.0, 178.0, 1000.0, 1000.0, 510.0, 139.0]
2025-08-07 02:56:17,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 2 minutes, 16 seconds)
2025-08-07 02:58:01,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:58:16,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 931.57361 ± 354.667
2025-08-07 02:58:16,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [776.2921, 1325.5151, 464.69012, 1365.2911, 385.25656, 648.9769, 1108.3824, 732.5138, 1168.2628, 1340.5552]
2025-08-07 02:58:16,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [509.0, 974.0, 467.0, 1000.0, 394.0, 1000.0, 1000.0, 603.0, 1000.0, 1000.0]
2025-08-07 02:58:16,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 46 seconds)
2025-08-07 03:00:04,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:00:19,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1091.94019 ± 399.947
2025-08-07 03:00:19,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [425.9354, 1297.2643, 1296.5265, 1422.7402, 958.4103, 1372.7112, 1322.1367, 1398.1898, 248.22675, 1177.2621]
2025-08-07 03:00:19,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [354.0, 1000.0, 1000.0, 1000.0, 785.0, 1000.0, 1000.0, 1000.0, 191.0, 885.0]
2025-08-07 03:00:19,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 71/100 (estimated time remaining: 59 minutes, 20 seconds)
2025-08-07 03:02:03,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:02:13,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 827.24200 ± 492.881
2025-08-07 03:02:13,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [390.78644, 327.39404, 1207.5978, 855.3214, 1397.4402, 313.04678, 1510.8757, 938.6549, 1266.4855, 64.817154]
2025-08-07 03:02:13,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [297.0, 207.0, 1000.0, 585.0, 1000.0, 221.0, 1000.0, 632.0, 776.0, 67.0]
2025-08-07 03:02:13,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 72/100 (estimated time remaining: 56 minutes, 46 seconds)
2025-08-07 03:03:53,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:04:02,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 679.52747 ± 478.470
2025-08-07 03:04:02,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [197.17668, 281.63852, 1499.5015, 906.24866, 285.78516, 201.8274, 1167.5631, 405.8342, 501.81586, 1347.8835]
2025-08-07 03:04:02,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [160.0, 207.0, 1000.0, 643.0, 226.0, 147.0, 1000.0, 345.0, 333.0, 923.0]
2025-08-07 03:04:02,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 73/100 (estimated time remaining: 53 minutes, 59 seconds)
2025-08-07 03:05:47,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:05:59,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 928.58673 ± 527.646
2025-08-07 03:05:59,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1353.6796, 1354.1473, 1544.7969, 814.85297, 58.41817, 1253.4644, 361.9822, 1413.7773, 149.07318, 981.67535]
2025-08-07 03:05:59,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 853.0, 1000.0, 525.0, 47.0, 917.0, 250.0, 1000.0, 128.0, 1000.0]
2025-08-07 03:05:59,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 74/100 (estimated time remaining: 52 minutes, 21 seconds)
2025-08-07 03:07:45,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:07:53,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 548.46887 ± 317.722
2025-08-07 03:07:53,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [723.99194, 754.6699, 623.43823, 149.96806, 1150.3607, 71.24736, 317.46133, 285.13528, 631.7928, 776.6229]
2025-08-07 03:07:53,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [437.0, 1000.0, 376.0, 127.0, 847.0, 64.0, 205.0, 193.0, 1000.0, 475.0]
2025-08-07 03:07:53,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 75/100 (estimated time remaining: 50 minutes, 3 seconds)
2025-08-07 03:09:42,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:09:53,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 664.01599 ± 410.896
2025-08-07 03:09:53,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1331.1066, 439.1652, 765.40686, 56.140705, 1357.2496, 692.65234, 364.96246, 769.6307, 183.64484, 680.2005]
2025-08-07 03:09:53,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 274.0, 1000.0, 52.0, 1000.0, 1000.0, 310.0, 566.0, 146.0, 491.0]
2025-08-07 03:09:53,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 76/100 (estimated time remaining: 47 minutes, 50 seconds)
2025-08-07 03:11:41,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:11:56,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1164.95374 ± 430.602
2025-08-07 03:11:56,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [54.71688, 1438.7974, 996.0386, 1356.7289, 1117.0989, 1586.0688, 1305.5496, 860.4876, 1524.7646, 1409.2856]
2025-08-07 03:11:56,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [66.0, 1000.0, 1000.0, 1000.0, 772.0, 1000.0, 1000.0, 562.0, 1000.0, 1000.0]
2025-08-07 03:11:56,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 77/100 (estimated time remaining: 46 minutes, 37 seconds)
2025-08-07 03:13:33,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:13:48,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1187.51599 ± 332.009
2025-08-07 03:13:48,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1469.1683, 1054.6439, 1591.0764, 551.359, 1502.9453, 864.5941, 898.7446, 1391.0575, 1492.0975, 1059.4728]
2025-08-07 03:13:48,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 767.0, 1000.0, 425.0, 1000.0, 556.0, 1000.0, 1000.0, 1000.0, 688.0]
2025-08-07 03:13:48,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 78/100 (estimated time remaining: 44 minutes, 55 seconds)
2025-08-07 03:15:39,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:15:52,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 935.77472 ± 433.612
2025-08-07 03:15:52,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1434.9166, 257.03067, 275.4941, 1147.3025, 1023.914, 1011.7362, 1367.6124, 1516.0192, 623.93335, 699.7881]
2025-08-07 03:15:52,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 165.0, 203.0, 1000.0, 776.0, 658.0, 1000.0, 1000.0, 1000.0, 570.0]
2025-08-07 03:15:52,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 79/100 (estimated time remaining: 43 minutes, 29 seconds)
2025-08-07 03:17:35,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:17:50,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1069.18481 ± 383.433
2025-08-07 03:17:50,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [709.3288, 1357.5128, 944.3237, 282.3731, 1398.8622, 1348.4098, 1079.3522, 1360.0371, 688.16504, 1523.4847]
2025-08-07 03:17:50,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [538.0, 1000.0, 1000.0, 237.0, 1000.0, 1000.0, 783.0, 1000.0, 520.0, 1000.0]
2025-08-07 03:17:50,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 80/100 (estimated time remaining: 41 minutes, 44 seconds)
2025-08-07 03:19:38,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:19:53,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1204.23804 ± 483.671
2025-08-07 03:19:53,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1506.7505, 1417.2205, 915.11975, 1448.8553, 572.5573, 1431.9425, 1525.9893, 1656.4226, 1471.732, 95.79117]
2025-08-07 03:19:53,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 934.0, 1000.0, 1000.0, 314.0, 924.0, 1000.0, 1000.0, 1000.0, 74.0]
2025-08-07 03:19:53,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 81/100 (estimated time remaining: 39 minutes, 59 seconds)
2025-08-07 03:21:37,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:21:48,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 882.90253 ± 596.990
2025-08-07 03:21:48,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1439.9257, 1124.4266, 1573.6151, 257.7357, 1468.5874, 120.366936, 1565.2441, 398.76575, 45.863426, 834.49457]
2025-08-07 03:21:48,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 791.0, 1000.0, 195.0, 1000.0, 106.0, 1000.0, 293.0, 45.0, 555.0]
2025-08-07 03:21:48,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 82/100 (estimated time remaining: 37 minutes, 28 seconds)
2025-08-07 03:23:35,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:23:46,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 903.58221 ± 586.170
2025-08-07 03:23:46,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1555.79, 1462.6495, 239.07019, 337.15076, 1595.2383, 479.77948, 35.197426, 1522.2262, 1169.6827, 639.0383]
2025-08-07 03:23:46,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 197.0, 219.0, 1000.0, 308.0, 42.0, 1000.0, 1000.0, 405.0]
2025-08-07 03:23:46,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 83/100 (estimated time remaining: 35 minutes, 52 seconds)
2025-08-07 03:25:33,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:25:48,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1173.16211 ± 546.500
2025-08-07 03:25:48,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1359.6666, 1532.0845, 1470.8335, 1449.3016, 1513.7958, 1567.5718, 1593.5436, 141.44641, 983.512, 119.864784]
2025-08-07 03:25:48,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 95.0, 1000.0, 103.0]
2025-08-07 03:25:48,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 84/100 (estimated time remaining: 33 minutes, 47 seconds)
2025-08-07 03:27:31,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:27:48,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1289.19788 ± 294.188
2025-08-07 03:27:48,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [457.46506, 1463.9698, 1524.288, 1528.4662, 1399.5989, 1309.0234, 1198.8892, 1364.7107, 1353.8214, 1291.7463]
2025-08-07 03:27:48,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [335.0, 999.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 944.0]
2025-08-07 03:27:48,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 85/100 (estimated time remaining: 31 minutes, 53 seconds)
2025-08-07 03:29:34,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:29:46,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 955.34485 ± 531.029
2025-08-07 03:29:46,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [78.2082, 1430.7303, 1549.9857, 1552.748, 388.7617, 1187.9849, 995.38245, 536.7019, 387.77588, 1445.1686]
2025-08-07 03:29:46,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [66.0, 1000.0, 1000.0, 1000.0, 300.0, 813.0, 1000.0, 337.0, 241.0, 1000.0]
2025-08-07 03:29:46,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 86/100 (estimated time remaining: 29 minutes, 39 seconds)
2025-08-07 03:31:35,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:31:48,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 905.72479 ± 469.172
2025-08-07 03:31:48,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1490.7776, 785.08575, 727.2025, 1088.3013, 523.58856, 445.3938, 880.51697, 85.47171, 1469.3577, 1561.5532]
2025-08-07 03:31:48,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 479.0, 1000.0, 310.0, 234.0, 1000.0, 82.0, 1000.0, 1000.0]
2025-08-07 03:31:48,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 87/100 (estimated time remaining: 28 minutes)
2025-08-07 03:33:38,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:33:49,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 832.02087 ± 422.849
2025-08-07 03:33:49,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [467.0994, 489.58444, 1580.4456, 1193.4425, 1250.2325, 157.4819, 680.8862, 636.6603, 671.57404, 1192.802]
2025-08-07 03:33:49,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [283.0, 329.0, 1000.0, 845.0, 818.0, 129.0, 1000.0, 391.0, 458.0, 850.0]
2025-08-07 03:33:49,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 88/100 (estimated time remaining: 26 minutes, 8 seconds)
2025-08-07 03:35:28,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:35:41,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 999.00793 ± 457.027
2025-08-07 03:35:41,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1419.5017, 458.59268, 1497.3373, 898.2776, 738.3212, 1598.8813, 665.0319, 306.9998, 1563.7618, 843.37427]
2025-08-07 03:35:41,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 310.0, 1000.0, 588.0, 530.0, 1000.0, 389.0, 193.0, 964.0, 1000.0]
2025-08-07 03:35:41,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 89/100 (estimated time remaining: 23 minutes, 42 seconds)
2025-08-07 03:37:36,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:37:49,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1128.54028 ± 555.722
2025-08-07 03:37:49,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1728.7098, 1478.318, 1604.8035, 664.6851, 1120.8507, 679.742, 1678.6008, 184.51335, 1689.9363, 455.24228]
2025-08-07 03:37:49,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 925.0, 432.0, 1000.0, 434.0, 1000.0, 118.0, 952.0, 323.0]
2025-08-07 03:37:49,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 90/100 (estimated time remaining: 22 minutes, 3 seconds)
2025-08-07 03:39:32,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:39:48,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1329.86670 ± 506.025
2025-08-07 03:39:48,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1756.5426, 1545.5056, 1491.4379, 853.3966, 1695.3956, 59.462425, 1468.1611, 1515.5975, 1807.0521, 1106.1154]
2025-08-07 03:39:48,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 56.0, 1000.0, 1000.0, 1000.0, 706.0]
2025-08-07 03:39:48,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1226 [INFO]: New best (1329.87) for latency ExtremeSparseL4U32
2025-08-07 03:39:48,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 91/100 (estimated time remaining: 20 minutes, 4 seconds)
2025-08-07 03:41:30,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:41:44,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1134.20630 ± 482.769
2025-08-07 03:41:44,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [874.866, 352.84006, 409.45734, 1057.5137, 1456.7952, 1543.3744, 873.73987, 1768.9294, 1269.2711, 1735.276]
2025-08-07 03:41:44,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [565.0, 342.0, 305.0, 788.0, 924.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:41:44,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 92/100 (estimated time remaining: 17 minutes, 53 seconds)
2025-08-07 03:43:30,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:43:42,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 945.33264 ± 581.069
2025-08-07 03:43:42,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1622.0292, 387.79868, 712.9168, 1251.4178, 1655.9343, 633.6419, 1855.2716, 149.74657, 863.9568, 320.61227]
2025-08-07 03:43:42,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 318.0, 1000.0, 814.0, 1000.0, 420.0, 1000.0, 135.0, 1000.0, 202.0]
2025-08-07 03:43:42,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 93/100 (estimated time remaining: 15 minutes, 49 seconds)
2025-08-07 03:45:36,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:45:45,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 807.53729 ± 639.628
2025-08-07 03:45:45,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [67.74836, 1337.1044, 557.28516, 217.17053, 50.874783, 319.265, 1954.7632, 1443.9298, 1371.1312, 756.1002]
2025-08-07 03:45:45,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [60.0, 772.0, 384.0, 157.0, 43.0, 309.0, 1000.0, 1000.0, 1000.0, 442.0]
2025-08-07 03:45:45,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 94/100 (estimated time remaining: 14 minutes, 6 seconds)
2025-08-07 03:47:29,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:47:39,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 899.06641 ± 344.066
2025-08-07 03:47:39,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [965.8207, 886.4466, 1049.7493, 747.4415, 857.1186, 638.2532, 660.76044, 1727.3204, 1091.503, 366.2496]
2025-08-07 03:47:39,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [550.0, 565.0, 627.0, 449.0, 531.0, 464.0, 373.0, 1000.0, 792.0, 222.0]
2025-08-07 03:47:39,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 95/100 (estimated time remaining: 11 minutes, 47 seconds)
2025-08-07 03:49:22,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:49:36,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1149.99622 ± 561.595
2025-08-07 03:49:36,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1414.2751, 1409.4261, 1254.7688, 1643.299, 721.8644, 1687.2146, 28.08372, 287.4011, 1483.3528, 1570.2762]
2025-08-07 03:49:36,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 793.0, 1000.0, 548.0, 1000.0, 45.0, 200.0, 915.0, 1000.0]
2025-08-07 03:49:36,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 48 seconds)
2025-08-07 03:51:20,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:51:32,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1067.56995 ± 551.519
2025-08-07 03:51:32,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [381.57492, 677.90546, 1623.9818, 1698.2205, 1780.9384, 600.8822, 906.57025, 1673.2057, 1033.9048, 298.51538]
2025-08-07 03:51:32,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [241.0, 455.0, 1000.0, 1000.0, 1000.0, 402.0, 577.0, 1000.0, 690.0, 168.0]
2025-08-07 03:51:32,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 50 seconds)
2025-08-07 03:53:18,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:53:32,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1080.89734 ± 587.182
2025-08-07 03:53:32,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [55.87514, 1135.0125, 1612.3197, 1467.4093, 673.05536, 1721.7502, 157.3291, 1538.6421, 1599.3247, 848.25476]
2025-08-07 03:53:32,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [52.0, 673.0, 1000.0, 1000.0, 1000.0, 1000.0, 128.0, 864.0, 1000.0, 1000.0]
2025-08-07 03:53:32,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 53 seconds)
2025-08-07 03:55:23,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:55:40,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1456.09180 ± 299.723
2025-08-07 03:55:40,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1667.4607, 1181.4614, 703.0404, 1371.5382, 1651.0796, 1559.6014, 1616.3999, 1792.0892, 1583.4707, 1434.7751]
2025-08-07 03:55:40,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 683.0, 438.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:55:40,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1226 [INFO]: New best (1456.09) for latency ExtremeSparseL4U32
2025-08-07 03:55:40,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 57 seconds)
2025-08-07 03:57:25,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:57:37,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1008.77893 ± 635.475
2025-08-07 03:57:37,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [63.95876, 1688.014, 877.717, 1813.7966, 1662.829, 448.91544, 786.6818, 73.44474, 1600.0104, 1072.4205]
2025-08-07 03:57:37,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [51.0, 1000.0, 453.0, 1000.0, 1000.0, 270.0, 1000.0, 66.0, 1000.0, 595.0]
2025-08-07 03:57:37,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 59 seconds)
2025-08-07 03:59:29,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:59:42,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1256.74255 ± 548.026
2025-08-07 03:59:42,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1651.5685, 1814.9572, 1126.1616, 1400.9528, 667.503, 1830.274, 413.92694, 1777.0264, 1509.6902, 375.36484]
2025-08-07 03:59:42,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [892.0, 1000.0, 729.0, 793.0, 374.0, 1000.0, 233.0, 1000.0, 868.0, 206.0]
2025-08-07 03:59:42,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1251 [DEBUG]: Training session finished
