2025-08-07 08:54:59,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc25-ant/MM1Queue_a033_s075-bpql-mem16
2025-08-07 08:54:59,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc25-ant/MM1Queue_a033_s075-bpql-mem16
2025-08-07 08:54:59,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x150b3690bfd0>}
2025-08-07 08:54:59,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1111 [DEBUG]: using device: cuda
2025-08-07 08:54:59,789 baseline-bpql-noiseperc25-ant:77 [WARNING]: args.assumed_delay != args.horizon: 16 != 24
2025-08-07 08:54:59,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1133 [INFO]: Creating new trainer
2025-08-07 08:54:59,806 baseline-bpql-noiseperc25-ant:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=155, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 08:54:59,806 baseline-bpql-noiseperc25-ant:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 08:55:01,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1194 [DEBUG]: Starting training session...
2025-08-07 08:55:01,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 1/100
2025-08-07 08:56:42,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:56:43,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -42.86775 ± 68.658
2025-08-07 08:56:43,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-10.562005, -7.6622486, 3.1663826, -99.89116, -230.16583, 1.7555358, -37.461334, -24.31681, -4.5016856, -19.038343]
2025-08-07 08:56:43,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [34.0, 32.0, 19.0, 127.0, 159.0, 25.0, 52.0, 27.0, 64.0, 34.0]
2025-08-07 08:56:43,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1226 [INFO]: New best (-42.87) for latency MM1Queue_a033_s075
2025-08-07 08:56:43,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 47 minutes, 23 seconds)
2025-08-07 08:58:22,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:58:22,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -43.74776 ± 32.371
2025-08-07 08:58:22,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-20.98232, -82.639885, -83.973274, -33.49703, -70.06758, -22.250616, -57.051907, -73.514885, -3.4393473, 9.939166]
2025-08-07 08:58:22,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [80.0, 67.0, 91.0, 58.0, 115.0, 42.0, 71.0, 72.0, 26.0, 41.0]
2025-08-07 08:58:22,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 44 minutes, 23 seconds)
2025-08-07 09:00:08,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:00:10,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -110.85138 ± 109.269
2025-08-07 09:00:10,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-2.7168336, 5.984308, -111.76592, -56.114204, -291.5383, -5.3763933, -205.9377, -188.26009, -252.42271, -0.3660138]
2025-08-07 09:00:10,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [46.0, 60.0, 162.0, 106.0, 230.0, 18.0, 172.0, 229.0, 248.0, 20.0]
2025-08-07 09:00:10,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 46 minutes, 23 seconds)
2025-08-07 09:01:54,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:01:55,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -23.58931 ± 30.636
2025-08-07 09:01:55,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-37.397198, 9.308028, -15.669625, -32.390785, -28.444397, -1.4616315, 13.779946, -50.181965, -93.00991, -0.4255871]
2025-08-07 09:01:55,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [49.0, 44.0, 31.0, 117.0, 113.0, 23.0, 24.0, 157.0, 145.0, 41.0]
2025-08-07 09:01:55,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1226 [INFO]: New best (-23.59) for latency MM1Queue_a033_s075
2025-08-07 09:01:55,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 45 minutes, 36 seconds)
2025-08-07 09:03:32,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:03:34,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -91.70702 ± 162.445
2025-08-07 09:03:34,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-26.585257, -222.91437, 2.3741615, -5.0575595, -3.1092274, 0.20736597, -63.129665, -21.90546, -540.1174, -36.83279]
2025-08-07 09:03:34,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [92.0, 397.0, 29.0, 57.0, 52.0, 38.0, 93.0, 63.0, 1000.0, 51.0]
2025-08-07 09:03:34,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 42 minutes, 29 seconds)
2025-08-07 09:05:17,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:05:21,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -145.79214 ± 205.891
2025-08-07 09:05:21,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-5.081976, -552.18085, -83.97143, 4.4168453, -14.025098, 14.881012, -116.54591, -540.7287, -129.8605, -34.824997]
2025-08-07 09:05:21,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [58.0, 1000.0, 164.0, 25.0, 30.0, 39.0, 262.0, 1000.0, 237.0, 87.0]
2025-08-07 09:05:21,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 42 minutes, 20 seconds)
2025-08-07 09:07:06,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:07:07,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -37.45673 ± 33.277
2025-08-07 09:07:07,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-39.500328, -18.778694, -39.29456, -113.844154, -67.103325, 11.221677, -56.35023, -15.582334, -14.905123, -20.430273]
2025-08-07 09:07:07,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [102.0, 51.0, 74.0, 144.0, 140.0, 29.0, 107.0, 45.0, 31.0, 34.0]
2025-08-07 09:07:07,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 42 minutes, 27 seconds)
2025-08-07 09:08:48,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:08:49,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -16.66443 ± 32.993
2025-08-07 09:08:49,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [10.21698, 13.9092, 4.6846004, -33.806816, -26.736456, 6.380367, -25.100937, -102.74973, -15.843549, 2.4020445]
2025-08-07 09:08:49,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [78.0, 24.0, 94.0, 69.0, 40.0, 45.0, 86.0, 111.0, 56.0, 13.0]
2025-08-07 09:08:49,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1226 [INFO]: New best (-16.66) for latency MM1Queue_a033_s075
2025-08-07 09:08:49,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 39 minutes, 7 seconds)
2025-08-07 09:10:31,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:10:33,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -92.56557 ± 207.770
2025-08-07 09:10:33,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-3.8132749, 2.999172, -36.214123, -34.945206, -713.2418, -47.777523, -3.119405, -56.571846, -22.002449, -10.9692135]
2025-08-07 09:10:33,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [48.0, 33.0, 68.0, 75.0, 1000.0, 68.0, 32.0, 71.0, 41.0, 32.0]
2025-08-07 09:10:33,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 36 minutes, 59 seconds)
2025-08-07 09:12:15,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:12:16,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -9.85814 ± 25.978
2025-08-07 09:12:16,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-17.08961, -16.347212, -76.37949, 15.0830965, -9.72034, -5.0775347, -9.0829935, -13.097444, 7.165086, 25.964994]
2025-08-07 09:12:16,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [34.0, 36.0, 115.0, 26.0, 41.0, 30.0, 52.0, 110.0, 49.0, 54.0]
2025-08-07 09:12:16,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1226 [INFO]: New best (-9.86) for latency MM1Queue_a033_s075
2025-08-07 09:12:16,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 36 minutes, 31 seconds)
2025-08-07 09:13:59,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:14:00,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -41.80839 ± 54.982
2025-08-07 09:14:00,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-18.507355, -21.094849, -72.10767, -191.32753, -42.663208, -15.171619, 1.2430831, -41.38828, -33.08331, 16.016882]
2025-08-07 09:14:00,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [50.0, 45.0, 85.0, 173.0, 126.0, 94.0, 38.0, 62.0, 61.0, 27.0]
2025-08-07 09:14:00,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 34 minutes, 6 seconds)
2025-08-07 09:15:42,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:15:46,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -262.80835 ± 360.351
2025-08-07 09:15:46,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-23.877281, 25.648191, -727.74475, -71.79279, -59.607178, -11.0308895, -16.954695, -825.75073, -41.971214, -875.0022]
2025-08-07 09:15:46,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [34.0, 36.0, 1000.0, 61.0, 88.0, 33.0, 75.0, 1000.0, 38.0, 1000.0]
2025-08-07 09:15:46,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 32 minutes, 30 seconds)
2025-08-07 09:17:28,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:17:29,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -18.45362 ± 18.432
2025-08-07 09:17:29,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-22.066957, -28.708471, -26.843586, -60.241318, -24.294973, -8.437207, 10.199783, -19.02116, -1.6508704, -3.4714859]
2025-08-07 09:17:29,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [77.0, 62.0, 62.0, 73.0, 60.0, 51.0, 30.0, 32.0, 48.0, 47.0]
2025-08-07 09:17:29,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 30 minutes, 51 seconds)
2025-08-07 09:19:11,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:19:12,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -36.39987 ± 38.639
2025-08-07 09:19:12,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-1.6966714, -34.779957, -29.032295, -6.8371544, -48.33645, -27.845762, 8.158484, -29.550352, -57.086086, -136.99246]
2025-08-07 09:19:12,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [25.0, 49.0, 64.0, 70.0, 42.0, 93.0, 24.0, 43.0, 58.0, 129.0]
2025-08-07 09:19:12,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 28 minutes, 52 seconds)
2025-08-07 09:20:54,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:20:56,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -100.48402 ± 213.235
2025-08-07 09:20:56,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-34.471497, 1.6759188, -31.26479, -20.392723, -738.0513, -47.216232, -54.70046, -40.365627, -1.2261913, -38.827198]
2025-08-07 09:20:56,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [78.0, 28.0, 50.0, 29.0, 1000.0, 54.0, 65.0, 57.0, 28.0, 47.0]
2025-08-07 09:20:56,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 27 minutes, 22 seconds)
2025-08-07 09:22:42,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:22:43,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -24.67816 ± 31.637
2025-08-07 09:22:43,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-1.7144234, -23.017431, -8.654464, 0.5329949, -79.836075, -12.712235, -58.974045, -73.292564, 10.269582, 0.61705667]
2025-08-07 09:22:43,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [20.0, 34.0, 21.0, 33.0, 61.0, 81.0, 60.0, 75.0, 38.0, 50.0]
2025-08-07 09:22:43,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 26 minutes, 18 seconds)
2025-08-07 09:24:25,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:24:25,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -14.10420 ± 20.741
2025-08-07 09:24:25,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-43.589043, -47.802624, -0.3938207, 7.4895496, 4.8691854, -16.734, -15.856047, 2.8225958, -38.057285, 6.2094574]
2025-08-07 09:24:25,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [111.0, 68.0, 29.0, 65.0, 16.0, 32.0, 87.0, 30.0, 64.0, 38.0]
2025-08-07 09:24:25,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 23 minutes, 35 seconds)
2025-08-07 09:26:06,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:26:07,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -29.12995 ± 51.365
2025-08-07 09:26:07,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-150.40784, 19.571451, -79.459305, 5.8114967, -12.9333725, -31.424105, 1.0892062, 24.496176, -60.904343, -7.13886]
2025-08-07 09:26:07,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [132.0, 33.0, 64.0, 30.0, 57.0, 66.0, 38.0, 24.0, 130.0, 44.0]
2025-08-07 09:26:07,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 21 minutes, 38 seconds)
2025-08-07 09:27:47,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:27:49,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -86.43775 ± 179.473
2025-08-07 09:27:49,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-12.411422, -620.0787, -5.916112, -83.921585, -53.933132, -11.9035635, -19.066652, -26.76301, -31.014708, 0.6313732]
2025-08-07 09:27:49,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [22.0, 1000.0, 66.0, 89.0, 43.0, 31.0, 39.0, 38.0, 75.0, 26.0]
2025-08-07 09:27:49,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 19 minutes, 31 seconds)
2025-08-07 09:29:31,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:29:34,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -142.75131 ± 269.302
2025-08-07 09:29:34,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-10.651417, -23.319778, 11.435057, -642.0004, -18.043224, -4.6772947, -717.87225, -2.7552376, -19.19854, -0.43005362]
2025-08-07 09:29:34,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [41.0, 30.0, 40.0, 1000.0, 31.0, 47.0, 1000.0, 26.0, 32.0, 24.0]
2025-08-07 09:29:34,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 18 minutes, 4 seconds)
2025-08-07 09:31:17,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:31:17,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -6.17723 ± 16.692
2025-08-07 09:31:17,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-10.42711, -29.773584, 14.387188, 14.188653, 4.57315, 2.0639467, -4.4441977, -39.320644, -13.158823, 0.1391733]
2025-08-07 09:31:17,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [21.0, 37.0, 43.0, 40.0, 48.0, 21.0, 59.0, 73.0, 40.0, 34.0]
2025-08-07 09:31:17,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1226 [INFO]: New best (-6.18) for latency MM1Queue_a033_s075
2025-08-07 09:31:17,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 15 minutes, 28 seconds)
2025-08-07 09:33:00,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:33:00,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -8.25348 ± 15.529
2025-08-07 09:33:00,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-10.604135, 8.72789, 7.486349, -37.03453, 13.188413, -13.954173, -27.986385, 1.9212387, -8.526396, -15.753109]
2025-08-07 09:33:00,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [59.0, 40.0, 27.0, 85.0, 37.0, 23.0, 40.0, 32.0, 34.0, 22.0]
2025-08-07 09:33:01,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 13 minutes, 54 seconds)
2025-08-07 09:34:43,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:34:43,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -19.73083 ± 22.261
2025-08-07 09:34:43,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-8.539296, 15.41208, -38.863705, -21.246208, -17.355928, -9.481926, -0.61580193, -6.361428, -49.383377, -60.872677]
2025-08-07 09:34:43,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [24.0, 31.0, 34.0, 49.0, 101.0, 27.0, 20.0, 26.0, 90.0, 43.0]
2025-08-07 09:34:43,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 12 minutes, 27 seconds)
2025-08-07 09:36:24,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:36:25,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -27.25797 ± 32.066
2025-08-07 09:36:25,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-110.09329, -21.329348, -5.3763056, -27.102312, -31.227425, -9.396627, -2.1002417, 1.3579283, -56.439312, -10.872776]
2025-08-07 09:36:25,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [105.0, 44.0, 34.0, 39.0, 49.0, 28.0, 28.0, 29.0, 44.0, 20.0]
2025-08-07 09:36:25,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 10 minutes, 45 seconds)
2025-08-07 09:38:07,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:38:07,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -15.07757 ± 19.915
2025-08-07 09:38:07,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-32.319973, 10.878291, 0.020226074, -3.8800366, 1.896286, 7.23622, -43.760666, -23.21404, -45.314293, -22.31777]
2025-08-07 09:38:07,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [43.0, 19.0, 28.0, 26.0, 26.0, 41.0, 42.0, 105.0, 37.0, 30.0]
2025-08-07 09:38:07,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 8 minutes, 16 seconds)
2025-08-07 09:39:49,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:39:50,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -13.81082 ± 25.083
2025-08-07 09:39:50,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-22.129269, -9.087731, 7.2108264, -24.949276, -29.088211, 15.115311, 0.53485864, -18.70062, -72.680214, 15.666104]
2025-08-07 09:39:50,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [58.0, 13.0, 20.0, 36.0, 27.0, 24.0, 29.0, 25.0, 66.0, 33.0]
2025-08-07 09:39:50,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 6 minutes, 26 seconds)
2025-08-07 09:41:41,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:41:44,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -31.71745 ± 29.550
2025-08-07 09:41:44,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [16.928724, -44.729553, -19.183271, -43.51365, -9.837458, -11.71655, -18.940998, -87.31074, -25.56401, -73.30695]
2025-08-07 09:41:44,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [29.0, 45.0, 37.0, 36.0, 24.0, 36.0, 22.0, 1000.0, 34.0, 1000.0]
2025-08-07 09:41:44,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 7 minutes, 30 seconds)
2025-08-07 09:43:17,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:43:18,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -12.24348 ± 18.416
2025-08-07 09:43:18,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-13.931012, -10.202701, 15.258536, -17.215385, -34.23, -29.9813, -2.3973615, -10.883694, 19.650023, -38.501915]
2025-08-07 09:43:18,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [52.0, 23.0, 24.0, 27.0, 43.0, 27.0, 26.0, 25.0, 36.0, 60.0]
2025-08-07 09:43:18,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 3 minutes, 24 seconds)
2025-08-07 09:45:00,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:45:00,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -4.96396 ± 11.318
2025-08-07 09:45:00,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [1.5439792, -27.192701, -2.361699, -2.0836601, 0.6717008, -24.628342, 0.050314818, 11.320205, -0.63326037, -6.326111]
2025-08-07 09:45:00,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [24.0, 29.0, 25.0, 29.0, 38.0, 41.0, 35.0, 139.0, 49.0, 25.0]
2025-08-07 09:45:00,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1226 [INFO]: New best (-4.96) for latency MM1Queue_a033_s075
2025-08-07 09:45:00,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 1 minute, 58 seconds)
2025-08-07 09:46:42,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:46:43,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -14.28267 ± 18.634
2025-08-07 09:46:43,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-16.120237, 3.84276, 0.4127809, -5.324293, -7.2257385, -50.990536, -13.504318, -44.38324, -18.16518, 8.631324]
2025-08-07 09:46:43,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [31.0, 23.0, 25.0, 25.0, 13.0, 1000.0, 26.0, 35.0, 24.0, 30.0]
2025-08-07 09:46:43,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 26 seconds)
2025-08-07 09:48:34,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:48:37,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -45.80605 ± 45.290
2025-08-07 09:48:37,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-23.306795, -39.6734, -32.52829, -104.231, -15.345286, -0.8045206, -105.95506, -125.81103, -12.585716, 2.1805856]
2025-08-07 09:48:37,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [31.0, 66.0, 69.0, 1000.0, 33.0, 41.0, 66.0, 256.0, 21.0, 24.0]
2025-08-07 09:48:37,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 1 minute, 9 seconds)
2025-08-07 09:50:08,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:50:10,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -25.20745 ± 42.773
2025-08-07 09:50:10,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-9.986569, -69.971214, -35.56483, 13.813247, -7.642074, -115.899734, -57.45476, 35.4315, 5.9296546, -10.729706]
2025-08-07 09:50:10,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [37.0, 1000.0, 75.0, 26.0, 53.0, 331.0, 59.0, 37.0, 70.0, 34.0]
2025-08-07 09:50:10,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 54 minutes, 39 seconds)
2025-08-07 09:52:01,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:52:03,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -19.51513 ± 37.779
2025-08-07 09:52:03,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [1.1591841, -5.040583, 14.191164, -50.504593, 7.6659946, -117.154144, -11.490897, -32.94223, -12.772991, 11.737763]
2025-08-07 09:52:03,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [24.0, 27.0, 22.0, 52.0, 32.0, 1000.0, 33.0, 42.0, 24.0, 27.0]
2025-08-07 09:52:03,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 57 minutes, 15 seconds)
2025-08-07 09:53:35,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:53:36,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -26.23365 ± 34.899
2025-08-07 09:53:36,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-0.81199646, -18.990746, 0.62632334, -42.1548, -13.113503, -20.014973, -12.547397, -8.097581, -125.03805, -22.19372]
2025-08-07 09:53:36,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [34.0, 31.0, 16.0, 43.0, 46.0, 26.0, 81.0, 26.0, 166.0, 40.0]
2025-08-07 09:53:36,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 53 minutes, 26 seconds)
2025-08-07 09:55:18,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:55:19,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -16.34757 ± 23.849
2025-08-07 09:55:19,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-7.3606224, 2.143032, -12.248392, -83.423775, 5.9254227, -6.9849854, -19.456312, -17.07725, -20.420118, -4.572683]
2025-08-07 09:55:19,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [56.0, 24.0, 109.0, 71.0, 15.0, 24.0, 39.0, 24.0, 26.0, 16.0]
2025-08-07 09:55:19,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 51 minutes, 38 seconds)
2025-08-07 09:57:00,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:57:01,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -5.60477 ± 7.945
2025-08-07 09:57:01,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-8.363939, 7.8493686, 1.7466736, -1.6773319, -16.525826, -5.084723, 0.3739761, -11.201028, -3.9980154, -19.16683]
2025-08-07 09:57:01,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [25.0, 52.0, 19.0, 54.0, 44.0, 24.0, 23.0, 46.0, 58.0, 28.0]
2025-08-07 09:57:01,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 47 minutes, 36 seconds)
2025-08-07 09:58:42,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:58:43,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -3.84144 ± 10.160
2025-08-07 09:58:43,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-1.1439451, -1.8364112, -3.5604968, -16.15657, -8.42276, -0.9001187, -11.145121, 11.621613, 13.100227, -19.970844]
2025-08-07 09:58:43,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [24.0, 52.0, 24.0, 28.0, 37.0, 29.0, 36.0, 45.0, 29.0, 73.0]
2025-08-07 09:58:43,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1226 [INFO]: New best (-3.84) for latency MM1Queue_a033_s075
2025-08-07 09:58:43,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 47 minutes, 37 seconds)
2025-08-07 10:00:27,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:00:28,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -8.88835 ± 25.313
2025-08-07 10:00:28,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-37.813, -13.9632635, 9.469359, -12.445993, -11.137503, 18.971449, 17.091711, -67.106804, 6.9010444, 1.1494576]
2025-08-07 10:00:28,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [97.0, 49.0, 31.0, 26.0, 25.0, 43.0, 21.0, 130.0, 24.0, 32.0]
2025-08-07 10:00:28,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 44 minutes, 23 seconds)
2025-08-07 10:02:08,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:02:10,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -23.71378 ± 56.109
2025-08-07 10:02:10,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-7.088589, 17.114109, 2.980588, -9.746794, -7.0404205, -186.68803, -0.9675505, 8.434846, -16.60679, -37.529125]
2025-08-07 10:02:10,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [30.0, 23.0, 27.0, 49.0, 41.0, 1000.0, 29.0, 26.0, 37.0, 32.0]
2025-08-07 10:02:10,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 44 minutes, 31 seconds)
2025-08-07 10:03:51,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:03:51,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -14.09083 ± 17.455
2025-08-07 10:03:51,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-32.961887, -17.244001, -3.1149733, 0.6965585, 13.830361, 3.475126, -11.975586, -45.198383, -30.399769, -18.015787]
2025-08-07 10:03:51,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [38.0, 62.0, 36.0, 33.0, 26.0, 26.0, 31.0, 70.0, 41.0, 40.0]
2025-08-07 10:03:51,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 42 minutes, 32 seconds)
2025-08-07 10:05:39,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:05:41,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -13.80958 ± 20.730
2025-08-07 10:05:41,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-27.698866, 5.6403227, -27.116898, -2.860494, -44.572, 26.383974, -21.895027, -38.224594, -5.9330716, -1.8190894]
2025-08-07 10:05:41,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [47.0, 24.0, 38.0, 50.0, 1000.0, 29.0, 59.0, 70.0, 29.0, 25.0]
2025-08-07 10:05:41,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 42 minutes, 12 seconds)
2025-08-07 10:07:16,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:07:16,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -3.34648 ± 14.388
2025-08-07 10:07:16,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [3.2401748, -8.312263, -3.3030818, 18.474823, -31.449257, 15.741164, 1.7794701, -3.4772317, -21.913414, -4.2452245]
2025-08-07 10:07:16,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [26.0, 27.0, 29.0, 68.0, 45.0, 23.0, 24.0, 16.0, 76.0, 26.0]
2025-08-07 10:07:16,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1226 [INFO]: New best (-3.35) for latency MM1Queue_a033_s075
2025-08-07 10:07:16,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 39 minutes, 16 seconds)
2025-08-07 10:08:58,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:08:59,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -18.05774 ± 17.715
2025-08-07 10:08:59,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-9.006912, -12.128274, -19.740978, -66.423004, 6.0221057, -12.022456, -14.793758, -13.752182, -23.259598, -15.472324]
2025-08-07 10:08:59,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [26.0, 39.0, 38.0, 42.0, 27.0, 33.0, 24.0, 24.0, 46.0, 37.0]
2025-08-07 10:08:59,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 37 minutes, 7 seconds)
2025-08-07 10:10:40,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:10:40,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -9.31320 ± 12.174
2025-08-07 10:10:40,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-30.934404, -11.201847, -14.568278, -10.315496, -17.912188, -5.1756897, -3.4468777, -19.619043, 11.259244, 8.78261]
2025-08-07 10:10:40,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [43.0, 53.0, 24.0, 28.0, 23.0, 32.0, 35.0, 30.0, 24.0, 27.0]
2025-08-07 10:10:40,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 35 minutes, 14 seconds)
2025-08-07 10:12:21,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:12:22,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -13.57014 ± 19.749
2025-08-07 10:12:22,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-3.6318119, -2.6508284, -24.717014, 0.83504885, -56.156994, -9.656216, -2.9546373, 18.175974, -30.166151, -24.77879]
2025-08-07 10:12:22,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [47.0, 34.0, 25.0, 25.0, 67.0, 23.0, 25.0, 31.0, 59.0, 51.0]
2025-08-07 10:12:22,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 33 minutes, 36 seconds)
2025-08-07 10:14:03,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:14:04,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -10.28844 ± 17.318
2025-08-07 10:14:04,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-10.323508, -20.28643, -4.09471, 14.992835, -4.8882613, -5.0777726, -47.23862, -31.513126, 4.828763, 0.71643937]
2025-08-07 10:14:04,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [35.0, 34.0, 21.0, 117.0, 25.0, 26.0, 34.0, 41.0, 24.0, 31.0]
2025-08-07 10:14:04,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 30 minutes, 33 seconds)
2025-08-07 10:15:46,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:15:47,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -17.78012 ± 23.316
2025-08-07 10:15:47,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-21.843159, -21.29092, -39.23382, -8.512726, -19.872738, 17.097336, 0.06262503, -12.185919, 0.22523086, -72.24709]
2025-08-07 10:15:47,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [43.0, 48.0, 32.0, 28.0, 30.0, 25.0, 26.0, 24.0, 25.0, 1000.0]
2025-08-07 10:15:47,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 30 minutes, 17 seconds)
2025-08-07 10:17:28,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:17:28,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -0.34300 ± 9.136
2025-08-07 10:17:28,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [22.0963, -7.2285085, 0.59615064, -7.733578, -1.9266664, 6.564417, -5.8759665, -7.017864, -8.535768, 5.631512]
2025-08-07 10:17:28,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [29.0, 24.0, 36.0, 27.0, 25.0, 33.0, 31.0, 24.0, 25.0, 25.0]
2025-08-07 10:17:28,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1226 [INFO]: New best (-0.34) for latency MM1Queue_a033_s075
2025-08-07 10:17:28,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 28 minutes, 18 seconds)
2025-08-07 10:19:17,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:19:18,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -20.81798 ± 27.866
2025-08-07 10:19:18,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-0.010354227, -76.36214, -3.2207854, 2.2095952, -22.19928, -9.321061, -9.45624, -47.668915, 13.73037, -55.880962]
2025-08-07 10:19:18,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [35.0, 73.0, 31.0, 24.0, 31.0, 25.0, 32.0, 98.0, 29.0, 69.0]
2025-08-07 10:19:18,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 28 minutes)
2025-08-07 10:20:51,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:20:52,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -19.37135 ± 26.726
2025-08-07 10:20:52,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-17.66892, -19.114344, 1.3417461, -18.24563, -6.7517877, -11.838132, 1.5885108, -21.410936, -5.676629, -95.93739]
2025-08-07 10:20:52,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [33.0, 24.0, 19.0, 42.0, 39.0, 24.0, 42.0, 27.0, 30.0, 106.0]
2025-08-07 10:20:52,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 24 minutes, 57 seconds)
2025-08-07 10:22:33,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:22:34,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -4.90469 ± 14.582
2025-08-07 10:22:34,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-8.155099, 5.0474977, -16.510693, -27.601355, -15.481868, -2.517236, 26.773666, 4.9238615, -15.917181, 0.39145157]
2025-08-07 10:22:34,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [27.0, 25.0, 25.0, 29.0, 44.0, 25.0, 41.0, 16.0, 29.0, 24.0]
2025-08-07 10:22:34,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 23 minutes, 17 seconds)
2025-08-07 10:24:10,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:24:11,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -11.62267 ± 17.292
2025-08-07 10:24:11,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-3.85395, -4.644295, 10.233053, -12.505995, -38.856598, -47.950546, 0.30439004, -10.724238, 2.4153748, -10.643948]
2025-08-07 10:24:11,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [24.0, 25.0, 30.0, 30.0, 27.0, 119.0, 23.0, 37.0, 22.0, 25.0]
2025-08-07 10:24:11,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 20 minutes, 32 seconds)
2025-08-07 10:25:51,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:25:51,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -17.89456 ± 26.367
2025-08-07 10:25:51,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [19.420876, -18.332958, -15.247503, -19.23628, -9.997777, -28.673187, -88.074265, -10.840684, -1.7147223, -6.2490535]
2025-08-07 10:25:51,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [35.0, 33.0, 25.0, 24.0, 24.0, 41.0, 125.0, 25.0, 24.0, 32.0]
2025-08-07 10:25:51,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 18 minutes, 46 seconds)
2025-08-07 10:27:31,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:27:31,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -5.65121 ± 12.072
2025-08-07 10:27:31,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-5.775303, 6.7022853, -0.9186887, -6.2217274, -11.721622, -15.988342, -26.463188, 13.169317, 8.56177, -17.856623]
2025-08-07 10:27:31,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [24.0, 36.0, 25.0, 37.0, 45.0, 26.0, 38.0, 25.0, 32.0, 35.0]
2025-08-07 10:27:31,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 15 minutes, 36 seconds)
2025-08-07 10:29:16,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:29:17,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -15.04287 ± 34.698
2025-08-07 10:29:17,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-105.98446, 14.78653, -2.6150122, -3.4516494, -9.637653, -18.604183, -3.02046, -45.734074, 9.860094, 13.972164]
2025-08-07 10:29:17,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [162.0, 46.0, 23.0, 21.0, 32.0, 34.0, 24.0, 79.0, 26.0, 26.0]
2025-08-07 10:29:17,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 15 minutes, 48 seconds)
2025-08-07 10:30:54,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:30:54,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -14.36901 ± 23.192
2025-08-07 10:30:54,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [1.7425852, -15.220301, -42.20431, -40.50093, -23.607296, -6.4955053, -24.821573, 42.61342, -9.34858, -25.847559]
2025-08-07 10:30:54,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [28.0, 34.0, 40.0, 35.0, 33.0, 26.0, 24.0, 50.0, 22.0, 67.0]
2025-08-07 10:30:54,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 13 minutes, 24 seconds)
2025-08-07 10:32:34,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:32:35,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -11.03051 ± 18.371
2025-08-07 10:32:35,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-53.156555, 15.090071, -4.4239697, -18.129177, -23.560005, -7.849376, -5.094939, -20.762539, 10.4058, -2.824446]
2025-08-07 10:32:35,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [54.0, 27.0, 34.0, 39.0, 49.0, 35.0, 23.0, 25.0, 41.0, 26.0]
2025-08-07 10:32:35,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 12 minutes, 15 seconds)
2025-08-07 10:34:20,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:34:21,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -15.38557 ± 27.265
2025-08-07 10:34:21,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [24.188246, -0.68604136, -9.141194, -28.241396, -0.87172246, -49.30312, -69.595, 11.721125, -1.6048231, -30.321753]
2025-08-07 10:34:21,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [50.0, 20.0, 32.0, 37.0, 44.0, 70.0, 42.0, 25.0, 28.0, 47.0]
2025-08-07 10:34:21,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 11 minutes, 22 seconds)
2025-08-07 10:35:52,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:35:53,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: 0.23130 ± 20.703
2025-08-07 10:35:53,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [7.6186166, 5.2502675, -0.68750125, 10.099665, -57.514027, -6.6188354, 6.2436457, 18.422821, 19.196005, 0.3023155]
2025-08-07 10:35:53,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [42.0, 22.0, 25.0, 58.0, 47.0, 24.0, 37.0, 24.0, 27.0, 83.0]
2025-08-07 10:35:53,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1226 [INFO]: New best (0.23) for latency MM1Queue_a033_s075
2025-08-07 10:35:53,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 8 minutes, 34 seconds)
2025-08-07 10:37:33,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:37:34,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -7.97749 ± 11.142
2025-08-07 10:37:34,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [1.0426599, -18.24139, -29.075369, -1.1981698, 2.607075, -3.62711, -3.1340773, -25.239086, 2.1609771, -5.0704126]
2025-08-07 10:37:34,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [46.0, 24.0, 50.0, 25.0, 24.0, 83.0, 23.0, 58.0, 31.0, 34.0]
2025-08-07 10:37:34,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 6 minutes, 14 seconds)
2025-08-07 10:39:14,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:39:14,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: 0.19251 ± 18.499
2025-08-07 10:39:14,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [12.358784, 23.473404, -40.085888, -12.223712, -14.145717, -2.035552, 19.934385, 13.16905, 10.3935795, -8.91322]
2025-08-07 10:39:14,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [41.0, 25.0, 51.0, 22.0, 42.0, 25.0, 31.0, 31.0, 26.0, 43.0]
2025-08-07 10:39:14,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 5 minutes, 1 second)
2025-08-07 10:40:54,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:40:56,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -22.85639 ± 54.876
2025-08-07 10:40:56,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-8.452107, -7.5780563, 15.9757805, -34.10072, 2.0181487, -21.532034, -6.6843953, -181.93596, 0.376099, 13.349384]
2025-08-07 10:40:56,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [34.0, 24.0, 26.0, 65.0, 24.0, 33.0, 26.0, 1000.0, 25.0, 56.0]
2025-08-07 10:40:56,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 3 minutes, 25 seconds)
2025-08-07 10:42:41,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:42:41,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: 0.98324 ± 17.037
2025-08-07 10:42:41,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [6.229338, 2.7190511, 7.5264416, 7.504528, 17.147243, 7.2022657, 22.603956, -15.421036, -5.304141, -40.37521]
2025-08-07 10:42:41,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [25.0, 14.0, 23.0, 25.0, 26.0, 35.0, 28.0, 47.0, 26.0, 63.0]
2025-08-07 10:42:41,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1226 [INFO]: New best (0.98) for latency MM1Queue_a033_s075
2025-08-07 10:42:41,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 1 minute, 40 seconds)
2025-08-07 10:44:14,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:44:14,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: 3.39378 ± 12.103
2025-08-07 10:44:14,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [4.671521, -2.240375, 19.712883, -0.5692075, 21.738125, 10.827302, -14.510253, 3.0525787, 8.202782, -16.947596]
2025-08-07 10:44:14,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [25.0, 22.0, 30.0, 23.0, 41.0, 25.0, 24.0, 37.0, 24.0, 37.0]
2025-08-07 10:44:14,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1226 [INFO]: New best (3.39) for latency MM1Queue_a033_s075
2025-08-07 10:44:14,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 11 seconds)
2025-08-07 10:45:55,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:45:57,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -20.77030 ± 45.352
2025-08-07 10:45:57,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [10.2148285, -21.131424, -19.96084, -22.864407, 10.642242, -33.779243, -3.5103116, 20.10357, 0.025294704, -147.44267]
2025-08-07 10:45:57,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [31.0, 26.0, 35.0, 28.0, 53.0, 88.0, 38.0, 44.0, 46.0, 1000.0]
2025-08-07 10:45:57,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 66/100 (estimated time remaining: 58 minutes, 45 seconds)
2025-08-07 10:47:37,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:47:38,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -0.96330 ± 18.102
2025-08-07 10:47:38,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-37.716377, -4.170171, 10.262746, -22.638702, 6.941439, 18.37181, 23.84764, 6.773417, -14.71257, 3.4077501]
2025-08-07 10:47:38,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [42.0, 28.0, 27.0, 22.0, 28.0, 52.0, 47.0, 23.0, 25.0, 14.0]
2025-08-07 10:47:38,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 67/100 (estimated time remaining: 57 minutes, 2 seconds)
2025-08-07 10:49:16,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:49:17,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -1.22162 ± 12.825
2025-08-07 10:49:17,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [22.05069, 5.081842, -14.415789, 12.5957365, -25.99346, -4.592673, -6.2664614, 2.870806, 2.14693, -5.693859]
2025-08-07 10:49:17,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [38.0, 16.0, 28.0, 82.0, 49.0, 65.0, 36.0, 24.0, 27.0, 36.0]
2025-08-07 10:49:17,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 68/100 (estimated time remaining: 55 minutes, 6 seconds)
2025-08-07 10:50:55,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:50:57,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -29.80210 ± 56.921
2025-08-07 10:50:57,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-3.6281707, -20.11438, -66.10119, 5.4793444, -177.52426, -14.424847, 15.594775, -25.280943, 37.50202, -49.523357]
2025-08-07 10:50:57,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [37.0, 52.0, 68.0, 24.0, 1000.0, 34.0, 27.0, 44.0, 71.0, 68.0]
2025-08-07 10:50:57,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 69/100 (estimated time remaining: 52 minutes, 53 seconds)
2025-08-07 10:52:37,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:52:37,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -8.25369 ± 41.793
2025-08-07 10:52:37,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [1.8882887, 19.524086, 9.563765, -8.478806, -47.51121, 11.447456, 4.5520673, -93.88861, -45.35168, 65.71778]
2025-08-07 10:52:37,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [19.0, 23.0, 51.0, 33.0, 35.0, 35.0, 25.0, 95.0, 58.0, 90.0]
2025-08-07 10:52:37,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 70/100 (estimated time remaining: 51 minutes, 58 seconds)
2025-08-07 10:54:16,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:54:16,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -6.37962 ± 20.784
2025-08-07 10:54:16,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [27.80712, -26.113754, 9.107332, -3.5560741, -8.756424, -2.4972754, 5.5715637, 3.5295093, -16.048244, -52.839943]
2025-08-07 10:54:16,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [47.0, 57.0, 27.0, 41.0, 25.0, 24.0, 25.0, 35.0, 22.0, 61.0]
2025-08-07 10:54:16,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 71/100 (estimated time remaining: 49 minutes, 52 seconds)
2025-08-07 10:55:56,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:55:57,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -2.50495 ± 9.171
2025-08-07 10:55:57,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [4.390814, -2.5221, 6.52523, -6.1939263, -8.433741, -6.2428527, 0.146013, -19.759556, -8.13941, 15.180055]
2025-08-07 10:55:57,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [48.0, 31.0, 25.0, 24.0, 30.0, 25.0, 31.0, 51.0, 25.0, 49.0]
2025-08-07 10:55:57,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 72/100 (estimated time remaining: 48 minutes, 14 seconds)
2025-08-07 10:57:37,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:57:37,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -4.26918 ± 17.445
2025-08-07 10:57:37,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [13.504176, -32.833603, -9.660126, -26.876684, -10.642509, -1.5486842, -7.79336, -1.9439814, 4.261739, 30.84121]
2025-08-07 10:57:37,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [24.0, 49.0, 24.0, 39.0, 25.0, 19.0, 24.0, 36.0, 23.0, 40.0]
2025-08-07 10:57:37,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 73/100 (estimated time remaining: 46 minutes, 44 seconds)
2025-08-07 10:59:17,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:59:17,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -2.27587 ± 14.499
2025-08-07 10:59:17,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-29.357433, -10.482841, 3.249528, -22.133696, 7.8209076, 5.14346, 3.998964, 4.881542, -8.016413, 22.137331]
2025-08-07 10:59:17,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [35.0, 35.0, 42.0, 37.0, 27.0, 36.0, 21.0, 26.0, 36.0, 53.0]
2025-08-07 10:59:17,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 74/100 (estimated time remaining: 45 minutes, 2 seconds)
2025-08-07 11:01:00,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:01:01,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -14.59490 ± 19.697
2025-08-07 11:01:01,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-5.236489, -22.290344, -44.629333, 2.1972725, -37.71452, -7.2184916, -15.347443, -5.670404, 24.103947, -34.14314]
2025-08-07 11:01:01,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [23.0, 36.0, 80.0, 24.0, 50.0, 31.0, 40.0, 24.0, 155.0, 42.0]
2025-08-07 11:01:01,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 75/100 (estimated time remaining: 43 minutes, 37 seconds)
2025-08-07 11:02:36,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:02:36,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -8.24967 ± 14.717
2025-08-07 11:02:36,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-1.9450521, 7.05824, -39.04571, -2.2523794, -26.027567, -18.16656, -6.923196, 12.963318, -2.2787216, -5.8791003]
2025-08-07 11:02:36,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [35.0, 44.0, 38.0, 24.0, 31.0, 66.0, 42.0, 88.0, 25.0, 25.0]
2025-08-07 11:02:36,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 76/100 (estimated time remaining: 41 minutes, 39 seconds)
2025-08-07 11:04:19,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:04:19,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -6.91950 ± 11.483
2025-08-07 11:04:19,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-20.759167, 4.1712546, -1.2413212, -21.506355, -3.7023582, -24.723024, -11.604373, 10.261767, -3.2140248, 3.1226034]
2025-08-07 11:04:19,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [58.0, 27.0, 24.0, 23.0, 24.0, 36.0, 62.0, 24.0, 64.0, 48.0]
2025-08-07 11:04:19,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 77/100 (estimated time remaining: 40 minutes, 11 seconds)
2025-08-07 11:05:57,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:05:58,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -14.39517 ± 36.653
2025-08-07 11:05:58,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-16.795485, -18.56113, 4.3969626, 0.4890751, 10.195925, 7.2905483, 5.0619187, -41.464077, 17.859842, -112.4253]
2025-08-07 11:05:58,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [33.0, 72.0, 28.0, 25.0, 35.0, 86.0, 26.0, 60.0, 64.0, 86.0]
2025-08-07 11:05:58,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 78/100 (estimated time remaining: 38 minutes, 22 seconds)
2025-08-07 11:07:37,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:07:40,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -28.22849 ± 42.409
2025-08-07 11:07:40,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-19.2013, 5.0516496, -4.64536, -116.90411, -15.930285, -3.172897, 7.334072, -23.352707, -104.70175, -6.7622356]
2025-08-07 11:07:40,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [55.0, 34.0, 23.0, 1000.0, 36.0, 55.0, 70.0, 27.0, 1000.0, 51.0]
2025-08-07 11:07:40,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 79/100 (estimated time remaining: 36 minutes, 53 seconds)
2025-08-07 11:09:29,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:09:29,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: 0.70323 ± 16.843
2025-08-07 11:09:29,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [3.5624185, -1.6705686, 16.326166, 23.966244, -37.038696, -14.63523, -3.4099047, 1.0009711, 19.651827, -0.72088796]
2025-08-07 11:09:29,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [18.0, 23.0, 37.0, 38.0, 42.0, 55.0, 24.0, 28.0, 31.0, 23.0]
2025-08-07 11:09:29,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 80/100 (estimated time remaining: 35 minutes, 35 seconds)
2025-08-07 11:11:07,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:11:08,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -0.75109 ± 9.195
2025-08-07 11:11:08,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-7.837263, 0.8541575, 11.74491, 11.731569, -11.267164, -0.26921877, -2.4723897, -16.890635, 9.413176, -2.5180311]
2025-08-07 11:11:08,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [54.0, 24.0, 30.0, 26.0, 24.0, 38.0, 25.0, 24.0, 29.0, 22.0]
2025-08-07 11:11:08,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 81/100 (estimated time remaining: 34 minutes, 6 seconds)
2025-08-07 11:12:45,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:12:46,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -8.52740 ± 21.840
2025-08-07 11:12:46,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-22.700294, 2.7003138, 7.695012, -7.0257535, -10.897282, -5.9561024, -63.343224, -0.88607764, -9.519093, 24.658457]
2025-08-07 11:12:46,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [25.0, 26.0, 22.0, 44.0, 24.0, 33.0, 116.0, 25.0, 26.0, 29.0]
2025-08-07 11:12:46,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 82/100 (estimated time remaining: 32 minutes, 4 seconds)
2025-08-07 11:14:19,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:14:19,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -1.81883 ± 18.513
2025-08-07 11:14:19,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-13.11513, 4.5341253, 9.707677, -10.132504, 12.282274, -8.359077, 10.629555, 23.085283, -46.949097, 0.12858315]
2025-08-07 11:14:19,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [22.0, 19.0, 22.0, 27.0, 25.0, 29.0, 24.0, 29.0, 66.0, 25.0]
2025-08-07 11:14:19,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 83/100 (estimated time remaining: 30 minutes, 4 seconds)
2025-08-07 11:15:59,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:16:00,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -32.45849 ± 32.590
2025-08-07 11:16:00,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-6.054001, -69.55165, -69.12468, -94.61504, -28.538216, 9.310414, -30.23295, -20.93701, 2.2868094, -17.128595]
2025-08-07 11:16:00,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [27.0, 96.0, 79.0, 46.0, 38.0, 41.0, 25.0, 43.0, 25.0, 41.0]
2025-08-07 11:16:00,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 84/100 (estimated time remaining: 28 minutes, 17 seconds)
2025-08-07 11:17:38,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:17:39,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -12.07857 ± 12.417
2025-08-07 11:17:39,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-17.595045, -2.9842074, -26.37595, 3.6069858, -31.824966, 4.28284, -0.89381534, -25.96239, -10.484556, -12.554601]
2025-08-07 11:17:39,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [23.0, 28.0, 81.0, 24.0, 29.0, 24.0, 26.0, 54.0, 59.0, 35.0]
2025-08-07 11:17:39,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 85/100 (estimated time remaining: 26 minutes, 7 seconds)
2025-08-07 11:19:26,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:19:27,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -17.20819 ± 23.012
2025-08-07 11:19:27,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-7.080427, -29.619373, 6.56419, -12.3624115, -75.974846, -19.19269, -25.081018, 9.34613, -16.483875, -2.1976142]
2025-08-07 11:19:27,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [26.0, 27.0, 23.0, 34.0, 61.0, 23.0, 32.0, 25.0, 25.0, 38.0]
2025-08-07 11:19:27,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 86/100 (estimated time remaining: 24 minutes, 57 seconds)
2025-08-07 11:21:04,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:21:05,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -11.78392 ± 27.093
2025-08-07 11:21:05,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-44.767216, -8.804854, -43.09916, 22.538113, -48.327827, -31.367737, -5.9924374, -1.0080398, 22.337732, 20.652224]
2025-08-07 11:21:05,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [58.0, 22.0, 48.0, 51.0, 38.0, 61.0, 49.0, 37.0, 37.0, 36.0]
2025-08-07 11:21:05,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 87/100 (estimated time remaining: 23 minutes, 17 seconds)
2025-08-07 11:22:39,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:22:39,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -3.91325 ± 31.618
2025-08-07 11:22:39,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [16.047167, 13.762967, 6.6668477, 5.2475815, 4.1511745, 21.294548, -92.487434, -23.00485, 0.3977174, 8.791829]
2025-08-07 11:22:39,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [26.0, 26.0, 43.0, 26.0, 25.0, 36.0, 113.0, 25.0, 22.0, 38.0]
2025-08-07 11:22:39,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 88/100 (estimated time remaining: 21 minutes, 39 seconds)
2025-08-07 11:24:19,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:24:21,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -18.13049 ± 57.660
2025-08-07 11:24:21,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [4.850662, -189.56311, 4.23787, 0.24422224, -6.870912, -13.584924, 7.8279133, -6.924518, 14.071821, 4.406118]
2025-08-07 11:24:21,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [25.0, 1000.0, 25.0, 24.0, 23.0, 47.0, 26.0, 24.0, 70.0, 19.0]
2025-08-07 11:24:21,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 89/100 (estimated time remaining: 20 minutes, 2 seconds)
2025-08-07 11:26:02,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:26:03,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -42.93367 ± 129.916
2025-08-07 11:26:03,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-17.19291, 5.1069975, -5.976393, 0.4715311, 9.393874, -431.7899, 0.56441724, 17.309954, -4.202365, -3.0218909]
2025-08-07 11:26:03,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [34.0, 28.0, 24.0, 24.0, 83.0, 1000.0, 26.0, 39.0, 14.0, 24.0]
2025-08-07 11:26:03,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 90/100 (estimated time remaining: 18 minutes, 29 seconds)
2025-08-07 11:27:40,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:27:42,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -15.21891 ± 46.155
2025-08-07 11:27:42,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [14.923053, 4.642199, -0.4061461, 12.239492, 9.573669, 0.20194332, -149.83218, -10.347819, -20.208452, -12.974852]
2025-08-07 11:27:42,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [62.0, 25.0, 36.0, 15.0, 24.0, 48.0, 1000.0, 24.0, 51.0, 30.0]
2025-08-07 11:27:42,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 91/100 (estimated time remaining: 16 minutes, 29 seconds)
2025-08-07 11:29:23,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:29:27,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -36.69020 ± 80.421
2025-08-07 11:29:27,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-214.80113, -175.08926, 1.6684496, 7.5060663, 2.0276396, 11.192292, -22.482077, 15.221562, 17.78834, -9.93386]
2025-08-07 11:29:27,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 26.0, 33.0, 13.0, 26.0, 33.0, 24.0, 25.0, 24.0]
2025-08-07 11:29:27,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 92/100 (estimated time remaining: 15 minutes, 3 seconds)
2025-08-07 11:31:03,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:31:04,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -4.36153 ± 16.762
2025-08-07 11:31:04,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-11.286252, -0.14350812, 9.59774, 13.380453, -15.825506, 12.581071, -42.72823, -17.176641, -0.47091064, 8.456438]
2025-08-07 11:31:04,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [26.0, 25.0, 23.0, 20.0, 26.0, 24.0, 56.0, 26.0, 25.0, 34.0]
2025-08-07 11:31:04,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 27 seconds)
2025-08-07 11:32:52,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:32:54,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -17.87915 ± 33.780
2025-08-07 11:32:54,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [14.704366, 3.5327396, 0.45103887, -10.116702, -41.97853, -13.73285, 1.4302564, 4.915194, -31.733326, -106.26369]
2025-08-07 11:32:54,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [24.0, 25.0, 23.0, 12.0, 55.0, 50.0, 24.0, 36.0, 25.0, 1000.0]
2025-08-07 11:32:54,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 94/100 (estimated time remaining: 11 minutes, 58 seconds)
2025-08-07 11:34:30,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:34:30,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -9.25747 ± 11.691
2025-08-07 11:34:30,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-17.753088, -9.700955, -6.684297, 3.5947769, 4.6485987, -1.1535388, -38.259487, -5.281763, -10.421741, -11.563199]
2025-08-07 11:34:30,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [38.0, 23.0, 27.0, 25.0, 50.0, 24.0, 70.0, 25.0, 25.0, 34.0]
2025-08-07 11:34:30,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 7 seconds)
2025-08-07 11:36:14,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:36:15,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -26.38793 ± 61.717
2025-08-07 11:36:15,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-4.2670407, -3.2830307, 6.9511185, -26.303368, -207.41412, 6.015772, -12.150992, -31.41822, -1.5047942, 9.49533]
2025-08-07 11:36:15,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [25.0, 25.0, 25.0, 44.0, 1000.0, 26.0, 58.0, 68.0, 24.0, 31.0]
2025-08-07 11:36:15,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 33 seconds)
2025-08-07 11:37:46,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:37:47,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: 4.05887 ± 13.704
2025-08-07 11:37:47,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-10.669776, 34.607327, -12.383878, -2.1860619, 10.061997, -4.8741655, 8.358482, -5.7528906, 17.664228, 5.763448]
2025-08-07 11:37:47,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [49.0, 41.0, 22.0, 23.0, 28.0, 24.0, 26.0, 38.0, 33.0, 46.0]
2025-08-07 11:37:47,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1226 [INFO]: New best (4.06) for latency MM1Queue_a033_s075
2025-08-07 11:37:47,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 40 seconds)
2025-08-07 11:39:25,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:39:25,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -2.17120 ± 14.874
2025-08-07 11:39:25,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-1.5383798, 5.681685, 3.4922595, 18.132778, -22.070818, 13.953959, -19.73581, -2.1874578, -26.825357, 9.385186]
2025-08-07 11:39:25,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [36.0, 23.0, 30.0, 26.0, 38.0, 27.0, 41.0, 26.0, 64.0, 31.0]
2025-08-07 11:39:25,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes)
2025-08-07 11:41:14,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:41:14,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: 0.82923 ± 10.732
2025-08-07 11:41:14,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [22.99451, 0.0661218, 5.354236, -17.951597, 3.1549084, -0.70666945, -4.1292377, -9.488743, 12.278089, -3.2792706]
2025-08-07 11:41:14,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [23.0, 25.0, 25.0, 47.0, 23.0, 25.0, 25.0, 38.0, 29.0, 26.0]
2025-08-07 11:41:14,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 20 seconds)
2025-08-07 11:42:52,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:42:53,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -8.65938 ± 29.892
2025-08-07 11:42:53,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-35.320972, 8.182998, 16.004105, -8.425316, -8.345316, -26.680138, -78.36032, 5.1366863, 31.697357, 9.517132]
2025-08-07 11:42:53,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [48.0, 26.0, 52.0, 44.0, 25.0, 23.0, 55.0, 35.0, 41.0, 26.0]
2025-08-07 11:42:53,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 40 seconds)
2025-08-07 11:44:29,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:44:29,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -6.08633 ± 19.678
2025-08-07 11:44:29,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-5.607213, -3.1279986, 12.282204, 2.0273924, 8.818527, -44.36594, -21.763248, -34.572624, 14.708773, 10.736853]
2025-08-07 11:44:29,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [26.0, 24.0, 25.0, 26.0, 30.0, 60.0, 34.0, 55.0, 34.0, 24.0]
2025-08-07 11:44:29,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1251 [DEBUG]: Training session finished
