2025-05-13 09:06:23,590 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-bpql-mda-highdim-mem2
2025-05-13 09:06:23,591 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-bpql-mda-highdim-mem2
2025-05-13 09:06:23,591 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x146a5e7ca310>}
2025-05-13 09:06:23,591 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1111 [DEBUG]: using device: cuda
2025-05-13 09:06:23,596 baseline-bpql-mda-noisy-ant:91 [WARNING]: args.assumed_delay != args.horizon: 2 != 24
2025-05-13 09:06:23,596 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1133 [INFO]: Creating new trainer
2025-05-13 09:06:23,604 baseline-bpql-mda-noisy-ant:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-05-13 09:06:23,605 baseline-bpql-mda-noisy-ant:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-13 09:06:23,611 baseline-bpql-mda-noisy-ant:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=27, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(8, 512, batch_first=True)
)
2025-05-13 09:06:24,263 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1194 [DEBUG]: Starting training session...
2025-05-13 09:06:24,263 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 1/100
2025-05-13 09:10:13,267 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:10:27,482 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -61.86164 ± 132.168
2025-05-13 09:10:27,482 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [25.057434, -2.7392156, -77.712296, 32.893185, -62.89915, -76.7591, -380.679, -201.37509, 31.243088, 94.35371]
2025-05-13 09:10:27,482 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:10:27,483 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (-61.86) for latency ExtremeClogL1U23
2025-05-13 09:10:27,490 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 2/100 (estimated time remaining: 6 hours, 41 minutes, 19 seconds)
2025-05-13 09:14:37,260 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:14:51,541 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 423.45990 ± 73.626
2025-05-13 09:14:51,541 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [427.38306, 390.92914, 487.83527, 372.08313, 618.43256, 372.53317, 377.96857, 415.30554, 406.18857, 365.94025]
2025-05-13 09:14:51,541 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:14:51,541 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (423.46) for latency ExtremeClogL1U23
2025-05-13 09:14:51,546 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 3/100 (estimated time remaining: 6 hours, 54 minutes, 16 seconds)
2025-05-13 09:18:34,834 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:18:47,664 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -699.19873 ± 254.185
2025-05-13 09:18:47,664 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-622.6582, -725.0313, -778.99927, -951.93634, -681.72534, -66.91558, -504.27457, -981.2381, -917.9348, -761.27466]
2025-05-13 09:18:47,664 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 213.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:18:47,670 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 4/100 (estimated time remaining: 6 hours, 40 minutes, 36 seconds)
2025-05-13 09:22:44,258 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:22:58,235 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -248.43661 ± 194.272
2025-05-13 09:22:58,235 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-161.15907, -456.74393, -500.1364, -185.36887, -111.229744, 41.229786, -432.48145, -484.34964, 5.3790717, -199.50615]
2025-05-13 09:22:58,235 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:22:58,240 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 5/100 (estimated time remaining: 6 hours, 37 minutes, 35 seconds)
2025-05-13 09:26:44,890 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:26:55,385 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -481.26822 ± 405.504
2025-05-13 09:26:55,386 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-143.22736, -446.52835, -15.658588, -372.06482, -570.65076, -462.36588, -175.0693, -354.90372, -733.3425, -1538.8707]
2025-05-13 09:26:55,386 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [442.0, 1000.0, 10.0, 753.0, 1000.0, 1000.0, 528.0, 887.0, 1000.0, 1000.0]
2025-05-13 09:26:55,393 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 6/100 (estimated time remaining: 6 hours, 29 minutes, 51 seconds)
2025-05-13 09:31:02,983 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:31:15,536 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -1098.82520 ± 465.928
2025-05-13 09:31:15,536 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-1268.3932, -1360.4052, -1314.0852, -1479.6825, -1467.835, -476.0211, -1144.3691, -3.3570642, -1479.8798, -994.2236]
2025-05-13 09:31:15,536 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 8.0, 1000.0, 1000.0]
2025-05-13 09:31:15,540 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 7/100 (estimated time remaining: 6 hours, 31 minutes, 3 seconds)
2025-05-13 09:35:07,742 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:35:20,430 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -1943.92676 ± 671.734
2025-05-13 09:35:20,431 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-2021.944, -1736.5883, -2278.7917, -2190.3625, -2249.3054, -2236.469, -2140.1184, 10.024251, -2250.21, -2345.502]
2025-05-13 09:35:20,431 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 18.0, 1000.0, 1000.0]
2025-05-13 09:35:20,436 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 8/100 (estimated time remaining: 6 hours, 20 minutes, 57 seconds)
2025-05-13 09:39:08,077 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:39:14,231 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -371.36261 ± 465.223
2025-05-13 09:39:14,231 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [7.488269, -39.592216, -41.019802, -689.3116, -86.38973, -756.53827, -17.972023, -1447.6438, -623.538, -19.108644]
2025-05-13 09:39:14,232 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [69.0, 33.0, 50.0, 1000.0, 190.0, 1000.0, 34.0, 1000.0, 1000.0, 33.0]
2025-05-13 09:39:14,240 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 9/100 (estimated time remaining: 6 hours, 16 minutes, 8 seconds)
2025-05-13 09:42:55,156 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:43:06,631 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -1020.16162 ± 514.842
2025-05-13 09:43:06,631 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-1181.4253, -1334.4452, -10.457971, -933.8956, -1541.7506, -122.14203, -1354.9318, -976.97723, -1542.7511, -1202.8401]
2025-05-13 09:43:06,631 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 16.0, 1000.0, 1000.0, 155.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:43:06,638 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 10/100 (estimated time remaining: 6 hours, 6 minutes, 32 seconds)
2025-05-13 09:47:08,245 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:47:21,079 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -827.00372 ± 477.607
2025-05-13 09:47:21,079 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-1165.1367, -1662.8168, -123.28639, -351.4091, -1262.3188, -340.0408, -1123.1814, -991.16693, -874.4538, -376.226]
2025-05-13 09:47:21,079 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 195.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:47:21,087 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 11/100 (estimated time remaining: 6 hours, 7 minutes, 42 seconds)
2025-05-13 09:50:57,817 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:51:11,978 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -1405.10022 ± 250.766
2025-05-13 09:51:11,978 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-1083.3218, -1133.3372, -1576.5349, -1426.2367, -1674.4309, -1191.208, -1584.0316, -1685.3091, -1653.5226, -1043.0693]
2025-05-13 09:51:11,978 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:51:11,985 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 12/100 (estimated time remaining: 5 hours, 54 minutes, 56 seconds)
2025-05-13 09:55:18,266 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:55:30,156 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -767.94775 ± 330.352
2025-05-13 09:55:30,156 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-668.22687, -1186.8058, -723.8344, -1165.3369, -1093.3387, -692.9061, -720.9602, -28.735607, -503.96652, -895.36694]
2025-05-13 09:55:30,156 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 32.0, 421.0, 1000.0]
2025-05-13 09:55:30,163 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 13/100 (estimated time remaining: 5 hours, 54 minutes, 51 seconds)
2025-05-13 09:59:01,303 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:59:13,845 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -1115.41821 ± 397.442
2025-05-13 09:59:13,845 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-1233.8942, -27.236046, -1347.2231, -1375.7433, -1498.7303, -1370.4299, -1016.6702, -1150.411, -939.87164, -1193.9712]
2025-05-13 09:59:13,845 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 31.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:59:13,854 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 14/100 (estimated time remaining: 5 hours, 47 minutes, 53 seconds)
2025-05-13 10:03:09,648 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:03:23,518 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -927.39929 ± 324.780
2025-05-13 10:03:23,519 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-813.8624, -624.41327, -1592.119, -688.9702, -704.48627, -1012.15875, -795.80786, -1500.2573, -778.81555, -763.1025]
2025-05-13 10:03:23,519 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:03:23,527 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 15/100 (estimated time remaining: 5 hours, 48 minutes, 50 seconds)
2025-05-13 10:07:32,440 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:07:43,729 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -802.90283 ± 466.451
2025-05-13 10:07:43,729 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-469.1845, -714.6513, -953.0461, -24.091831, -828.96625, -736.0495, -687.5013, -689.76074, -950.4985, -1975.2781]
2025-05-13 10:07:43,729 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [545.0, 1000.0, 1000.0, 23.0, 631.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:07:43,736 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 16/100 (estimated time remaining: 5 hours, 46 minutes, 25 seconds)
2025-05-13 10:11:16,415 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:11:23,801 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -782.12732 ± 838.441
2025-05-13 10:11:23,801 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-1134.6636, -561.60736, -2038.0583, -1561.1797, -2219.6428, -17.778557, -37.649734, -94.74357, -120.09932, -35.850952]
2025-05-13 10:11:23,801 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 16.0, 46.0, 151.0, 79.0, 19.0]
2025-05-13 10:11:23,810 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 17/100 (estimated time remaining: 5 hours, 39 minutes, 18 seconds)
2025-05-13 10:15:19,527 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:15:28,307 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -834.45520 ± 840.523
2025-05-13 10:15:28,307 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-530.4378, 4.2874427, -42.851116, -1667.407, -1921.0615, -37.381058, -2100.2036, -1662.7504, -87.95073, -298.79593]
2025-05-13 10:15:28,307 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 33.0, 39.0, 1000.0, 1000.0, 23.0, 1000.0, 1000.0, 50.0, 1000.0]
2025-05-13 10:15:28,313 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 18/100 (estimated time remaining: 5 hours, 31 minutes, 29 seconds)
2025-05-13 10:19:31,482 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:19:42,750 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -995.64697 ± 529.205
2025-05-13 10:19:42,750 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-18.043333, -1521.0635, -1337.3469, -1692.6385, -29.489145, -1052.9489, -1105.3759, -1099.0273, -1030.117, -1070.4192]
2025-05-13 10:19:42,750 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [31.0, 1000.0, 1000.0, 1000.0, 24.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:19:42,756 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 19/100 (estimated time remaining: 5 hours, 35 minutes, 53 seconds)
2025-05-13 10:23:26,090 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:23:38,763 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -1307.38623 ± 553.190
2025-05-13 10:23:38,763 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-1434.682, -1665.8656, -1616.3895, -11.120545, -1440.9076, -1655.0835, -614.0803, -1895.4391, -1658.7773, -1081.5164]
2025-05-13 10:23:38,763 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 17.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:23:38,769 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 20/100 (estimated time remaining: 5 hours, 28 minutes, 6 seconds)
2025-05-13 10:27:37,351 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:27:47,542 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -612.34656 ± 556.587
2025-05-13 10:27:47,542 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-842.55383, -302.88443, 34.867905, -623.8312, -68.73629, -850.54047, -629.2794, -829.1043, -51.46602, -1959.9381]
2025-05-13 10:27:47,542 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 109.0, 1000.0, 46.0, 1000.0, 1000.0, 1000.0, 62.0, 1000.0]
2025-05-13 10:27:47,549 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 21/100 (estimated time remaining: 5 hours, 21 minutes, 1 second)
2025-05-13 10:31:28,178 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:31:35,770 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -511.93472 ± 502.847
2025-05-13 10:31:35,771 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-870.7832, -67.081635, -882.41724, -809.5994, -5.928511, -647.9799, -1584.5166, -60.114304, -94.79552, -96.13064]
2025-05-13 10:31:35,771 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 51.0, 1000.0, 1000.0, 25.0, 1000.0, 1000.0, 108.0, 69.0, 92.0]
2025-05-13 10:31:35,777 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 22/100 (estimated time remaining: 5 hours, 19 minutes, 9 seconds)
2025-05-13 10:35:35,853 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:35:45,888 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -942.31659 ± 677.323
2025-05-13 10:35:45,888 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-1329.7845, -474.54276, -1954.5597, -49.75824, -75.642426, -1308.3467, -1328.7368, -1414.0796, -15.273729, -1472.4414]
2025-05-13 10:35:45,888 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 46.0, 49.0, 1000.0, 1000.0, 1000.0, 14.0, 1000.0]
2025-05-13 10:35:45,895 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 23/100 (estimated time remaining: 5 hours, 16 minutes, 34 seconds)
2025-05-13 10:39:41,586 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:39:54,544 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -1014.14062 ± 531.259
2025-05-13 10:39:54,544 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-1019.85443, -924.2773, -1214.9697, -1833.1255, -212.45546, -1001.4124, -237.30882, -1084.2871, -1890.1969, -723.5182]
2025-05-13 10:39:54,544 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 205.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:39:54,551 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 24/100 (estimated time remaining: 5 hours, 11 minutes, 1 second)
2025-05-13 10:43:28,020 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:43:37,918 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -680.21112 ± 460.726
2025-05-13 10:43:37,919 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-984.3499, -47.782352, -0.24108645, -638.4483, -1148.0887, -828.21173, -16.253122, -919.0641, -1280.5049, -939.1669]
2025-05-13 10:43:37,919 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 53.0, 12.0, 1000.0, 1000.0, 1000.0, 21.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:43:37,927 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 25/100 (estimated time remaining: 5 hours, 3 minutes, 47 seconds)
2025-05-13 10:47:48,163 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:47:52,708 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -362.04767 ± 527.168
2025-05-13 10:47:52,709 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-927.5524, -93.8981, -1551.1428, -7.19877, -28.650028, -6.0490637, -1.1316884, -898.8116, -63.997, -42.045506]
2025-05-13 10:47:52,709 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 91.0, 1000.0, 40.0, 22.0, 28.0, 8.0, 1000.0, 98.0, 48.0]
2025-05-13 10:47:52,716 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 26/100 (estimated time remaining: 5 hours, 1 minute, 17 seconds)
2025-05-13 10:51:32,427 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:51:43,069 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -755.76416 ± 522.770
2025-05-13 10:51:43,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-63.61099, -708.96344, -447.97314, -27.133532, -1024.1956, -1221.814, -1030.9976, -1593.0499, -1247.8511, -192.05276]
2025-05-13 10:51:43,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [102.0, 1000.0, 1000.0, 67.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 452.0]
2025-05-13 10:51:43,076 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 27/100 (estimated time remaining: 4 hours, 57 minutes, 48 seconds)
2025-05-13 10:55:50,672 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:56:03,302 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -881.75867 ± 398.237
2025-05-13 10:56:03,302 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-26.84801, -1049.2866, -954.123, -771.94305, -812.7433, -1561.4366, -757.3658, -829.77563, -1402.919, -651.1457]
2025-05-13 10:56:03,302 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [36.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:56:03,312 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 28/100 (estimated time remaining: 4 hours, 56 minutes, 14 seconds)
2025-05-13 10:59:41,493 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:59:50,036 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -824.76630 ± 730.802
2025-05-13 10:59:50,036 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-1770.2491, -1530.8096, -3.605014, -630.52203, -1426.6279, -1791.4045, -15.566336, -40.5558, -1010.60706, -27.715876]
2025-05-13 10:59:50,036 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 16.0, 1000.0, 1000.0, 1000.0, 23.0, 35.0, 1000.0, 58.0]
2025-05-13 10:59:50,043 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 29/100 (estimated time remaining: 4 hours, 46 minutes, 55 seconds)
2025-05-13 11:03:46,577 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:03:55,339 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -728.91138 ± 650.342
2025-05-13 11:03:55,339 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-1487.8009, -1118.0481, -63.59856, -1.73328, -878.7776, -19.783978, -1862.1332, -691.048, -1.1997275, -1164.9901]
2025-05-13 11:03:55,339 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 110.0, 31.0, 1000.0, 32.0, 1000.0, 1000.0, 30.0, 1000.0]
2025-05-13 11:03:55,347 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 48 minutes, 7 seconds)
2025-05-13 11:07:37,838 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:07:45,067 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -616.07532 ± 668.199
2025-05-13 11:07:45,067 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-18.617582, -1029.9509, -1582.2527, -5.605951, -1699.892, -633.4732, -10.365961, 7.2793155, -2.718013, -1185.1562]
2025-05-13 11:07:45,067 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [14.0, 1000.0, 1000.0, 21.0, 1000.0, 1000.0, 12.0, 20.0, 14.0, 1000.0]
2025-05-13 11:07:45,079 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 31/100 (estimated time remaining: 4 hours, 38 minutes, 13 seconds)
2025-05-13 11:11:46,288 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:11:53,596 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -807.71588 ± 812.039
2025-05-13 11:11:53,596 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-1318.632, -1736.4667, -1849.9515, -30.324125, -20.627794, -45.195843, 18.504684, -1274.635, -1820.2401, 0.4098102]
2025-05-13 11:11:53,596 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 31.0, 55.0, 50.0, 28.0, 1000.0, 1000.0, 18.0]
2025-05-13 11:11:53,604 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 32/100 (estimated time remaining: 4 hours, 38 minutes, 25 seconds)
2025-05-13 11:15:32,087 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:15:36,728 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -209.20345 ± 317.155
2025-05-13 11:15:36,729 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-441.3509, -28.824484, -6.564151, -629.49854, -6.6088486, -20.64664, -21.397879, -922.2523, -4.742092, -10.14855]
2025-05-13 11:15:36,729 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 71.0, 30.0, 1000.0, 79.0, 21.0, 45.0, 1000.0, 34.0, 20.0]
2025-05-13 11:15:36,735 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 33/100 (estimated time remaining: 4 hours, 25 minutes, 58 seconds)
2025-05-13 11:19:29,662 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:19:35,602 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -414.34344 ± 513.462
2025-05-13 11:19:35,602 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-936.17395, -12.000693, -35.35582, -25.202147, -11.124843, -674.54865, 0.13999602, -962.9036, -40.2894, -1445.9755]
2025-05-13 11:19:35,602 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 22.0, 60.0, 23.0, 34.0, 1000.0, 65.0, 1000.0, 31.0, 1000.0]
2025-05-13 11:19:35,612 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 34/100 (estimated time remaining: 4 hours, 24 minutes, 46 seconds)
2025-05-13 11:23:35,860 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:23:44,592 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -785.87915 ± 655.240
2025-05-13 11:23:44,592 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-697.36414, -1372.7501, -8.371985, -1475.1311, -10.532405, -1422.7413, -15.190194, -102.06082, -1160.9827, -1593.6675]
2025-05-13 11:23:44,592 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 12.0, 1000.0, 16.0, 1000.0, 19.0, 81.0, 1000.0, 1000.0]
2025-05-13 11:23:44,600 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 35/100 (estimated time remaining: 4 hours, 21 minutes, 38 seconds)
2025-05-13 11:27:46,612 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:27:59,308 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -1353.22693 ± 606.670
2025-05-13 11:27:59,309 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-1036.5668, -1487.5155, -34.75138, -1563.4767, -1390.8501, -1299.4067, -2228.389, -1234.6937, -978.7023, -2277.9172]
2025-05-13 11:27:59,309 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 47.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:27:59,319 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 36/100 (estimated time remaining: 4 hours, 23 minutes, 5 seconds)
2025-05-13 11:31:45,653 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:31:58,483 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -1009.96698 ± 382.717
2025-05-13 11:31:58,483 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-927.9484, -10.362249, -939.6118, -1285.8771, -1160.2599, -794.727, -1330.2573, -1421.7863, -1200.4158, -1028.4244]
2025-05-13 11:31:58,483 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 35.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:31:58,492 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 37/100 (estimated time remaining: 4 hours, 17 minutes, 2 seconds)
2025-05-13 11:35:56,136 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:36:07,355 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -504.72015 ± 348.254
2025-05-13 11:36:07,355 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [49.61887, -617.2851, -663.6771, -755.99384, -14.367073, -650.5252, -652.94415, -712.91833, -26.07626, -1003.03326]
2025-05-13 11:36:07,355 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 13.0, 1000.0, 1000.0, 1000.0, 93.0, 1000.0]
2025-05-13 11:36:07,363 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 38/100 (estimated time remaining: 4 hours, 18 minutes, 25 seconds)
2025-05-13 11:39:45,631 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:39:58,226 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -717.13513 ± 284.434
2025-05-13 11:39:58,226 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-858.07947, -788.95905, -785.2218, -854.73535, -1227.8966, -617.11346, -23.878813, -707.43555, -675.03827, -632.9938]
2025-05-13 11:39:58,226 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 23.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:39:58,233 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 39/100 (estimated time remaining: 4 hours, 12 minutes, 40 seconds)
2025-05-13 11:43:41,013 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:43:53,462 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -505.57462 ± 500.791
2025-05-13 11:43:53,462 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-75.14438, -799.9298, -1623.908, -376.86124, -19.136915, -97.27838, 114.07548, -848.4268, -582.8317, -746.30475]
2025-05-13 11:43:53,462 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 34.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:43:53,478 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 40/100 (estimated time remaining: 4 hours, 5 minutes, 48 seconds)
2025-05-13 11:47:56,148 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:48:09,025 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -529.27679 ± 525.482
2025-05-13 11:48:09,025 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-1335.1816, -34.487164, -1062.0938, 91.68441, -1208.1051, -390.05212, -948.8771, -266.49316, -227.01811, 87.855316]
2025-05-13 11:48:09,025 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 75.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:48:09,036 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 41/100 (estimated time remaining: 4 hours, 1 minute, 56 seconds)
2025-05-13 11:52:07,913 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:52:19,563 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -834.56915 ± 708.999
2025-05-13 11:52:19,564 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-2224.8792, -214.51457, -486.08572, -1135.5907, -339.0452, -162.98483, -1894.5621, -39.647804, -803.3686, -1045.0122]
2025-05-13 11:52:19,564 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 298.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 90.0, 1000.0, 1000.0]
2025-05-13 11:52:19,572 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 42/100 (estimated time remaining: 4 hours, 8 seconds)
2025-05-13 11:55:57,124 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:56:04,821 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -600.66876 ± 569.364
2025-05-13 11:56:04,821 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-1069.0907, -1703.8577, -340.93842, -105.32331, -726.77386, -47.855984, -1305.6798, -662.9203, -4.3109994, -39.936543]
2025-05-13 11:56:04,821 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 245.0, 120.0, 1000.0, 69.0, 1000.0, 1000.0, 66.0, 42.0]
2025-05-13 11:56:04,827 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 51 minutes, 30 seconds)
2025-05-13 11:59:52,075 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:00:06,337 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -1153.87622 ± 141.928
2025-05-13 12:00:06,337 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-1371.7112, -1283.381, -1001.45325, -841.91077, -1205.4967, -1086.6635, -1130.7274, -1195.5603, -1192.3544, -1229.503]
2025-05-13 12:00:06,337 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:00:06,347 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 49 minutes, 32 seconds)
2025-05-13 12:03:56,365 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:04:07,844 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -1005.18231 ± 618.762
2025-05-13 12:04:07,844 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-42.63916, -2231.8743, -838.6255, -1042.376, -996.33307, -28.14013, -1039.9685, -1476.4779, -1433.3702, -922.01886]
2025-05-13 12:04:07,844 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [50.0, 1000.0, 1000.0, 1000.0, 1000.0, 32.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:04:07,854 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 46 minutes, 41 seconds)
2025-05-13 12:08:01,600 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:08:10,525 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -750.14459 ± 715.567
2025-05-13 12:08:10,526 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-1068.4316, -14.836825, -278.64664, -2034.2473, -24.358109, -1269.9668, -78.6057, -1597.1436, -31.341543, -1103.8677]
2025-05-13 12:08:10,526 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 70.0, 1000.0, 1000.0, 56.0, 1000.0, 113.0, 1000.0, 33.0, 1000.0]
2025-05-13 12:08:10,532 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 40 minutes, 16 seconds)
2025-05-13 12:12:15,488 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:12:21,596 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -634.67645 ± 730.425
2025-05-13 12:12:21,596 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-26.087524, -1578.1698, -1289.1305, -10.938037, -91.36627, -9.278517, -41.6379, -80.292854, -1540.2921, -1679.5715]
2025-05-13 12:12:21,596 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [36.0, 1000.0, 1000.0, 18.0, 294.0, 15.0, 142.0, 48.0, 1000.0, 1000.0]
2025-05-13 12:12:21,605 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 36 minutes, 21 seconds)
2025-05-13 12:16:18,097 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:16:28,164 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -912.98419 ± 589.542
2025-05-13 12:16:28,165 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-91.85416, -5.4036818, -35.57599, -1401.0844, -1278.8866, -946.5496, -1464.6105, -1526.6814, -1206.8221, -1172.3734]
2025-05-13 12:16:28,165 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [85.0, 31.0, 40.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:16:28,176 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 36 minutes, 7 seconds)
2025-05-13 12:19:59,698 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:20:09,562 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -697.95721 ± 503.170
2025-05-13 12:20:09,562 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-840.45984, -1183.5845, -1418.4836, -10.626978, -10.243991, -870.63, -903.2011, -21.526844, -522.1435, -1198.6721]
2025-05-13 12:20:09,562 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 26.0, 26.0, 1000.0, 1000.0, 42.0, 1000.0, 1000.0]
2025-05-13 12:20:09,570 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 28 minutes, 33 seconds)
2025-05-13 12:24:17,493 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:24:31,620 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -1478.66467 ± 563.227
2025-05-13 12:24:31,620 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-1182.6401, -1091.0464, -1951.8866, -770.81494, -2089.9302, -1574.4727, -1346.452, -526.885, -1946.7784, -2305.7405]
2025-05-13 12:24:31,620 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:24:31,628 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 28 minutes, 2 seconds)
2025-05-13 12:28:24,264 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:28:37,224 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -621.59668 ± 206.150
2025-05-13 12:28:37,224 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-771.7901, -571.904, -696.19116, -654.64233, -674.06647, -502.72052, -699.65625, -62.49044, -799.9279, -782.5779]
2025-05-13 12:28:37,224 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 127.0, 1000.0, 1000.0]
2025-05-13 12:28:37,236 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 24 minutes, 27 seconds)
2025-05-13 12:32:21,211 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:32:32,491 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -866.73517 ± 467.922
2025-05-13 12:32:32,491 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-80.15553, -1164.6016, -1470.3993, 5.879541, -845.8139, -1012.7285, -1324.6746, -881.4734, -710.313, -1183.0718]
2025-05-13 12:32:32,491 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [133.0, 1000.0, 1000.0, 26.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:32:32,498 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 17 minutes, 46 seconds)
2025-05-13 12:36:24,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:36:37,057 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -1295.11462 ± 451.591
2025-05-13 12:36:37,057 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-1088.5273, -1841.0281, -1508.46, -1500.8701, -935.0834, -295.04327, -1523.6362, -1914.3969, -1258.6798, -1085.4211]
2025-05-13 12:36:37,058 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 275.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:36:37,067 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 53/100 (estimated time remaining: 3 hours, 13 minutes, 25 seconds)
2025-05-13 12:40:30,494 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:40:41,171 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -953.53693 ± 656.718
2025-05-13 12:40:41,171 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-1921.7358, -1331.5248, -144.84726, -1350.9817, -44.517937, -1485.3145, -216.75539, -1492.302, -1228.4802, -318.90894]
2025-05-13 12:40:41,171 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 179.0, 1000.0, 37.0, 1000.0, 1000.0, 1000.0, 1000.0, 273.0]
2025-05-13 12:40:41,180 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 54/100 (estimated time remaining: 3 hours, 12 minutes, 57 seconds)
2025-05-13 12:44:38,576 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:44:51,088 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -1057.21008 ± 472.749
2025-05-13 12:44:51,088 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-1343.2688, -1384.383, -651.8428, -1531.8448, -786.24286, -1252.8856, 2.757631, -1469.3164, -723.603, -1431.472]
2025-05-13 12:44:51,088 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 23.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:44:51,098 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 55/100 (estimated time remaining: 3 hours, 6 minutes, 59 seconds)
2025-05-13 12:48:30,745 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:48:40,860 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -909.01367 ± 639.613
2025-05-13 12:48:40,861 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-513.92285, -1278.9305, -1391.6564, -11.363766, -1443.824, -50.696495, -1260.0797, -1686.361, -40.125984, -1413.1761]
2025-05-13 12:48:40,861 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 15.0, 1000.0, 58.0, 1000.0, 1000.0, 93.0, 1000.0]
2025-05-13 12:48:40,869 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 56/100 (estimated time remaining: 3 hours, 32 seconds)
2025-05-13 12:52:37,713 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:52:46,451 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -744.40839 ± 589.972
2025-05-13 12:52:46,451 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-861.8859, -14.554852, -102.14595, -1455.2028, -121.140236, -1605.9872, -1095.9886, -1200.8616, -955.4705, -30.846071]
2025-05-13 12:52:46,451 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 37.0, 75.0, 1000.0, 125.0, 1000.0, 1000.0, 1000.0, 1000.0, 27.0]
2025-05-13 12:52:46,464 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 58 minutes, 2 seconds)
2025-05-13 12:56:41,770 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:56:53,126 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -1203.20166 ± 586.096
2025-05-13 12:56:53,127 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-72.45724, -1942.0958, -1331.0471, -1243.787, -1567.0663, -1366.217, -1515.2205, -1423.116, -1470.6743, -100.33535]
2025-05-13 12:56:53,127 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [59.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 126.0]
2025-05-13 12:56:53,137 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 54 minutes, 18 seconds)
2025-05-13 13:00:49,007 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:01:01,895 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -1051.42505 ± 657.697
2025-05-13 13:01:01,895 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-1738.4493, -65.394966, -1541.011, -1708.2332, -1506.8949, -555.2074, -1263.6688, -1594.1946, 43.71593, -584.91156]
2025-05-13 13:01:01,895 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 58.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:01:01,905 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 50 minutes, 54 seconds)
2025-05-13 13:04:33,324 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:04:44,622 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -752.23181 ± 638.770
2025-05-13 13:04:44,622 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-1066.2386, -670.4086, -1880.0189, -20.020077, -506.26288, -108.424385, -496.7326, -405.75705, -431.43823, -1937.0177]
2025-05-13 13:04:44,622 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 39.0, 1000.0, 88.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:04:44,631 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 43 minutes, 6 seconds)
2025-05-13 13:08:37,373 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:08:49,020 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -950.02216 ± 553.880
2025-05-13 13:08:49,020 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-1529.7491, -265.65573, -1298.0912, -1291.9338, -1030.4364, -34.680397, -1093.9141, -1259.6697, -125.67816, -1570.4128]
2025-05-13 13:08:49,020 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 34.0, 1000.0, 1000.0, 179.0, 1000.0]
2025-05-13 13:08:49,033 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 41 minutes, 5 seconds)
2025-05-13 13:12:45,037 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:12:58,868 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -901.62872 ± 667.603
2025-05-13 13:12:58,869 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-414.94852, -456.23862, -516.6565, -509.4449, -811.9541, -2055.863, -2316.6301, -436.79156, -509.6896, -988.0701]
2025-05-13 13:12:58,869 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:12:58,877 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 37 minutes, 36 seconds)
2025-05-13 13:16:48,279 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:16:56,831 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -859.60907 ± 702.442
2025-05-13 13:16:56,831 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-60.893013, -1556.6964, -39.12404, -41.92019, -1840.3578, -1280.766, -880.72284, -1402.4216, -36.423557, -1456.7655]
2025-05-13 13:16:56,831 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [38.0, 1000.0, 34.0, 30.0, 1000.0, 1000.0, 1000.0, 1000.0, 33.0, 1000.0]
2025-05-13 13:16:56,839 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 32 minutes, 28 seconds)
2025-05-13 13:20:56,175 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:21:10,409 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -1455.48938 ± 424.433
2025-05-13 13:21:10,409 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-2193.392, -1309.1165, -1778.5818, -1471.9147, -1354.4546, -1461.9758, -1512.3248, -435.14023, -1334.3495, -1703.6433]
2025-05-13 13:21:10,409 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:21:10,417 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 29 minutes, 2 seconds)
2025-05-13 13:24:45,329 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:24:55,399 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -1107.45728 ± 734.492
2025-05-13 13:24:55,399 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-1503.4067, -1503.2823, -51.576515, -58.70348, -2008.6249, -1885.7653, -1067.0164, -46.071, -1633.5646, -1316.5626]
2025-05-13 13:24:55,399 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 26.0, 29.0, 1000.0, 1000.0, 1000.0, 60.0, 1000.0, 1000.0]
2025-05-13 13:24:55,409 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 25 minutes, 17 seconds)
2025-05-13 13:28:52,093 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:29:02,039 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -1003.00598 ± 744.180
2025-05-13 13:29:02,039 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-1344.2152, -777.3423, -1334.8503, -2095.252, -41.194893, -1967.3787, -864.8687, -1546.9347, -38.972065, -19.050835]
2025-05-13 13:29:02,039 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 38.0, 1000.0, 1000.0, 1000.0, 29.0, 15.0]
2025-05-13 13:29:02,048 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 21 minutes, 31 seconds)
2025-05-13 13:32:53,556 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:33:03,731 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -714.96863 ± 581.478
2025-05-13 13:33:03,731 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-421.47342, -41.752613, -1718.9698, -47.39389, -105.72524, -638.23145, -1384.4381, -1092.3097, -411.41086, -1287.9816]
2025-05-13 13:33:03,731 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 47.0, 1000.0, 23.0, 167.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:33:03,740 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 16 minutes, 33 seconds)
2025-05-13 13:37:04,191 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:37:11,623 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -844.43427 ± 827.722
2025-05-13 13:37:11,623 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-45.15601, -1701.2875, -4.5292916, -53.2597, -1668.9648, -1411.9951, -2031.5189, -28.863688, -1474.7867, -23.98045]
2025-05-13 13:37:11,623 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [168.0, 1000.0, 23.0, 32.0, 1000.0, 1000.0, 1000.0, 20.0, 1000.0, 22.0]
2025-05-13 13:37:11,632 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 13 minutes, 37 seconds)
2025-05-13 13:41:10,021 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:41:18,318 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -728.32196 ± 666.038
2025-05-13 13:41:18,319 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-53.560513, -1374.102, -689.9999, -13.622172, -24.705202, -1273.4253, -1914.1257, -14.846287, -1304.8129, -620.0194]
2025-05-13 13:41:18,319 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [35.0, 1000.0, 1000.0, 12.0, 14.0, 1000.0, 1000.0, 12.0, 1000.0, 1000.0]
2025-05-13 13:41:18,330 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 69/100 (estimated time remaining: 2 hours, 8 minutes, 50 seconds)
2025-05-13 13:45:10,641 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:45:23,293 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -1082.25439 ± 392.328
2025-05-13 13:45:23,293 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-1372.634, -1252.1394, -1144.2949, -916.8843, -1292.6229, -1568.8735, -46.672127, -1096.9381, -1208.1993, -923.28534]
2025-05-13 13:45:23,293 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 22.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:45:23,305 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 70/100 (estimated time remaining: 2 hours, 6 minutes, 52 seconds)
2025-05-13 13:48:52,628 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:49:01,412 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -906.72784 ± 735.082
2025-05-13 13:49:01,413 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-31.374996, -1591.0701, -1701.0055, -1516.0109, -26.640486, -1130.4125, -1692.5077, -29.0252, -1325.2456, -23.985422]
2025-05-13 13:49:01,413 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [15.0, 1000.0, 1000.0, 1000.0, 14.0, 1000.0, 1000.0, 24.0, 1000.0, 13.0]
2025-05-13 13:49:01,422 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 59 minutes, 56 seconds)
2025-05-13 13:52:57,090 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:53:07,202 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -1071.63635 ± 775.825
2025-05-13 13:53:07,202 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-1622.161, -34.56094, -1714.979, -1329.3945, 26.47866, -1438.1414, -696.5791, -2130.1199, -1752.0972, -24.808592]
2025-05-13 13:53:07,202 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 20.0, 1000.0, 1000.0, 54.0, 1000.0, 1000.0, 1000.0, 1000.0, 19.0]
2025-05-13 13:53:07,210 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 56 minutes, 20 seconds)
2025-05-13 13:57:09,014 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:57:23,464 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -1267.72681 ± 343.552
2025-05-13 13:57:23,464 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-1282.5112, -1024.5333, -1697.5183, -1731.7288, -906.5885, -879.5496, -1800.5715, -1354.4788, -1049.2817, -950.50555]
2025-05-13 13:57:23,464 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:57:23,473 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 53 minutes, 6 seconds)
2025-05-13 14:01:21,938 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:01:29,151 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -663.52307 ± 645.528
2025-05-13 14:01:29,151 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-32.03866, -1056.4602, -1537.8087, -10.424766, -27.908295, -24.75068, -50.833233, -1365.2029, -1364.6215, -1165.1815]
2025-05-13 14:01:29,151 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [15.0, 1000.0, 1000.0, 11.0, 18.0, 13.0, 28.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:01:29,160 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 48 minutes, 58 seconds)
2025-05-13 14:05:07,082 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:05:17,334 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -948.80505 ± 601.248
2025-05-13 14:05:17,334 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-20.804497, -1158.209, -1378.7095, -25.597103, -1309.0912, -1483.3516, -1407.5359, -71.63652, -1375.4357, -1257.6798]
2025-05-13 14:05:17,334 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [13.0, 1000.0, 1000.0, 13.0, 1000.0, 1000.0, 1000.0, 113.0, 1000.0, 1000.0]
2025-05-13 14:05:17,343 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 43 minutes, 28 seconds)
2025-05-13 14:09:17,064 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:09:20,185 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -360.80533 ± 647.050
2025-05-13 14:09:20,185 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-48.770336, -25.084732, -42.788593, -22.738033, -1673.7135, -28.812183, -18.318884, -68.84521, -43.72844, -1635.2533]
2025-05-13 14:09:20,185 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [31.0, 21.0, 24.0, 14.0, 1000.0, 17.0, 13.0, 41.0, 38.0, 1000.0]
2025-05-13 14:09:20,194 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 41 minutes, 33 seconds)
2025-05-13 14:13:02,934 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:13:07,388 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -470.91122 ± 656.491
2025-05-13 14:13:07,388 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-1200.156, -1453.8062, -49.03199, -37.0772, -109.67638, -34.36223, -32.889954, -45.471813, -28.355482, -1718.2849]
2025-05-13 14:13:07,388 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 33.0, 15.0, 64.0, 16.0, 15.0, 22.0, 14.0, 1000.0]
2025-05-13 14:13:07,399 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 36 minutes)
2025-05-13 14:17:05,784 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:17:14,668 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -833.88263 ± 710.292
2025-05-13 14:17:14,668 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-107.61519, -1615.4261, -18.179691, -949.27094, -29.80285, -1668.058, -1494.8412, -30.599283, -704.11755, -1720.916]
2025-05-13 14:17:14,668 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [72.0, 1000.0, 22.0, 1000.0, 91.0, 1000.0, 1000.0, 15.0, 1000.0, 1000.0]
2025-05-13 14:17:14,678 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 31 minutes, 19 seconds)
2025-05-13 14:20:50,077 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:20:58,849 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -963.14032 ± 737.890
2025-05-13 14:20:58,850 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-1503.1387, -1313.4086, -33.882565, -313.26257, -41.284504, -22.796423, -1542.7103, -1141.3611, -1727.453, -1992.1053]
2025-05-13 14:20:58,850 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 18.0, 170.0, 18.0, 12.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:20:58,859 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 25 minutes, 46 seconds)
2025-05-13 14:24:55,950 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:25:04,580 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -776.57782 ± 693.930
2025-05-13 14:25:04,581 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-1804.0961, -54.35912, -1601.3068, -1217.9352, -137.56628, -755.29095, -565.94104, 19.614893, -1621.7441, -27.153439]
2025-05-13 14:25:04,581 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 21.0, 1000.0, 1000.0, 81.0, 1000.0, 1000.0, 37.0, 1000.0, 13.0]
2025-05-13 14:25:04,592 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 23 minutes, 6 seconds)
2025-05-13 14:28:52,136 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:29:02,429 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -996.32843 ± 633.921
2025-05-13 14:29:02,429 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-130.53282, -87.68444, -1615.6592, -1022.8199, -1274.4092, -973.55817, -173.5864, -1902.1495, -1098.562, -1684.3234]
2025-05-13 14:29:02,429 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [57.0, 44.0, 1000.0, 1000.0, 1000.0, 1000.0, 117.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:29:02,440 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 18 minutes, 48 seconds)
2025-05-13 14:32:57,037 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:33:07,478 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -901.81329 ± 578.343
2025-05-13 14:33:07,478 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-1558.4319, -34.67954, -1853.1702, -1123.9109, -1286.4119, -293.11554, -786.0197, -1100.2103, -877.58167, -104.60113]
2025-05-13 14:33:07,478 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 31.0, 1000.0, 1000.0, 1000.0, 181.0, 1000.0, 1000.0, 1000.0, 50.0]
2025-05-13 14:33:07,488 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 16 minutes)
2025-05-13 14:37:01,713 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:37:13,196 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -1135.68799 ± 623.057
2025-05-13 14:37:13,196 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-21.443018, -1636.9344, -1687.7208, -1569.268, -1611.2816, -1018.2067, -757.30383, -1528.2484, -1498.6838, -27.789772]
2025-05-13 14:37:13,196 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [14.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 28.0]
2025-05-13 14:37:13,205 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 11 minutes, 54 seconds)
2025-05-13 14:41:07,248 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:41:18,480 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -1089.74536 ± 602.424
2025-05-13 14:41:18,480 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-953.86847, -1347.4585, -1046.0349, -1297.6952, -2185.2434, -48.971313, -1246.4413, -64.4715, -1333.9948, -1373.2733]
2025-05-13 14:41:18,480 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 28.0, 1000.0, 41.0, 1000.0, 1000.0]
2025-05-13 14:41:18,489 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 9 minutes, 6 seconds)
2025-05-13 14:45:10,537 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:45:23,162 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -1216.27014 ± 475.911
2025-05-13 14:45:23,162 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-636.71094, -1638.1887, -69.59579, -1397.5171, -1602.7803, -1570.0818, -1351.3022, -1209.3446, -1146.077, -1541.1027]
2025-05-13 14:45:23,162 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 35.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:45:23,171 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 4 minutes, 59 seconds)
2025-05-13 14:49:18,463 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:49:29,782 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -1294.15979 ± 735.037
2025-05-13 14:49:29,782 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-33.70203, -49.67803, -935.83624, -997.82184, -1914.667, -2161.032, -1668.5032, -1488.398, -1595.2595, -2096.6997]
2025-05-13 14:49:29,782 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [14.0, 48.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:49:29,793 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 86/100 (estimated time remaining: 1 hour, 1 minute, 22 seconds)
2025-05-13 14:53:14,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:53:27,956 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -1064.87769 ± 372.378
2025-05-13 14:53:27,956 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-1184.8561, -825.43677, -1847.517, -1308.5977, -776.9847, -693.3469, -933.76575, -1413.193, -536.2219, -1128.8563]
2025-05-13 14:53:27,956 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:53:27,968 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 87/100 (estimated time remaining: 56 minutes, 57 seconds)
2025-05-13 14:57:20,205 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:57:27,864 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -722.04688 ± 726.812
2025-05-13 14:57:27,864 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-24.444725, -758.7635, -30.438065, 12.452301, -17.750793, -1737.3196, -222.99245, -1155.1265, -1884.7549, -1401.3306]
2025-05-13 14:57:27,864 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [14.0, 1000.0, 31.0, 58.0, 14.0, 1000.0, 267.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:57:27,879 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 88/100 (estimated time remaining: 52 minutes, 38 seconds)
2025-05-13 15:01:22,750 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:01:31,497 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -805.66071 ± 654.046
2025-05-13 15:01:31,497 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-39.854996, -1723.649, -1225.5919, -73.52843, -1433.9532, -1251.6672, -1294.9213, -46.902653, -992.9797, 26.442125]
2025-05-13 15:01:31,497 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [20.0, 1000.0, 1000.0, 44.0, 1000.0, 1000.0, 1000.0, 27.0, 1000.0, 95.0]
2025-05-13 15:01:31,508 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 89/100 (estimated time remaining: 48 minutes, 31 seconds)
2025-05-13 15:05:05,745 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:05:17,337 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -1089.33960 ± 613.723
2025-05-13 15:05:17,337 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-721.7521, -1458.3112, -1666.2042, -28.027412, -1005.3864, -1422.038, -1870.3251, -1551.8165, -43.753643, -1125.7817]
2025-05-13 15:05:17,337 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 44.0, 1000.0, 1000.0, 1000.0, 1000.0, 41.0, 1000.0]
2025-05-13 15:05:17,350 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 90/100 (estimated time remaining: 43 minutes, 47 seconds)
2025-05-13 15:09:13,455 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:09:22,183 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -594.68860 ± 467.149
2025-05-13 15:09:22,183 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-12.722003, -746.78204, -31.096273, -1176.5784, -1012.0824, -723.3953, -36.87605, -940.43005, -1152.9586, -113.96461]
2025-05-13 15:09:22,183 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [24.0, 1000.0, 16.0, 1000.0, 1000.0, 1000.0, 20.0, 1000.0, 1000.0, 82.0]
2025-05-13 15:09:22,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 91/100 (estimated time remaining: 39 minutes, 44 seconds)
2025-05-13 15:13:14,726 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:13:23,219 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -406.34619 ± 488.362
2025-05-13 15:13:23,219 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-56.480003, -60.909943, -30.02555, -308.88608, -927.7321, -318.33337, -337.15732, -49.21417, -314.6636, -1660.0599]
2025-05-13 15:13:23,219 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [32.0, 34.0, 14.0, 1000.0, 1000.0, 1000.0, 1000.0, 40.0, 1000.0, 1000.0]
2025-05-13 15:13:23,231 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 92/100 (estimated time remaining: 35 minutes, 51 seconds)
2025-05-13 15:17:19,776 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:17:28,378 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -749.86072 ± 687.190
2025-05-13 15:17:28,378 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-1586.0012, -57.899853, -18.992002, -24.353682, -37.00982, -1967.9747, -549.93274, -1358.8236, -917.9097, -979.70984]
2025-05-13 15:17:28,378 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 36.0, 13.0, 21.0, 16.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:17:28,391 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 93/100 (estimated time remaining: 32 minutes)
2025-05-13 15:21:21,207 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:21:31,381 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -808.84674 ± 578.849
2025-05-13 15:21:31,381 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-1418.8116, -1205.0016, -1407.4153, -58.277275, -601.9481, -36.969124, 3.2037258, -1087.2747, -745.68134, -1530.292]
2025-05-13 15:21:31,381 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 55.0, 1000.0, 22.0, 37.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:21:31,393 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 94/100 (estimated time remaining: 27 minutes, 59 seconds)
2025-05-13 15:25:10,041 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:25:21,344 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -466.71466 ± 273.426
2025-05-13 15:25:21,345 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-434.96472, -1089.4679, -145.69106, -432.817, -479.30618, -629.91254, -565.7621, -338.18073, -528.8512, -22.192657]
2025-05-13 15:25:21,345 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 116.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 16.0]
2025-05-13 15:25:21,359 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 95/100 (estimated time remaining: 24 minutes, 4 seconds)
2025-05-13 15:29:22,221 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:29:36,559 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -1171.56189 ± 338.691
2025-05-13 15:29:36,559 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-834.80414, -1282.2513, -1285.4702, -1123.4592, -1591.6653, -933.2884, -756.8055, -863.4095, -1161.6864, -1882.7798]
2025-05-13 15:29:36,559 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:29:36,571 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 96/100 (estimated time remaining: 20 minutes, 14 seconds)
2025-05-13 15:33:30,469 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:33:36,502 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -349.24969 ± 434.159
2025-05-13 15:33:36,502 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-19.727613, -671.6939, -22.337435, -135.5062, -979.5596, -22.183277, -32.3372, -1254.7032, -311.5733, -42.87535]
2025-05-13 15:33:36,502 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [14.0, 1000.0, 14.0, 83.0, 1000.0, 16.0, 17.0, 1000.0, 1000.0, 24.0]
2025-05-13 15:33:36,514 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 97/100 (estimated time remaining: 16 minutes, 10 seconds)
2025-05-13 15:37:15,207 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:37:25,377 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -790.37292 ± 498.522
2025-05-13 15:37:25,378 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-70.93171, -1052.8927, -1086.9072, -993.2156, -24.546988, -1396.0074, -1007.9373, -1105.6132, -1122.9855, -42.69212]
2025-05-13 15:37:25,378 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [58.0, 1000.0, 1000.0, 1000.0, 13.0, 1000.0, 1000.0, 1000.0, 1000.0, 53.0]
2025-05-13 15:37:25,392 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 98/100 (estimated time remaining: 11 minutes, 58 seconds)
2025-05-13 15:41:20,313 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:41:30,379 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -871.39417 ± 644.096
2025-05-13 15:41:30,379 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-628.03735, -1819.4924, -32.056713, -10.137744, -990.9573, -908.9041, -1398.879, -56.295654, -1709.8442, -1159.3367]
2025-05-13 15:41:30,379 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 21.0, 26.0, 1000.0, 1000.0, 1000.0, 38.0, 1000.0, 1000.0]
2025-05-13 15:41:30,389 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 99/100 (estimated time remaining: 7 minutes, 59 seconds)
2025-05-13 15:45:25,333 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:45:38,198 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -1042.80347 ± 492.198
2025-05-13 15:45:38,198 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-1021.6607, -46.284065, -941.4121, -1769.869, -1333.8491, -1242.8705, -1381.6497, -320.54428, -974.5082, -1395.3867]
2025-05-13 15:45:38,198 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 22.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:45:38,209 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 100/100 (estimated time remaining: 4 minutes, 3 seconds)
2025-05-13 15:49:42,184 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:49:53,555 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -1034.42847 ± 582.663
2025-05-13 15:49:53,555 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-41.567734, -1127.2524, -1562.0554, -935.6039, -1821.3871, -951.3574, -1674.4115, -21.77399, -960.6216, -1248.2544]
2025-05-13 15:49:53,556 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [46.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 14.0, 1000.0, 1000.0]
2025-05-13 15:49:53,567 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1251 [DEBUG]: Training session finished
