2025-09-12 15:25:47,927 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc15-ant/ExtremeSparseL4U32-mbpac-highdim-memdelay
2025-09-12 15:25:47,927 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc15-ant/ExtremeSparseL4U32-mbpac-highdim-memdelay
2025-09-12 15:25:47,927 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x152c8759c410>}
2025-09-12 15:25:47,927 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1111 [DEBUG]: using device: cuda
2025-09-12 15:25:47,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1133 [INFO]: Creating new trainer
2025-09-12 15:25:47,959 baseline-mbpac-noiseperc15-ant:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-09-12 15:25:47,959 baseline-mbpac-noiseperc15-ant:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-12 15:25:47,969 baseline-mbpac-noiseperc15-ant:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=27, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=8, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 512, batch_first=True)
)
2025-09-12 15:25:48,865 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1194 [DEBUG]: Starting training session...
2025-09-12 15:25:48,865 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 1/100
2025-09-12 15:37:47,845 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:37:47,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 15:38:51,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: -81.94693 ± 118.551
2025-09-12 15:38:51,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [-1.442845, -26.351463, -45.41114, -57.562622, -36.219784, 11.123043, -48.988747, -123.8977, -69.24016, -421.4779]
2025-09-12 15:38:51,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [10.0, 126.0, 212.0, 80.0, 132.0, 66.0, 85.0, 368.0, 104.0, 1000.0]
2025-09-12 15:38:51,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1226 [INFO]: New best (-81.95) for latency ExtremeSparseL4U32
2025-09-12 15:38:51,956 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 2/100 (estimated time remaining: 21 hours, 32 minutes, 6 seconds)
2025-09-12 15:49:33,625 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:49:33,637 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 15:50:14,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: -23.08355 ± 36.313
2025-09-12 15:50:14,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [15.246622, -18.757801, -15.009513, -10.251057, 3.1169705, -14.283396, -28.651081, -74.62491, 16.012823, -103.63422]
2025-09-12 15:50:14,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [39.0, 120.0, 129.0, 90.0, 27.0, 239.0, 51.0, 286.0, 53.0, 320.0]
2025-09-12 15:50:14,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1226 [INFO]: New best (-23.08) for latency ExtremeSparseL4U32
2025-09-12 15:50:14,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 3/100 (estimated time remaining: 19 hours, 56 minutes, 59 seconds)
2025-09-12 16:01:10,229 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:01:10,230 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 16:01:38,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: -18.66969 ± 29.966
2025-09-12 16:01:38,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [-75.696815, -55.923496, 9.012071, -13.750166, 7.637864, -8.239201, -55.366707, 10.918382, -5.1418877, -0.14694451]
2025-09-12 16:01:38,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [184.0, 204.0, 16.0, 126.0, 35.0, 83.0, 110.0, 92.0, 69.0, 31.0]
2025-09-12 16:01:38,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1226 [INFO]: New best (-18.67) for latency ExtremeSparseL4U32
2025-09-12 16:01:39,008 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 4/100 (estimated time remaining: 19 hours, 18 minutes, 41 seconds)
2025-09-12 16:12:40,927 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:12:40,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 16:14:19,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: -31.40433 ± 54.820
2025-09-12 16:14:19,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [13.366658, -6.9548326, -74.162704, -147.19363, 10.89996, -7.0595436, -3.396905, 3.5802572, 7.649322, -110.771904]
2025-09-12 16:14:19,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [120.0, 17.0, 1000.0, 1000.0, 53.0, 30.0, 16.0, 70.0, 23.0, 1000.0]
2025-09-12 16:14:19,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 5/100 (estimated time remaining: 19 hours, 24 minutes, 7 seconds)
2025-09-12 16:26:02,556 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:26:02,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 16:27:28,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: -34.07138 ± 57.053
2025-09-12 16:27:28,008 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [-56.191544, 13.804307, -65.00187, -65.14608, -0.27269596, -31.953672, -175.30264, 6.8583775, 18.985306, 13.506696]
2025-09-12 16:27:28,008 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [103.0, 185.0, 1000.0, 374.0, 22.0, 49.0, 1000.0, 114.0, 23.0, 27.0]
2025-09-12 16:27:28,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 6/100 (estimated time remaining: 19 hours, 31 minutes, 23 seconds)
2025-09-12 16:37:57,995 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:37:58,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 16:38:18,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 4.66639 ± 11.598
2025-09-12 16:38:18,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [22.472761, 0.7660662, 3.4484832, 11.168934, 15.846688, 5.700259, -14.122831, 15.124112, -14.430793, 0.6902105]
2025-09-12 16:38:18,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [77.0, 122.0, 38.0, 30.0, 62.0, 58.0, 115.0, 32.0, 142.0, 15.0]
2025-09-12 16:38:18,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1226 [INFO]: New best (4.67) for latency ExtremeSparseL4U32
2025-09-12 16:38:18,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 7/100 (estimated time remaining: 18 hours, 37 minutes, 31 seconds)
2025-09-12 16:48:56,625 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:48:56,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 16:50:03,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: -10.78618 ± 32.560
2025-09-12 16:50:03,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [8.427468, 42.10048, -74.92545, 30.031776, -43.6361, -18.905209, -20.916664, -26.212296, -4.4273415, 0.6015244]
2025-09-12 16:50:03,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [70.0, 79.0, 1000.0, 63.0, 595.0, 114.0, 186.0, 51.0, 30.0, 17.0]
2025-09-12 16:50:03,110 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 8/100 (estimated time remaining: 18 hours, 32 minutes, 26 seconds)
2025-09-12 17:01:03,281 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:01:03,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 17:01:51,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 6.53639 ± 23.792
2025-09-12 17:01:51,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [3.3384898, 27.886301, -2.4864275, 26.056658, 37.438313, -7.7675242, -12.193366, 17.727358, 22.219492, -46.85536]
2025-09-12 17:01:51,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [30.0, 32.0, 28.0, 30.0, 71.0, 63.0, 1000.0, 96.0, 43.0, 207.0]
2025-09-12 17:01:51,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1226 [INFO]: New best (6.54) for latency ExtremeSparseL4U32
2025-09-12 17:01:51,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 9/100 (estimated time remaining: 18 hours, 27 minutes, 49 seconds)
2025-09-12 17:13:14,447 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:13:14,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 17:15:13,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 33.65939 ± 53.372
2025-09-12 17:15:13,956 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [26.123718, 31.15946, 16.589045, 2.825658, -75.691666, 63.107204, 30.406977, 64.902336, 28.025112, 149.14604]
2025-09-12 17:15:13,956 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [28.0, 1000.0, 456.0, 23.0, 216.0, 120.0, 1000.0, 127.0, 49.0, 1000.0]
2025-09-12 17:15:13,956 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1226 [INFO]: New best (33.66) for latency ExtremeSparseL4U32
2025-09-12 17:15:13,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 10/100 (estimated time remaining: 18 hours, 28 minutes, 36 seconds)
2025-09-12 17:25:59,453 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:25:59,455 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 17:26:54,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 21.81672 ± 42.185
2025-09-12 17:26:54,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [-5.404425, 28.429789, -16.727493, 37.274105, 1.0689988, 138.78981, 0.90328085, 11.757775, -3.1459925, 25.22141]
2025-09-12 17:26:54,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [69.0, 73.0, 159.0, 152.0, 16.0, 1000.0, 18.0, 21.0, 234.0, 101.0]
2025-09-12 17:26:54,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 11/100 (estimated time remaining: 17 hours, 50 minutes, 2 seconds)
2025-09-12 17:38:20,242 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:38:20,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 17:39:40,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 84.95988 ± 94.851
2025-09-12 17:39:40,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [19.438782, 61.057697, 220.77629, 0.56614476, 30.360422, 308.97733, 64.110344, 13.517259, 59.69657, 71.09789]
2025-09-12 17:39:40,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [59.0, 139.0, 1000.0, 14.0, 51.0, 1000.0, 169.0, 25.0, 67.0, 143.0]
2025-09-12 17:39:40,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1226 [INFO]: New best (84.96) for latency ExtremeSparseL4U32
2025-09-12 17:39:40,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 12/100 (estimated time remaining: 18 hours, 12 minutes, 16 seconds)
2025-09-12 17:50:05,412 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:50:05,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 17:50:21,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 22.87447 ± 16.219
2025-09-12 17:50:21,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [56.41301, 43.127693, 5.816927, 0.54737955, 25.31943, 21.072895, 31.871141, 15.457665, 11.1961775, 17.922422]
2025-09-12 17:50:21,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [46.0, 101.0, 19.0, 18.0, 95.0, 63.0, 79.0, 58.0, 22.0, 45.0]
2025-09-12 17:50:21,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 13/100 (estimated time remaining: 17 hours, 41 minutes, 30 seconds)
2025-09-12 18:01:21,630 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:01:21,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 18:01:55,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 64.31783 ± 41.587
2025-09-12 18:01:55,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [63.367283, 32.71836, 99.74511, 114.80912, 33.082134, 5.276965, 36.02729, 113.91276, 120.3247, 23.91464]
2025-09-12 18:01:55,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [65.0, 62.0, 151.0, 153.0, 101.0, 73.0, 62.0, 131.0, 301.0, 60.0]
2025-09-12 18:01:55,660 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 14/100 (estimated time remaining: 17 hours, 25 minutes, 12 seconds)
2025-09-12 18:13:15,752 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:13:15,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 18:14:29,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 70.13329 ± 97.172
2025-09-12 18:14:29,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [35.904537, 4.9432826, -8.995174, 317.18884, 186.05098, 3.0710905, 43.463406, 51.175144, 26.272139, 42.258617]
2025-09-12 18:14:29,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [23.0, 17.0, 40.0, 1000.0, 1000.0, 31.0, 115.0, 48.0, 35.0, 148.0]
2025-09-12 18:14:29,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 15/100 (estimated time remaining: 16 hours, 59 minutes, 23 seconds)
2025-09-12 18:25:04,450 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:25:04,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 18:26:20,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 96.35458 ± 92.423
2025-09-12 18:26:20,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [1.8527796, 101.69444, 319.0857, 4.1464677, 44.96882, 72.92887, 169.22041, 28.056526, 61.961983, 159.62979]
2025-09-12 18:26:20,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [26.0, 239.0, 1000.0, 59.0, 123.0, 138.0, 298.0, 117.0, 104.0, 428.0]
2025-09-12 18:26:20,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1226 [INFO]: New best (96.35) for latency ExtremeSparseL4U32
2025-09-12 18:26:20,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 16/100 (estimated time remaining: 16 hours, 50 minutes, 24 seconds)
2025-09-12 18:37:30,090 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:37:30,098 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 18:38:18,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 56.47702 ± 75.602
2025-09-12 18:38:18,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [107.02525, 23.075102, 24.426344, 268.70016, 21.055973, 19.064276, 15.811725, 39.24185, 40.485947, 5.883548]
2025-09-12 18:38:18,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [134.0, 27.0, 37.0, 1000.0, 52.0, 59.0, 40.0, 88.0, 142.0, 22.0]
2025-09-12 18:38:18,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 17/100 (estimated time remaining: 16 hours, 25 minutes, 11 seconds)
2025-09-12 18:49:24,255 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:49:24,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 18:49:57,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 60.12824 ± 67.108
2025-09-12 18:49:57,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [26.574615, 57.224136, 29.260452, 100.713486, 11.820836, 248.76367, 41.506294, 24.482258, 32.269318, 28.66729]
2025-09-12 18:49:57,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [25.0, 152.0, 38.0, 228.0, 52.0, 286.0, 53.0, 49.0, 75.0, 178.0]
2025-09-12 18:49:57,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 18/100 (estimated time remaining: 16 hours, 29 minutes, 21 seconds)
2025-09-12 19:01:36,274 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:01:36,283 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 19:02:15,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 53.03935 ± 42.788
2025-09-12 19:02:15,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [6.977336, 20.254023, 33.444725, 59.7696, 67.90043, 159.17932, 1.1953562, 60.661797, 47.496765, 73.51414]
2025-09-12 19:02:15,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [19.0, 26.0, 49.0, 132.0, 131.0, 327.0, 212.0, 200.0, 111.0, 105.0]
2025-09-12 19:02:15,323 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 19/100 (estimated time remaining: 16 hours, 29 minutes, 22 seconds)
2025-09-12 19:12:47,537 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:12:47,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 19:13:31,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 80.84969 ± 61.592
2025-09-12 19:13:31,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [29.712873, 40.191845, 110.67793, 87.39053, 97.715164, 222.22481, 45.757565, 12.405215, 138.42577, 23.995117]
2025-09-12 19:13:31,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [75.0, 85.0, 162.0, 145.0, 124.0, 348.0, 105.0, 20.0, 303.0, 68.0]
2025-09-12 19:13:31,383 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 20/100 (estimated time remaining: 15 hours, 56 minutes, 10 seconds)
2025-09-12 19:24:19,825 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:24:19,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 19:26:06,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 151.74432 ± 105.401
2025-09-12 19:26:06,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [91.38086, 168.3112, 348.9233, 79.487335, 40.18953, 311.29233, 143.04446, 53.80674, 53.169865, 227.83751]
2025-09-12 19:26:06,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [99.0, 249.0, 1000.0, 112.0, 196.0, 1000.0, 336.0, 114.0, 80.0, 391.0]
2025-09-12 19:26:06,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1226 [INFO]: New best (151.74) for latency ExtremeSparseL4U32
2025-09-12 19:26:06,591 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 21/100 (estimated time remaining: 15 hours, 56 minutes, 10 seconds)
2025-09-12 19:37:56,500 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:37:56,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 19:39:49,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 180.39854 ± 163.335
2025-09-12 19:39:49,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [10.291916, 444.0633, 17.13108, 176.89197, 99.86964, 380.5039, 409.81195, 115.8647, -14.735253, 164.29224]
2025-09-12 19:39:49,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [39.0, 701.0, 26.0, 312.0, 186.0, 1000.0, 1000.0, 177.0, 104.0, 160.0]
2025-09-12 19:39:49,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1226 [INFO]: New best (180.40) for latency ExtremeSparseL4U32
2025-09-12 19:39:49,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 22/100 (estimated time remaining: 16 hours, 11 minutes, 53 seconds)
2025-09-12 19:50:01,591 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:50:01,593 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 19:51:43,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 177.47624 ± 132.647
2025-09-12 19:51:43,690 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [8.07532, 133.04951, 402.5167, 146.1913, 242.29372, 251.62943, 387.58575, 112.83231, 26.708715, 63.879646]
2025-09-12 19:51:43,690 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [27.0, 187.0, 1000.0, 125.0, 388.0, 513.0, 843.0, 136.0, 31.0, 106.0]
2025-09-12 19:51:43,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 23/100 (estimated time remaining: 16 hours, 3 minutes, 31 seconds)
2025-09-12 20:03:27,810 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:03:27,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 20:04:01,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 75.03935 ± 52.865
2025-09-12 20:04:01,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [13.13476, 38.342064, 26.259298, 90.13871, 28.424528, 132.05351, 84.09399, 63.32569, 195.45995, 79.16107]
2025-09-12 20:04:01,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [45.0, 24.0, 38.0, 137.0, 73.0, 195.0, 133.0, 136.0, 243.0, 98.0]
2025-09-12 20:04:01,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 24/100 (estimated time remaining: 15 hours, 51 minutes, 17 seconds)
2025-09-12 20:14:17,015 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:14:17,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 20:15:54,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 151.83379 ± 132.924
2025-09-12 20:15:54,769 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [85.29196, 357.23746, 46.653587, 313.11234, 58.049183, 42.948006, 375.7008, 29.397354, 142.69075, 67.2565]
2025-09-12 20:15:54,769 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [83.0, 590.0, 65.0, 1000.0, 128.0, 54.0, 1000.0, 45.0, 177.0, 58.0]
2025-09-12 20:15:54,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 25/100 (estimated time remaining: 15 hours, 48 minutes, 19 seconds)
2025-09-12 20:27:16,884 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:27:16,894 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 20:29:02,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 190.40793 ± 129.855
2025-09-12 20:29:02,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [444.7622, 45.172783, 25.39797, 171.64484, 304.19647, 99.16801, 97.24346, 344.5194, 213.47256, 158.50168]
2025-09-12 20:29:02,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [482.0, 48.0, 47.0, 182.0, 1000.0, 114.0, 150.0, 1000.0, 321.0, 189.0]
2025-09-12 20:29:02,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1226 [INFO]: New best (190.41) for latency ExtremeSparseL4U32
2025-09-12 20:29:02,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 26/100 (estimated time remaining: 15 hours, 44 minutes, 3 seconds)
2025-09-12 20:39:42,472 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:39:42,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 20:40:39,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 115.54417 ± 103.729
2025-09-12 20:40:39,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [72.50688, 12.609193, 1.0061848, 75.24665, 95.59157, 129.8579, 23.742914, 199.70067, 359.0975, 186.08217]
2025-09-12 20:40:39,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [85.0, 38.0, 19.0, 85.0, 124.0, 121.0, 64.0, 266.0, 1000.0, 133.0]
2025-09-12 20:40:39,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 27/100 (estimated time remaining: 15 hours, 20 seconds)
2025-09-12 20:52:05,939 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:52:05,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 20:54:43,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 244.84468 ± 190.262
2025-09-12 20:54:43,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [342.23032, 454.7167, 82.9594, 10.01933, 9.71317, 70.2472, 434.9612, 160.15758, 542.9869, 340.455]
2025-09-12 20:54:43,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 124.0, 16.0, 43.0, 81.0, 1000.0, 320.0, 697.0, 1000.0]
2025-09-12 20:54:43,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1226 [INFO]: New best (244.84) for latency ExtremeSparseL4U32
2025-09-12 20:54:43,605 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 28/100 (estimated time remaining: 15 hours, 19 minutes, 46 seconds)
2025-09-12 21:05:24,787 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:05:24,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 21:06:51,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 138.76453 ± 131.403
2025-09-12 21:06:51,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [130.01009, 35.789673, 36.35009, 52.695835, 69.05362, 395.6076, 97.68778, 261.14032, 322.12805, -12.817728]
2025-09-12 21:06:51,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [156.0, 86.0, 59.0, 115.0, 99.0, 1000.0, 147.0, 1000.0, 313.0, 20.0]
2025-09-12 21:06:51,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 29/100 (estimated time remaining: 15 hours, 4 minutes, 44 seconds)
2025-09-12 21:18:03,532 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:18:03,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 21:19:57,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 209.37935 ± 108.162
2025-09-12 21:19:57,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [212.431, 58.128254, 388.14743, 200.2202, 94.380775, 109.28401, 348.7623, 216.65773, 137.41371, 328.36816]
2025-09-12 21:19:57,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [304.0, 104.0, 423.0, 270.0, 86.0, 188.0, 1000.0, 326.0, 177.0, 1000.0]
2025-09-12 21:19:57,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 30/100 (estimated time remaining: 15 hours, 9 minutes, 33 seconds)
2025-09-12 21:29:54,285 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:29:54,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 21:31:24,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 198.61423 ± 244.379
2025-09-12 21:31:24,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [-1.2910208, 45.683403, 31.25778, 39.406723, 634.8457, 137.17076, 39.339645, 679.0598, 313.00687, 67.662636]
2025-09-12 21:31:24,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [16.0, 59.0, 51.0, 69.0, 657.0, 121.0, 75.0, 1000.0, 1000.0, 73.0]
2025-09-12 21:31:24,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 31/100 (estimated time remaining: 14 hours, 33 minutes, 1 second)
2025-09-12 21:42:29,139 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:42:29,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 21:43:55,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 164.23141 ± 135.393
2025-09-12 21:43:55,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [178.64037, 331.2193, 67.87677, 481.53836, 107.67517, 208.83878, 102.58771, 43.968555, 65.67889, 54.290108]
2025-09-12 21:43:55,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [191.0, 1000.0, 124.0, 1000.0, 77.0, 283.0, 114.0, 54.0, 66.0, 57.0]
2025-09-12 21:43:55,913 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 32/100 (estimated time remaining: 14 hours, 33 minutes, 8 seconds)
2025-09-12 21:54:30,137 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:54:30,145 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 21:56:34,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 232.19263 ± 149.044
2025-09-12 21:56:34,663 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [80.3024, 333.74445, 47.35524, 315.488, 89.054886, 402.91296, 18.069376, 439.86542, 313.30276, 281.83078]
2025-09-12 21:56:34,663 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [66.0, 367.0, 52.0, 1000.0, 92.0, 1000.0, 38.0, 1000.0, 381.0, 260.0]
2025-09-12 21:56:34,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 33/100 (estimated time remaining: 14 hours, 1 minute, 10 seconds)
2025-09-12 22:07:11,661 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:07:11,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:08:47,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 224.69527 ± 153.573
2025-09-12 22:08:47,490 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [288.8984, 246.11034, 228.82703, 616.556, 169.8569, 87.96393, 148.72112, 292.30432, 29.18503, 138.5292]
2025-09-12 22:08:47,490 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 254.0, 233.0, 691.0, 164.0, 116.0, 222.0, 331.0, 69.0, 165.0]
2025-09-12 22:08:47,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 34/100 (estimated time remaining: 13 hours, 49 minutes, 55 seconds)
2025-09-12 22:20:16,485 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:20:16,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:23:57,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 278.20786 ± 133.901
2025-09-12 22:23:57,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [100.80198, 372.7632, 28.285719, 277.03125, 213.91008, 491.95297, 267.53278, 438.85126, 289.13217, 301.8175]
2025-09-12 22:23:57,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [84.0, 1000.0, 66.0, 1000.0, 302.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:23:57,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1226 [INFO]: New best (278.21) for latency ExtremeSparseL4U32
2025-09-12 22:23:57,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 35/100 (estimated time remaining: 14 hours, 4 minutes, 37 seconds)
2025-09-12 22:34:16,145 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:34:16,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:36:32,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 219.02963 ± 169.939
2025-09-12 22:36:32,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [495.95407, 223.67595, 15.4135895, 70.11972, 384.98297, 316.0547, 7.6950674, 354.0612, 16.421055, 305.91812]
2025-09-12 22:36:32,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 216.0, 14.0, 60.0, 1000.0, 1000.0, 18.0, 1000.0, 40.0, 319.0]
2025-09-12 22:36:32,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 36/100 (estimated time remaining: 14 hours, 6 minutes, 47 seconds)
2025-09-12 22:48:08,929 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:48:08,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:49:56,709 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 243.13174 ± 193.307
2025-09-12 22:49:56,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [24.822893, 135.74455, 54.271305, 63.55505, 336.1885, 225.42943, 469.84546, 636.82025, 117.41151, 367.22815]
2025-09-12 22:49:56,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [45.0, 176.0, 61.0, 103.0, 1000.0, 233.0, 1000.0, 625.0, 154.0, 285.0]
2025-09-12 22:49:56,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 37/100 (estimated time remaining: 14 hours, 4 minutes, 58 seconds)
2025-09-12 22:59:53,456 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:59:53,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:01:53,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 215.39905 ± 143.347
2025-09-12 23:01:53,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [496.92993, 70.80484, 183.13918, 115.12756, 308.6387, 143.89867, 136.24805, 12.200864, 338.97058, 348.03204]
2025-09-12 23:01:53,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 61.0, 299.0, 137.0, 326.0, 159.0, 171.0, 15.0, 1000.0, 1000.0]
2025-09-12 23:01:53,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 38/100 (estimated time remaining: 13 hours, 42 minutes, 58 seconds)
2025-09-12 23:13:19,636 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:13:19,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:15:37,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 240.03220 ± 145.895
2025-09-12 23:15:37,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [336.22488, 512.3065, 315.80957, 126.07994, 245.5008, 375.89874, 230.72746, 204.00804, 26.644335, 27.121748]
2025-09-12 23:15:37,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 783.0, 1000.0, 148.0, 251.0, 1000.0, 215.0, 197.0, 45.0, 25.0]
2025-09-12 23:15:37,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 39/100 (estimated time remaining: 13 hours, 48 minutes, 48 seconds)
2025-09-12 23:26:22,831 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:26:22,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:28:09,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 256.20242 ± 157.361
2025-09-12 23:28:09,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [27.223244, 243.96721, 169.36684, 202.53813, 313.78656, 318.47183, 151.55536, 545.23334, 494.8654, 95.01604]
2025-09-12 23:28:09,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [29.0, 1000.0, 159.0, 199.0, 309.0, 225.0, 155.0, 483.0, 1000.0, 109.0]
2025-09-12 23:28:09,610 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 40/100 (estimated time remaining: 13 hours, 3 minutes, 19 seconds)
2025-09-12 23:39:11,800 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:39:11,809 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:41:20,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 277.86877 ± 256.046
2025-09-12 23:41:20,779 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [80.85872, 127.27371, 166.47676, 19.09745, 865.33746, 439.00613, 542.2822, 6.875382, 266.81567, 264.664]
2025-09-12 23:41:20,779 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [145.0, 113.0, 286.0, 55.0, 1000.0, 1000.0, 497.0, 47.0, 1000.0, 303.0]
2025-09-12 23:41:20,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 41/100 (estimated time remaining: 12 hours, 57 minutes, 37 seconds)
2025-09-12 23:51:16,124 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:51:16,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:53:09,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 167.36447 ± 123.899
2025-09-12 23:53:09,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [373.04236, 348.7162, 95.47453, 53.65268, 121.60776, 14.704892, 112.43232, 241.75037, 270.18848, 42.07518]
2025-09-12 23:53:09,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 160.0, 57.0, 154.0, 29.0, 183.0, 1000.0, 283.0, 57.0]
2025-09-12 23:53:09,917 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 42/100 (estimated time remaining: 12 hours, 25 minutes, 59 seconds)
2025-09-13 00:04:05,332 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:04:05,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:06:15,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 274.94846 ± 184.617
2025-09-13 00:06:15,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [124.489494, 667.0499, 179.50877, 373.76447, 436.60385, 68.40178, 232.21167, 16.192879, 363.49005, 287.77167]
2025-09-13 00:06:15,180 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [154.0, 1000.0, 247.0, 1000.0, 356.0, 172.0, 238.0, 16.0, 1000.0, 293.0]
2025-09-13 00:06:15,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 43/100 (estimated time remaining: 12 hours, 26 minutes, 34 seconds)
2025-09-13 00:17:03,876 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:17:03,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:19:47,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 263.66125 ± 201.639
2025-09-13 00:19:47,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [652.555, 45.369667, 72.49602, 475.50363, 90.6356, 389.44315, 66.1323, 121.28088, 329.65244, 393.544]
2025-09-13 00:19:47,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 82.0, 68.0, 1000.0, 108.0, 1000.0, 204.0, 160.0, 1000.0, 1000.0]
2025-09-13 00:19:47,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 44/100 (estimated time remaining: 12 hours, 11 minutes, 23 seconds)
2025-09-13 00:30:20,173 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:30:20,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:32:42,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 295.02252 ± 220.198
2025-09-13 00:32:42,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [31.271152, 239.34645, 424.8164, 15.128282, 130.56851, 533.24347, 71.48012, 467.71155, 682.18854, 354.47073]
2025-09-13 00:32:42,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [36.0, 192.0, 1000.0, 14.0, 168.0, 1000.0, 74.0, 1000.0, 1000.0, 345.0]
2025-09-13 00:32:42,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1226 [INFO]: New best (295.02) for latency ExtremeSparseL4U32
2025-09-13 00:32:42,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 45/100 (estimated time remaining: 12 hours, 2 minutes, 57 seconds)
2025-09-13 00:44:02,271 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:44:02,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:46:38,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 306.69025 ± 172.874
2025-09-13 00:46:38,185 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [151.02292, 590.71045, 147.45844, 526.78503, 370.67432, 104.9888, 396.66245, 94.75803, 441.63748, 242.20445]
2025-09-13 00:46:38,185 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [206.0, 1000.0, 124.0, 1000.0, 376.0, 72.0, 1000.0, 112.0, 1000.0, 388.0]
2025-09-13 00:46:38,185 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1226 [INFO]: New best (306.69) for latency ExtremeSparseL4U32
2025-09-13 00:46:38,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 46/100 (estimated time remaining: 11 hours, 58 minutes, 11 seconds)
2025-09-13 00:56:57,546 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:56:57,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:59:44,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 312.84067 ± 230.488
2025-09-13 00:59:44,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [53.281307, 593.6216, 448.14938, 308.43396, 280.91153, 33.495865, 115.64594, 790.5225, 333.50348, 170.8411]
2025-09-13 00:59:44,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [116.0, 1000.0, 1000.0, 286.0, 1000.0, 85.0, 121.0, 802.0, 1000.0, 238.0]
2025-09-13 00:59:44,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1226 [INFO]: New best (312.84) for latency ExtremeSparseL4U32
2025-09-13 00:59:44,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 47/100 (estimated time remaining: 11 hours, 59 minutes, 5 seconds)
2025-09-13 01:10:32,316 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:10:32,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:12:06,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 195.61755 ± 146.226
2025-09-13 01:12:06,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [254.27005, 89.50827, 129.23288, 1.70333, 101.04784, 497.4359, 86.181946, 386.84302, 140.34647, 269.60568]
2025-09-13 01:12:06,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 122.0, 122.0, 14.0, 100.0, 413.0, 71.0, 1000.0, 167.0, 239.0]
2025-09-13 01:12:06,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 48/100 (estimated time remaining: 11 hours, 38 minutes, 3 seconds)
2025-09-13 01:23:34,719 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:23:34,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:24:57,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 169.47641 ± 144.753
2025-09-13 01:24:57,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [211.09146, 321.89185, 263.81827, 237.57431, 74.04721, 458.39322, 60.427773, 3.2923443, 7.5279307, 56.699795]
2025-09-13 01:24:57,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [118.0, 1000.0, 210.0, 237.0, 70.0, 1000.0, 79.0, 15.0, 16.0, 54.0]
2025-09-13 01:24:57,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 49/100 (estimated time remaining: 11 hours, 17 minutes, 43 seconds)
2025-09-13 01:35:11,598 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:35:11,605 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:37:41,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 218.68474 ± 124.748
2025-09-13 01:37:41,462 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [378.23773, 270.92484, 189.39659, 261.8781, 414.42773, 50.09332, 248.002, 36.172184, 71.80904, 265.90567]
2025-09-13 01:37:41,462 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [295.0, 281.0, 167.0, 1000.0, 1000.0, 62.0, 1000.0, 39.0, 145.0, 1000.0]
2025-09-13 01:37:41,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 50/100 (estimated time remaining: 11 hours, 2 minutes, 48 seconds)
2025-09-13 01:48:23,172 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:48:23,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:50:46,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 235.75916 ± 133.290
2025-09-13 01:50:46,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [91.352776, 458.5188, 126.14876, 336.00668, 73.7241, 376.81357, 259.86224, 285.26593, 294.4168, 55.48203]
2025-09-13 01:50:46,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [47.0, 340.0, 163.0, 1000.0, 76.0, 1000.0, 1000.0, 290.0, 1000.0, 59.0]
2025-09-13 01:50:46,359 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 51/100 (estimated time remaining: 10 hours, 41 minutes, 21 seconds)
2025-09-13 02:02:16,348 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:02:16,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:05:45,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 400.89539 ± 272.992
2025-09-13 02:05:45,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [261.77686, 325.90002, 422.84644, 716.4258, 178.44656, 481.40106, 294.9453, 110.78017, 1045.7706, 170.66095]
2025-09-13 02:05:45,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 687.0, 153.0, 1000.0, 1000.0, 69.0, 1000.0, 236.0]
2025-09-13 02:05:45,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1226 [INFO]: New best (400.90) for latency ExtremeSparseL4U32
2025-09-13 02:05:46,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 52/100 (estimated time remaining: 10 hours, 46 minutes, 59 seconds)
2025-09-13 02:16:41,897 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:16:41,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:19:54,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 356.20825 ± 297.954
2025-09-13 02:19:54,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [262.0834, 166.69522, 276.02768, 32.616867, 402.7627, -6.4759865, 341.99783, 352.7717, 671.1993, 1062.4038]
2025-09-13 02:19:54,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 249.0, 239.0, 36.0, 1000.0, 16.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:19:54,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 53/100 (estimated time remaining: 10 hours, 50 minutes, 54 seconds)
2025-09-13 02:30:35,639 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:30:35,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:32:16,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 240.42984 ± 171.328
2025-09-13 02:32:16,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [238.90137, 456.09988, 172.81581, 97.23233, 563.555, 100.94716, 95.86003, 439.6222, 55.168648, 184.09604]
2025-09-13 02:32:16,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [220.0, 1000.0, 137.0, 107.0, 517.0, 154.0, 66.0, 1000.0, 77.0, 147.0]
2025-09-13 02:32:16,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 54/100 (estimated time remaining: 10 hours, 32 minutes, 45 seconds)
2025-09-13 02:42:49,971 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:42:49,981 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:45:20,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 201.76640 ± 173.827
2025-09-13 02:45:21,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [81.547966, 268.60843, 296.0275, 74.17056, 34.641666, 313.7941, 497.8271, -2.3368647, 435.3056, 18.07793]
2025-09-13 02:45:21,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [73.0, 1000.0, 1000.0, 51.0, 30.0, 1000.0, 1000.0, 13.0, 1000.0, 16.0]
2025-09-13 02:45:21,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 55/100 (estimated time remaining: 10 hours, 22 minutes, 27 seconds)
2025-09-13 02:55:56,991 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:55:57,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:57:55,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 232.31267 ± 151.980
2025-09-13 02:57:55,488 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [303.65332, 281.59082, 93.332275, 55.789078, 212.6283, 331.96686, 63.249664, 551.9561, 89.42448, 339.53577]
2025-09-13 02:57:55,488 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [337.0, 290.0, 78.0, 41.0, 144.0, 1000.0, 47.0, 1000.0, 132.0, 1000.0]
2025-09-13 02:57:55,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 56/100 (estimated time remaining: 10 hours, 4 minutes, 22 seconds)
2025-09-13 03:09:25,336 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:09:25,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:11:38,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 275.93942 ± 169.817
2025-09-13 03:11:38,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [644.822, 339.96466, 356.757, 386.31226, 132.98586, 287.03925, 44.325157, 239.01245, 43.874462, 284.30093]
2025-09-13 03:11:38,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [562.0, 302.0, 1000.0, 1000.0, 102.0, 228.0, 90.0, 234.0, 52.0, 1000.0]
2025-09-13 03:11:38,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 57/100 (estimated time remaining: 9 hours, 39 minutes, 45 seconds)
2025-09-13 03:22:18,201 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:22:18,209 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:25:33,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 260.66370 ± 120.032
2025-09-13 03:25:33,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [318.08795, 162.12636, 342.30502, 300.50226, 529.01855, 162.49261, 230.10025, 60.670406, 246.48088, 254.85275]
2025-09-13 03:25:33,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 203.0, 257.0, 1000.0, 1000.0, 188.0, 1000.0, 38.0, 1000.0, 1000.0]
2025-09-13 03:25:33,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 58/100 (estimated time remaining: 9 hours, 24 minutes, 31 seconds)
2025-09-13 03:36:03,046 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:36:03,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:38:27,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 283.48206 ± 184.564
2025-09-13 03:38:27,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [166.17259, 203.91785, 297.63586, 93.40504, 367.35648, 633.6194, 79.433556, 65.08764, 461.57666, 466.61554]
2025-09-13 03:38:27,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [133.0, 221.0, 1000.0, 68.0, 383.0, 1000.0, 78.0, 47.0, 1000.0, 1000.0]
2025-09-13 03:38:27,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 59/100 (estimated time remaining: 9 hours, 16 minutes, 3 seconds)
2025-09-13 03:48:51,161 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:48:51,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:51:18,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 396.57672 ± 211.408
2025-09-13 03:51:18,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [442.97958, 799.6575, 286.4498, 340.68988, 173.01083, 95.60657, 259.7109, 335.4552, 554.2981, 677.9086]
2025-09-13 03:51:18,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [327.0, 1000.0, 334.0, 306.0, 197.0, 67.0, 177.0, 1000.0, 579.0, 1000.0]
2025-09-13 03:51:18,605 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 60/100 (estimated time remaining: 9 hours, 52 seconds)
2025-09-13 04:02:14,003 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:02:14,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:05:17,145 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 284.60281 ± 194.785
2025-09-13 04:05:17,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [81.594376, 616.4817, 281.65457, 459.28375, 371.47372, 479.30078, 7.034625, 150.08595, 345.29016, 53.828415]
2025-09-13 04:05:17,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [63.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 54.0, 163.0, 1000.0, 57.0]
2025-09-13 04:05:17,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 61/100 (estimated time remaining: 8 hours, 58 minutes, 53 seconds)
2025-09-13 04:16:51,412 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:16:51,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:19:49,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 388.94312 ± 216.358
2025-09-13 04:19:49,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [165.08322, 427.97275, 414.9442, 844.74896, 536.0154, 26.444191, 475.60147, 467.5592, 339.18427, 191.87756]
2025-09-13 04:19:49,194 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [156.0, 1000.0, 1000.0, 1000.0, 1000.0, 36.0, 399.0, 327.0, 1000.0, 164.0]
2025-09-13 04:19:49,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 62/100 (estimated time remaining: 8 hours, 51 minutes, 44 seconds)
2025-09-13 04:29:50,010 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:29:50,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:31:58,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 266.34030 ± 200.595
2025-09-13 04:31:58,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [565.01654, 95.93476, 272.79703, 31.855268, 262.65543, 442.01547, 594.7296, 51.16122, 289.59982, 57.638046]
2025-09-13 04:31:58,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [534.0, 78.0, 1000.0, 35.0, 218.0, 385.0, 1000.0, 121.0, 1000.0, 78.0]
2025-09-13 04:31:58,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 63/100 (estimated time remaining: 8 hours, 24 minutes, 51 seconds)
2025-09-13 04:43:44,412 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:43:44,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:45:24,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 197.28908 ± 119.157
2025-09-13 04:45:24,391 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [384.27045, 261.3157, 153.34283, 303.692, 32.484222, 321.92725, 263.43384, 32.20455, 101.599075, 118.62078]
2025-09-13 04:45:24,391 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [373.0, 1000.0, 163.0, 229.0, 41.0, 362.0, 1000.0, 45.0, 111.0, 86.0]
2025-09-13 04:45:24,398 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 64/100 (estimated time remaining: 8 hours, 15 minutes, 21 seconds)
2025-09-13 04:55:53,773 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:55:53,783 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:57:42,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 209.51926 ± 163.634
2025-09-13 04:57:42,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [40.57561, 107.93142, 305.8861, 340.7667, 74.20487, 326.98633, 31.967966, 225.43044, 559.2245, 82.21858]
2025-09-13 04:57:42,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [24.0, 90.0, 243.0, 1000.0, 101.0, 1000.0, 34.0, 190.0, 1000.0, 72.0]
2025-09-13 04:57:42,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 65/100 (estimated time remaining: 7 hours, 58 minutes, 7 seconds)
2025-09-13 05:08:16,695 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:08:16,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:11:20,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 331.74432 ± 243.747
2025-09-13 05:11:20,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [258.607, 61.99081, 284.55997, 607.5774, 430.3783, 42.3921, 834.1338, 136.30597, 494.53888, 166.95892]
2025-09-13 05:11:20,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 57.0, 1000.0, 1000.0, 1000.0, 45.0, 1000.0, 102.0, 1000.0, 123.0]
2025-09-13 05:11:20,408 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 66/100 (estimated time remaining: 7 hours, 42 minutes, 22 seconds)
2025-09-13 05:22:25,199 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:22:25,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:24:37,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 298.30566 ± 147.439
2025-09-13 05:24:37,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [88.743225, 633.1014, 114.56563, 234.33212, 243.96223, 281.4127, 360.9049, 270.3727, 416.16245, 339.49927]
2025-09-13 05:24:37,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [63.0, 487.0, 101.0, 167.0, 227.0, 1000.0, 1000.0, 254.0, 1000.0, 257.0]
2025-09-13 05:24:37,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 67/100 (estimated time remaining: 7 hours, 20 minutes, 39 seconds)
2025-09-13 05:34:46,944 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:34:46,947 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:36:20,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 226.10913 ± 267.522
2025-09-13 05:36:20,436 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [-1.2000061, 833.35614, 30.862085, 117.04726, 274.12375, 259.8209, 76.80673, 31.869946, 605.95215, 32.452282]
2025-09-13 05:36:20,436 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [13.0, 669.0, 26.0, 55.0, 1000.0, 237.0, 97.0, 25.0, 1000.0, 49.0]
2025-09-13 05:36:20,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 68/100 (estimated time remaining: 7 hours, 4 minutes, 46 seconds)
2025-09-13 05:47:52,659 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:47:52,663 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:51:56,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 354.57861 ± 180.884
2025-09-13 05:51:56,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [271.60028, 628.6035, 754.8073, 338.24747, 156.70372, 253.59633, 344.61493, 344.66843, 261.07022, 191.87405]
2025-09-13 05:51:56,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 107.0, 1000.0, 1000.0, 1000.0, 1000.0, 140.0]
2025-09-13 05:51:56,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 69/100 (estimated time remaining: 7 hours, 5 minutes, 49 seconds)
2025-09-13 06:02:41,191 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:02:41,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:04:24,392 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 273.76849 ± 155.139
2025-09-13 06:04:24,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [345.82523, 156.32571, 289.31537, 466.22684, 250.61253, 65.24602, 39.891182, 423.82718, 189.74643, 510.6688]
2025-09-13 06:04:24,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [229.0, 109.0, 1000.0, 377.0, 189.0, 90.0, 45.0, 1000.0, 140.0, 351.0]
2025-09-13 06:04:24,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 70/100 (estimated time remaining: 6 hours, 53 minutes, 28 seconds)
2025-09-13 06:14:46,109 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:14:46,113 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:17:59,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 388.21997 ± 231.116
2025-09-13 06:17:59,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [298.42075, 594.9753, 599.07587, 85.46733, 573.6715, 130.50772, 136.11995, 235.20438, 445.3461, 783.4106]
2025-09-13 06:17:59,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 138.0, 1000.0, 111.0, 150.0, 199.0, 1000.0, 1000.0]
2025-09-13 06:17:59,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 71/100 (estimated time remaining: 6 hours, 39 minutes, 52 seconds)
2025-09-13 06:28:43,755 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:28:43,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:30:42,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 322.71735 ± 156.747
2025-09-13 06:30:42,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [304.14178, 386.25873, 317.4425, 404.385, 686.2291, 334.87543, 78.17116, 353.44745, 208.2759, 153.94647]
2025-09-13 06:30:42,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 362.0, 330.0, 355.0, 526.0, 1000.0, 75.0, 248.0, 148.0, 99.0]
2025-09-13 06:30:42,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 72/100 (estimated time remaining: 6 hours, 23 minutes, 20 seconds)
2025-09-13 06:41:21,757 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:41:21,759 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:43:43,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 297.57361 ± 238.490
2025-09-13 06:43:43,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [519.2152, 600.5136, 257.32797, 461.66397, 43.52009, 53.660633, -2.191418, 53.32386, 338.894, 649.80817]
2025-09-13 06:43:43,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 516.0, 1000.0, 1000.0, 31.0, 32.0, 14.0, 51.0, 265.0, 1000.0]
2025-09-13 06:43:43,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 73/100 (estimated time remaining: 6 hours, 17 minutes, 20 seconds)
2025-09-13 06:54:30,054 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:54:30,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:55:50,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 234.77173 ± 158.843
2025-09-13 06:55:50,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [87.24644, 145.2771, 108.41121, 342.7676, 473.00336, 447.58145, 4.4422145, 252.82086, 383.3538, 102.8132]
2025-09-13 06:55:50,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [81.0, 97.0, 82.0, 1000.0, 362.0, 461.0, 16.0, 229.0, 344.0, 92.0]
2025-09-13 06:55:50,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 74/100 (estimated time remaining: 5 hours, 45 minutes, 1 second)
2025-09-13 07:07:21,643 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:07:21,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:09:29,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 307.66492 ± 194.034
2025-09-13 07:09:29,866 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [415.19986, 60.162056, 213.67804, 282.68417, 355.09497, 757.92065, 36.71706, 254.15263, 418.64682, 282.39313]
2025-09-13 07:09:29,866 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 42.0, 200.0, 217.0, 385.0, 1000.0, 44.0, 187.0, 1000.0, 276.0]
2025-09-13 07:09:29,877 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 75/100 (estimated time remaining: 5 hours, 38 minutes, 28 seconds)
2025-09-13 07:19:36,333 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:19:36,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:21:30,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 316.53766 ± 308.223
2025-09-13 07:21:30,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [9.350408, 130.23602, 676.9343, 97.94596, 70.88961, 155.70499, 997.7789, 584.01105, 180.77888, 261.74625]
2025-09-13 07:21:30,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [15.0, 134.0, 470.0, 71.0, 69.0, 179.0, 816.0, 1000.0, 173.0, 1000.0]
2025-09-13 07:21:30,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 76/100 (estimated time remaining: 5 hours, 17 minutes, 37 seconds)
2025-09-13 07:33:03,205 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:33:03,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:35:09,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 257.82010 ± 157.930
2025-09-13 07:35:09,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [49.512432, 11.718949, 346.96198, 289.824, 429.04248, 282.5575, 61.950775, 457.60773, 422.9681, 226.0572]
2025-09-13 07:35:09,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [33.0, 16.0, 316.0, 1000.0, 255.0, 232.0, 60.0, 1000.0, 398.0, 1000.0]
2025-09-13 07:35:09,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 77/100 (estimated time remaining: 5 hours, 9 minutes, 18 seconds)
2025-09-13 07:45:13,702 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:45:13,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:47:22,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 265.45047 ± 149.189
2025-09-13 07:47:22,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [349.0171, 158.92935, 187.83768, 435.67154, 481.61847, 387.6433, 337.5038, 21.919386, 234.10506, 60.259052]
2025-09-13 07:47:22,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 141.0, 209.0, 503.0, 1000.0, 1000.0, 296.0, 24.0, 157.0, 63.0]
2025-09-13 07:47:22,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 78/100 (estimated time remaining: 4 hours, 52 minutes, 47 seconds)
2025-09-13 07:58:14,993 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:58:15,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:59:46,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 215.50713 ± 163.281
2025-09-13 07:59:46,824 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [469.84448, 321.06546, 122.074875, 84.79936, 534.95526, 132.24008, 243.0841, 113.962234, 50.46016, 82.58526]
2025-09-13 07:59:46,824 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [303.0, 1000.0, 102.0, 82.0, 1000.0, 97.0, 196.0, 151.0, 72.0, 98.0]
2025-09-13 07:59:46,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 79/100 (estimated time remaining: 4 hours, 41 minutes, 21 seconds)
2025-09-13 08:10:48,639 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:10:48,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:12:59,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 268.76035 ± 283.559
2025-09-13 08:12:59,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [684.24786, 276.48956, 43.004192, 277.8544, 59.362587, 182.0078, 73.13453, 916.9942, 135.10724, 39.401062]
2025-09-13 08:12:59,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 212.0, 30.0, 1000.0, 49.0, 1000.0, 57.0, 1000.0, 88.0, 41.0]
2025-09-13 08:12:59,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 80/100 (estimated time remaining: 4 hours, 26 minutes, 39 seconds)
2025-09-13 08:23:58,567 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:23:58,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:26:03,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 355.30615 ± 229.218
2025-09-13 08:26:03,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [185.0243, 232.68842, 55.00391, 54.94775, 274.59122, 595.948, 663.8554, 358.6001, 728.05164, 404.35114]
2025-09-13 08:26:03,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [152.0, 169.0, 40.0, 31.0, 1000.0, 570.0, 1000.0, 371.0, 572.0, 349.0]
2025-09-13 08:26:03,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 81/100 (estimated time remaining: 4 hours, 18 minutes, 10 seconds)
2025-09-13 08:37:14,824 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:37:14,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:39:14,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 292.15375 ± 236.018
2025-09-13 08:39:14,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [394.8789, 19.920488, 308.33755, 840.4219, 489.11966, 388.42154, 88.105705, 131.54066, 173.37155, 87.41936]
2025-09-13 08:39:14,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [263.0, 31.0, 1000.0, 880.0, 359.0, 1000.0, 60.0, 115.0, 179.0, 174.0]
2025-09-13 08:39:14,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 82/100 (estimated time remaining: 4 hours, 3 minutes, 30 seconds)
2025-09-13 08:49:15,361 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:49:15,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:51:36,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 234.67683 ± 141.800
2025-09-13 08:51:36,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [570.0472, 129.31831, 197.27597, 214.93996, 215.4073, 169.78708, 26.145245, 397.97153, 208.8256, 217.05013]
2025-09-13 08:51:36,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 116.0, 207.0, 1000.0, 150.0, 137.0, 25.0, 1000.0, 1000.0, 217.0]
2025-09-13 08:51:36,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 83/100 (estimated time remaining: 3 hours, 51 minutes, 14 seconds)
2025-09-13 09:02:49,607 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:02:49,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:03:37,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 198.81157 ± 121.939
2025-09-13 09:03:37,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [39.027397, 377.58466, 184.78258, 218.00888, 35.0217, 142.26907, 103.438225, 330.16574, 175.39853, 382.41882]
2025-09-13 09:03:37,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [28.0, 269.0, 134.0, 137.0, 81.0, 140.0, 86.0, 276.0, 251.0, 252.0]
2025-09-13 09:03:37,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 84/100 (estimated time remaining: 3 hours, 37 minutes, 5 seconds)
2025-09-13 09:14:15,430 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:14:15,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:16:26,194 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 297.08984 ± 184.865
2025-09-13 09:16:26,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [642.84247, 87.55503, 303.64752, 196.21912, 262.03815, 177.71283, 623.2505, 323.3358, 85.603355, 268.69354]
2025-09-13 09:16:26,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [490.0, 84.0, 1000.0, 139.0, 221.0, 187.0, 1000.0, 289.0, 74.0, 1000.0]
2025-09-13 09:16:26,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 23 minutes, 2 seconds)
2025-09-13 09:27:49,592 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:27:49,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:30:05,746 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 232.77869 ± 164.179
2025-09-13 09:30:05,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [213.85243, 35.26927, 595.7902, 345.96738, 229.32184, 267.03854, 296.98907, 10.215805, 271.56784, 61.77431]
2025-09-13 09:30:05,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [208.0, 28.0, 1000.0, 215.0, 166.0, 1000.0, 1000.0, 25.0, 1000.0, 60.0]
2025-09-13 09:30:05,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 12 minutes, 7 seconds)
2025-09-13 09:40:04,120 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:40:04,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:42:46,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 375.17340 ± 100.569
2025-09-13 09:42:46,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [391.73624, 173.55266, 333.61508, 399.12424, 381.5415, 311.12363, 551.68524, 421.16547, 294.6361, 493.55374]
2025-09-13 09:42:46,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [300.0, 125.0, 1000.0, 293.0, 1000.0, 261.0, 1000.0, 260.0, 1000.0, 340.0]
2025-09-13 09:42:46,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 87/100 (estimated time remaining: 2 hours, 57 minutes, 53 seconds)
2025-09-13 09:54:26,900 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:54:26,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:56:05,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 362.85223 ± 216.824
2025-09-13 09:56:05,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [503.27463, 672.7165, 679.7213, 364.33527, 425.828, 141.05147, 35.6063, 424.00653, 315.33075, 66.65163]
2025-09-13 09:56:05,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 582.0, 437.0, 278.0, 323.0, 117.0, 31.0, 307.0, 249.0, 57.0]
2025-09-13 09:56:05,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 88/100 (estimated time remaining: 2 hours, 47 minutes, 39 seconds)
2025-09-13 10:06:00,292 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:06:00,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:08:36,928 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 331.82782 ± 226.420
2025-09-13 10:08:36,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [45.59754, 239.60626, 285.2207, 751.1301, 158.01224, 0.7070811, 505.01862, 571.30145, 459.07455, 302.60974]
2025-09-13 10:08:36,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [30.0, 1000.0, 164.0, 512.0, 211.0, 15.0, 1000.0, 1000.0, 343.0, 1000.0]
2025-09-13 10:08:36,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 35 minutes, 57 seconds)
2025-09-13 10:20:03,298 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:20:03,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:22:56,014 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 416.34161 ± 288.378
2025-09-13 10:22:56,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [664.01135, 125.212326, 282.53476, 239.36264, 350.89072, 469.62628, 39.919155, 257.60376, 1034.2566, 699.99805]
2025-09-13 10:22:56,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [455.0, 63.0, 1000.0, 170.0, 221.0, 1000.0, 41.0, 1000.0, 1000.0, 1000.0]
2025-09-13 10:22:56,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1226 [INFO]: New best (416.34) for latency ExtremeSparseL4U32
2025-09-13 10:22:56,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 26 minutes, 17 seconds)
2025-09-13 10:34:00,845 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:34:00,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:36:38,059 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 352.92935 ± 236.197
2025-09-13 10:36:38,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [-1.5441432, 751.2303, 446.7848, 332.61, 239.26749, 638.8028, 172.72356, 411.66266, 24.257496, 513.4984]
2025-09-13 10:36:38,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [31.0, 570.0, 316.0, 229.0, 1000.0, 1000.0, 195.0, 1000.0, 40.0, 1000.0]
2025-09-13 10:36:38,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 13 minutes, 4 seconds)
2025-09-13 10:46:38,770 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:46:38,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:48:50,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 273.06625 ± 194.200
2025-09-13 10:48:50,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [265.30414, 94.44062, 299.4526, 98.81126, 444.0441, 249.93144, 636.1407, 518.1529, 20.270208, 104.11477]
2025-09-13 10:48:50,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 57.0, 1000.0, 65.0, 445.0, 1000.0, 456.0, 430.0, 34.0, 53.0]
2025-09-13 10:48:50,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 92/100 (estimated time remaining: 1 hour, 58 minutes, 55 seconds)
2025-09-13 10:59:32,886 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:59:32,894 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:02:18,512 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 547.27283 ± 329.314
2025-09-13 11:02:18,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [307.2431, 725.2, 1142.1888, 17.313938, 196.4262, 417.1021, 542.5813, 893.7375, 402.1336, 828.8019]
2025-09-13 11:02:18,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [226.0, 1000.0, 769.0, 25.0, 173.0, 1000.0, 1000.0, 663.0, 369.0, 493.0]
2025-09-13 11:02:18,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1226 [INFO]: New best (547.27) for latency ExtremeSparseL4U32
2025-09-13 11:02:18,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 45 minutes, 57 seconds)
2025-09-13 11:13:55,139 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:13:55,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:15:53,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 360.79510 ± 293.714
2025-09-13 11:15:53,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [855.76434, 37.811047, 137.65085, 142.16063, 657.7853, 145.60527, 681.5154, 303.3024, 29.694376, 616.6616]
2025-09-13 11:15:53,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [616.0, 35.0, 85.0, 149.0, 541.0, 136.0, 1000.0, 1000.0, 36.0, 482.0]
2025-09-13 11:15:53,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 34 minutes, 11 seconds)
2025-09-13 11:25:51,319 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:25:51,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:28:41,516 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 377.55603 ± 333.221
2025-09-13 11:28:41,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [261.34232, 882.01855, 426.60828, 327.10132, 29.523733, 116.185844, 243.76967, 280.02798, 95.78126, 1113.2014]
2025-09-13 11:28:41,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 365.0, 239.0, 51.0, 98.0, 1000.0, 1000.0, 87.0, 1000.0]
2025-09-13 11:28:41,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 18 minutes, 54 seconds)
2025-09-13 11:39:33,867 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:39:33,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:41:57,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 379.86078 ± 240.448
2025-09-13 11:41:57,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [453.1715, 341.21103, 252.0846, 780.8775, 356.92645, 108.911964, 103.677155, 849.08655, 310.3698, 242.29097]
2025-09-13 11:41:57,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [288.0, 341.0, 1000.0, 1000.0, 267.0, 138.0, 107.0, 572.0, 1000.0, 185.0]
2025-09-13 11:41:57,239 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 5 minutes, 19 seconds)
2025-09-13 11:53:13,318 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:53:13,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:55:43,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 277.60895 ± 138.862
2025-09-13 11:55:43,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [288.8282, 391.79742, 143.53058, 503.42538, 79.80611, 306.8289, 92.62615, 326.58698, 198.30408, 444.35562]
2025-09-13 11:55:43,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [309.0, 299.0, 107.0, 1000.0, 129.0, 1000.0, 61.0, 303.0, 1000.0, 1000.0]
2025-09-13 11:55:43,593 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 97/100 (estimated time remaining: 53 minutes, 30 seconds)
2025-09-13 12:06:01,349 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:06:01,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:08:20,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 335.81039 ± 329.725
2025-09-13 12:08:20,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [369.75607, 275.56223, 58.417007, 353.70068, 262.129, 30.353708, 384.8933, 274.84723, 94.89793, 1253.5469]
2025-09-13 12:08:20,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [215.0, 202.0, 33.0, 1000.0, 1000.0, 22.0, 266.0, 1000.0, 149.0, 804.0]
2025-09-13 12:08:20,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 98/100 (estimated time remaining: 39 minutes, 37 seconds)
2025-09-13 12:19:29,331 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:19:29,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:21:00,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 216.90642 ± 174.866
2025-09-13 12:21:00,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [33.868496, 215.59074, 91.843285, 114.17087, 354.69836, 56.791138, 596.2487, 425.3058, 105.75477, 174.79218]
2025-09-13 12:21:00,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [26.0, 177.0, 95.0, 67.0, 1000.0, 49.0, 361.0, 304.0, 58.0, 1000.0]
2025-09-13 12:21:00,899 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 99/100 (estimated time remaining: 26 minutes, 2 seconds)
2025-09-13 12:31:57,991 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:31:57,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:35:24,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 518.58197 ± 244.843
2025-09-13 12:35:24,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [513.26495, 863.00134, 322.0621, 562.97797, 249.43979, 270.63528, 1055.6499, 456.05798, 488.43817, 404.29248]
2025-09-13 12:35:24,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [419.0, 530.0, 1000.0, 1000.0, 1000.0, 172.0, 732.0, 1000.0, 1000.0, 259.0]
2025-09-13 12:35:25,005 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 100/100 (estimated time remaining: 13 minutes, 20 seconds)
2025-09-13 12:46:02,866 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:46:02,899 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:48:36,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 353.84274 ± 195.904
2025-09-13 12:48:36,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [497.41052, 462.66028, 159.49007, 388.0923, 69.713524, 50.864407, 309.4126, 443.18335, 459.02765, 698.573]
2025-09-13 12:48:36,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 386.0, 128.0, 265.0, 46.0, 46.0, 1000.0, 375.0, 1000.0, 1000.0]
2025-09-13 12:48:36,639 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1251 [DEBUG]: Training session finished
