2025-09-12 12:10:37,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc5-ant/ExtremeSparseL4U32-mbpac-highdim-memdelay
2025-09-12 12:10:37,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc5-ant/ExtremeSparseL4U32-mbpac-highdim-memdelay
2025-09-12 12:10:37,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x14fed0a4f150>}
2025-09-12 12:10:37,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1111 [DEBUG]: using device: cuda
2025-09-12 12:10:37,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1133 [INFO]: Creating new trainer
2025-09-12 12:10:37,920 baseline-mbpac-noiseperc5-ant:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-09-12 12:10:37,920 baseline-mbpac-noiseperc5-ant:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-12 12:10:37,929 baseline-mbpac-noiseperc5-ant:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=27, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=8, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 512, batch_first=True)
)
2025-09-12 12:10:38,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1194 [DEBUG]: Starting training session...
2025-09-12 12:10:38,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 1/100
2025-09-12 12:22:00,174 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:22:00,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 12:24:21,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: -251.82613 ± 236.268
2025-09-12 12:24:21,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [-51.144363, -527.5037, -8.37353, -348.1073, -123.452644, -96.01016, -146.64816, -603.4596, -616.54456, 2.9828937]
2025-09-12 12:24:21,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [235.0, 1000.0, 95.0, 502.0, 259.0, 270.0, 237.0, 1000.0, 1000.0, 14.0]
2025-09-12 12:24:21,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (-251.83) for latency ExtremeSparseL4U32
2025-09-12 12:24:21,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 2/100 (estimated time remaining: 22 hours, 37 minutes, 44 seconds)
2025-09-12 12:35:37,595 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:35:37,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 12:39:24,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 279.30005 ± 90.485
2025-09-12 12:39:24,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [301.1802, 390.28006, 274.75476, 224.27223, 444.53836, 247.72723, 353.5649, 159.47176, 247.76476, 149.44626]
2025-09-12 12:39:24,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 993.0, 497.0, 1000.0, 859.0, 591.0, 1000.0, 299.0, 1000.0, 306.0]
2025-09-12 12:39:24,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (279.30) for latency ExtremeSparseL4U32
2025-09-12 12:39:24,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 3/100 (estimated time remaining: 23 hours, 29 minutes, 5 seconds)
2025-09-12 12:50:35,487 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:50:35,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 12:53:49,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 348.71191 ± 179.417
2025-09-12 12:53:49,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [500.30997, 670.8248, 151.43582, 337.0559, 228.64114, 565.2837, 191.4638, 440.32135, 106.10481, 295.67792]
2025-09-12 12:53:49,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 623.0, 778.0, 314.0, 831.0, 300.0, 508.0, 128.0, 994.0]
2025-09-12 12:53:49,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (348.71) for latency ExtremeSparseL4U32
2025-09-12 12:53:49,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 4/100 (estimated time remaining: 23 hours, 16 minutes, 1 second)
2025-09-12 13:05:05,836 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:05:05,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 13:08:53,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 391.45850 ± 165.908
2025-09-12 13:08:53,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [111.27267, 584.96326, 629.65393, 309.77402, 624.5135, 275.7958, 405.3511, 345.70374, 396.99295, 230.56412]
2025-09-12 13:08:53,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [388.0, 1000.0, 722.0, 724.0, 1000.0, 392.0, 1000.0, 1000.0, 1000.0, 267.0]
2025-09-12 13:08:53,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (391.46) for latency ExtremeSparseL4U32
2025-09-12 13:08:53,589 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 5/100 (estimated time remaining: 23 hours, 17 minutes, 52 seconds)
2025-09-12 13:19:50,481 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:19:50,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 13:23:43,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 416.36270 ± 179.773
2025-09-12 13:23:43,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [339.05573, 436.4567, 626.18695, 371.23163, 78.29919, 307.6345, 591.17303, 703.1903, 465.47272, 244.92651]
2025-09-12 13:23:43,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [604.0, 1000.0, 1000.0, 764.0, 100.0, 1000.0, 1000.0, 1000.0, 1000.0, 260.0]
2025-09-12 13:23:43,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (416.36) for latency ExtremeSparseL4U32
2025-09-12 13:23:43,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 6/100 (estimated time remaining: 23 hours, 8 minutes, 20 seconds)
2025-09-12 13:33:59,552 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:33:59,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 13:37:29,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 615.40656 ± 329.189
2025-09-12 13:37:29,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [195.60695, 315.70773, 1087.0789, 1015.559, 272.36353, 259.89664, 978.0941, 554.0557, 851.0916, 624.6111]
2025-09-12 13:37:29,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [164.0, 297.0, 1000.0, 1000.0, 1000.0, 282.0, 1000.0, 552.0, 1000.0, 641.0]
2025-09-12 13:37:29,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (615.41) for latency ExtremeSparseL4U32
2025-09-12 13:37:29,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 7/100 (estimated time remaining: 22 hours, 54 minutes, 51 seconds)
2025-09-12 13:48:17,223 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:48:17,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 13:52:58,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 670.94049 ± 294.339
2025-09-12 13:52:58,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [348.69662, 446.24054, 1025.2449, 1041.3207, 698.42017, 386.90512, 636.27734, 852.1774, 1044.2069, 229.91486]
2025-09-12 13:52:58,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 685.0, 729.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:52:58,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (670.94) for latency ExtremeSparseL4U32
2025-09-12 13:52:58,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 8/100 (estimated time remaining: 22 hours, 48 minutes, 23 seconds)
2025-09-12 14:04:19,029 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:04:19,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 14:08:25,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 717.33984 ± 313.459
2025-09-12 14:08:25,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [712.65857, 393.61722, 369.36612, 393.20813, 1146.5454, 1154.5408, 655.66626, 1192.8113, 637.69385, 517.2909]
2025-09-12 14:08:25,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 322.0, 335.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 574.0, 1000.0]
2025-09-12 14:08:25,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (717.34) for latency ExtremeSparseL4U32
2025-09-12 14:08:25,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 9/100 (estimated time remaining: 22 hours, 52 minutes, 43 seconds)
2025-09-12 14:19:24,044 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:19:24,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 14:23:35,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 804.76233 ± 348.502
2025-09-12 14:23:35,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [781.9921, 871.3689, 1268.6775, 1196.2443, 1214.7865, 1038.2368, 338.09213, 541.1559, 374.44266, 422.6257]
2025-09-12 14:23:35,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [677.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 266.0, 1000.0, 292.0, 1000.0]
2025-09-12 14:23:35,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (804.76) for latency ExtremeSparseL4U32
2025-09-12 14:23:35,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 10/100 (estimated time remaining: 22 hours, 39 minutes, 26 seconds)
2025-09-12 14:34:24,189 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:34:24,198 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 14:38:03,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 683.89941 ± 459.110
2025-09-12 14:38:03,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [1430.9536, 360.5774, 1197.5802, 591.1221, 175.1535, 132.08739, 262.55197, 678.1113, 669.96326, 1340.8931]
2025-09-12 14:38:03,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 284.0, 876.0, 1000.0, 1000.0, 149.0, 1000.0, 432.0, 524.0, 1000.0]
2025-09-12 14:38:03,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 11/100 (estimated time remaining: 22 hours, 18 minutes)
2025-09-12 14:48:36,114 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:48:36,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 14:52:43,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 951.53778 ± 407.603
2025-09-12 14:52:43,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [985.63336, 1205.0945, 1370.1835, 1225.7122, 714.5869, 278.69827, 378.11206, 591.3931, 1437.0275, 1328.9376]
2025-09-12 14:52:43,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 561.0, 1000.0, 294.0, 448.0, 1000.0, 1000.0]
2025-09-12 14:52:43,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (951.54) for latency ExtremeSparseL4U32
2025-09-12 14:52:43,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 12/100 (estimated time remaining: 22 hours, 19 minutes, 14 seconds)
2025-09-12 15:03:47,621 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:03:47,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 15:08:18,201 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1103.63379 ± 353.118
2025-09-12 15:08:18,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [1444.3159, 937.79535, 596.1809, 1437.2865, 1004.17285, 489.34918, 1476.4337, 1446.2317, 1315.5089, 889.06354]
2025-09-12 15:08:18,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 426.0, 1000.0, 731.0, 1000.0, 1000.0, 1000.0, 1000.0, 661.0]
2025-09-12 15:08:18,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (1103.63) for latency ExtremeSparseL4U32
2025-09-12 15:08:18,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 13/100 (estimated time remaining: 22 hours, 5 minutes, 47 seconds)
2025-09-12 15:19:41,399 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:19:41,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 15:24:01,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1087.37109 ± 412.287
2025-09-12 15:24:01,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [244.91353, 1415.1632, 1173.5347, 1464.3945, 1352.2976, 1342.6975, 1451.0183, 431.86264, 1099.1517, 898.6774]
2025-09-12 15:24:01,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 906.0, 1000.0, 1000.0, 1000.0, 1000.0, 362.0, 835.0, 640.0]
2025-09-12 15:24:01,579 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 14/100 (estimated time remaining: 21 hours, 55 minutes, 23 seconds)
2025-09-12 15:34:42,415 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:34:42,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 15:38:50,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1246.04150 ± 425.816
2025-09-12 15:38:50,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [1219.2291, 1514.144, 1187.4581, 1503.743, 159.22755, 1501.1198, 1334.692, 1638.1526, 1561.1272, 841.5222]
2025-09-12 15:38:50,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 750.0, 1000.0, 99.0, 1000.0, 876.0, 1000.0, 1000.0, 568.0]
2025-09-12 15:38:50,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (1246.04) for latency ExtremeSparseL4U32
2025-09-12 15:38:50,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 15/100 (estimated time remaining: 21 hours, 34 minutes, 21 seconds)
2025-09-12 15:49:56,608 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:49:56,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 15:54:19,779 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1269.36292 ± 449.821
2025-09-12 15:54:19,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [1527.0334, 914.13434, 964.6208, 1627.9481, 1560.6825, 965.6285, 1705.112, 1584.6046, 250.52814, 1593.3364]
2025-09-12 15:54:19,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 497.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 157.0, 995.0]
2025-09-12 15:54:19,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (1269.36) for latency ExtremeSparseL4U32
2025-09-12 15:54:19,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 16/100 (estimated time remaining: 21 hours, 36 minutes, 42 seconds)
2025-09-12 16:05:29,380 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:05:29,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 16:09:34,732 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1069.57788 ± 381.870
2025-09-12 16:09:34,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [1757.4681, 543.2198, 1302.0831, 859.21576, 968.05664, 1712.1558, 775.65955, 881.27936, 1081.0297, 815.61127]
2025-09-12 16:09:34,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 828.0, 557.0, 574.0, 1000.0, 494.0, 1000.0, 589.0, 1000.0]
2025-09-12 16:09:34,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 17/100 (estimated time remaining: 21 hours, 31 minutes, 2 seconds)
2025-09-12 16:20:45,422 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:20:45,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 16:23:35,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 583.48523 ± 514.339
2025-09-12 16:23:35,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [995.18164, 818.79443, 499.76825, 412.8815, 1886.0089, 352.66635, 88.56879, 210.3258, 94.75876, 475.89752]
2025-09-12 16:23:35,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [664.0, 1000.0, 298.0, 244.0, 1000.0, 1000.0, 69.0, 1000.0, 73.0, 256.0]
2025-09-12 16:23:35,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 18/100 (estimated time remaining: 20 hours, 49 minutes, 54 seconds)
2025-09-12 16:33:33,294 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:33:33,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 16:36:12,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 475.41571 ± 275.450
2025-09-12 16:36:12,360 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [332.33444, 696.0972, 568.7169, 152.8054, 1098.6753, 345.21863, 477.54846, 592.04974, 406.4512, 84.2597]
2025-09-12 16:36:12,360 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [192.0, 449.0, 1000.0, 98.0, 1000.0, 199.0, 1000.0, 1000.0, 218.0, 82.0]
2025-09-12 16:36:12,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 19/100 (estimated time remaining: 19 hours, 43 minutes, 44 seconds)
2025-09-12 16:47:40,654 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:47:40,663 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 16:50:32,110 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 628.13055 ± 416.243
2025-09-12 16:50:32,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [931.1712, 403.39395, 347.83484, 912.59564, 310.94916, 346.8528, 516.15076, 205.7584, 1657.5266, 649.0721]
2025-09-12 16:50:32,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 554.0, 161.0, 133.0, 295.0, 123.0, 1000.0, 382.0]
2025-09-12 16:50:32,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 20/100 (estimated time remaining: 19 hours, 21 minutes, 28 seconds)
2025-09-12 17:01:13,689 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:01:13,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 17:04:18,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 830.79150 ± 452.347
2025-09-12 17:04:18,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [1639.2236, 147.44905, 497.21573, 1286.7549, 825.9095, 1318.6528, 1039.8337, 495.45282, 406.64734, 650.7757]
2025-09-12 17:04:18,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 96.0, 344.0, 685.0, 361.0, 728.0, 575.0, 309.0, 1000.0, 1000.0]
2025-09-12 17:04:18,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 21/100 (estimated time remaining: 18 hours, 39 minutes, 42 seconds)
2025-09-12 17:15:03,700 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:15:03,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 17:16:50,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 503.33258 ± 430.376
2025-09-12 17:16:50,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [133.75885, 1106.0935, 169.49425, 923.5447, 230.92238, 186.73856, 190.976, 643.22736, 143.30016, 1305.2703]
2025-09-12 17:16:50,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [57.0, 1000.0, 77.0, 512.0, 116.0, 87.0, 94.0, 1000.0, 85.0, 567.0]
2025-09-12 17:16:50,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 22/100 (estimated time remaining: 17 hours, 42 minutes, 49 seconds)
2025-09-12 17:27:59,617 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:27:59,625 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 17:31:54,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1519.40576 ± 665.357
2025-09-12 17:31:54,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [883.1687, 2019.3353, 1756.4277, 2141.8386, 1805.4723, 713.3976, 1823.9551, 1816.1053, 94.233475, 2140.1228]
2025-09-12 17:31:54,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [373.0, 1000.0, 1000.0, 1000.0, 1000.0, 389.0, 1000.0, 1000.0, 51.0, 1000.0]
2025-09-12 17:31:54,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (1519.41) for latency ExtremeSparseL4U32
2025-09-12 17:31:54,382 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 23/100 (estimated time remaining: 17 hours, 45 minutes, 35 seconds)
2025-09-12 17:42:35,514 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:42:35,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 17:44:18,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 742.58795 ± 538.774
2025-09-12 17:44:18,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [292.48517, 463.17346, 307.9453, 759.96765, 590.18097, 434.6472, 1829.0687, 820.9898, 1681.5054, 245.91585]
2025-09-12 17:44:18,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [129.0, 254.0, 131.0, 339.0, 243.0, 209.0, 1000.0, 370.0, 688.0, 117.0]
2025-09-12 17:44:18,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 24/100 (estimated time remaining: 17 hours, 28 minutes, 50 seconds)
2025-09-12 17:55:12,366 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:55:12,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 17:57:55,268 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1074.73389 ± 662.404
2025-09-12 17:57:55,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [1180.9932, 1190.1041, 1841.2823, 1245.4178, 1089.9032, 2304.9863, 765.5322, 1046.6871, 40.769016, 41.66371]
2025-09-12 17:57:55,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [589.0, 1000.0, 1000.0, 520.0, 446.0, 1000.0, 391.0, 403.0, 30.0, 38.0]
2025-09-12 17:57:55,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 25/100 (estimated time remaining: 17 hours, 4 minutes, 15 seconds)
2025-09-12 18:08:55,532 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:08:55,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 18:11:42,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 713.02460 ± 505.609
2025-09-12 18:11:42,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [69.09737, 1385.7883, 139.32674, 547.79376, 1350.2877, 428.70944, 606.94434, 1537.9255, 742.7372, 321.63535]
2025-09-12 18:11:42,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [39.0, 744.0, 80.0, 1000.0, 1000.0, 174.0, 391.0, 1000.0, 1000.0, 133.0]
2025-09-12 18:11:42,611 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 26/100 (estimated time remaining: 16 hours, 50 minutes, 58 seconds)
2025-09-12 18:23:30,045 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:23:30,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 18:26:28,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1036.02771 ± 805.521
2025-09-12 18:26:28,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [2006.3971, 2132.3274, 219.08853, 179.26328, 1388.6892, 2112.488, 57.476036, 286.98486, 1250.9734, 726.5892]
2025-09-12 18:26:28,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 139.0, 97.0, 1000.0, 1000.0, 38.0, 130.0, 583.0, 1000.0]
2025-09-12 18:26:28,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 27/100 (estimated time remaining: 17 hours, 10 minutes, 35 seconds)
2025-09-12 18:36:51,322 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:36:51,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 18:40:02,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1285.32446 ± 657.695
2025-09-12 18:40:02,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [2234.6619, 1987.1921, 1699.7076, 2146.8745, 902.32635, 1280.7139, 218.6254, 871.02893, 781.92126, 730.1937]
2025-09-12 18:40:02,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [968.0, 856.0, 798.0, 1000.0, 352.0, 559.0, 152.0, 412.0, 1000.0, 339.0]
2025-09-12 18:40:02,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 28/100 (estimated time remaining: 16 hours, 34 minutes, 52 seconds)
2025-09-12 18:50:52,698 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:50:52,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 18:54:43,204 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1533.19983 ± 552.734
2025-09-12 18:54:43,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [2160.617, 890.04785, 1869.1294, 1432.6147, 1027.4056, 1424.8524, 1460.3464, 2209.4568, 2257.2793, 600.2479]
2025-09-12 18:54:43,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 963.0, 656.0, 512.0, 633.0, 728.0, 1000.0, 1000.0, 268.0]
2025-09-12 18:54:43,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (1533.20) for latency ExtremeSparseL4U32
2025-09-12 18:54:43,239 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 29/100 (estimated time remaining: 16 hours, 53 minutes, 52 seconds)
2025-09-12 19:06:09,213 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:06:09,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 19:10:04,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1683.15210 ± 737.279
2025-09-12 19:10:04,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [2070.572, 2405.568, 563.7516, 2191.969, 159.69481, 1889.9624, 1880.6906, 2277.491, 2193.7024, 1198.1201]
2025-09-12 19:10:04,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 242.0, 1000.0, 76.0, 1000.0, 1000.0, 1000.0, 1000.0, 549.0]
2025-09-12 19:10:04,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (1683.15) for latency ExtremeSparseL4U32
2025-09-12 19:10:04,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 30/100 (estimated time remaining: 17 hours, 4 minutes, 29 seconds)
2025-09-12 19:20:44,442 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:20:44,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 19:24:31,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1509.86548 ± 957.630
2025-09-12 19:24:31,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [671.3727, 2495.9636, 2249.669, 2188.5908, 511.7879, 2125.658, 41.734097, 211.01584, 2211.459, 2391.4045]
2025-09-12 19:24:31,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [345.0, 999.0, 1000.0, 1000.0, 1000.0, 1000.0, 31.0, 177.0, 1000.0, 1000.0]
2025-09-12 19:24:31,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 31/100 (estimated time remaining: 16 hours, 59 minutes, 22 seconds)
2025-09-12 19:35:45,723 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:35:45,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 19:37:58,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 745.22705 ± 601.154
2025-09-12 19:37:58,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [137.62323, 1626.1266, 914.6338, 202.42816, 486.22723, 182.56216, 632.3341, 498.2227, 727.79517, 2044.3175]
2025-09-12 19:37:58,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [87.0, 817.0, 445.0, 181.0, 228.0, 81.0, 265.0, 254.0, 1000.0, 1000.0]
2025-09-12 19:37:58,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 32/100 (estimated time remaining: 16 hours, 26 minutes, 42 seconds)
2025-09-12 19:48:27,869 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:48:27,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 19:50:54,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 902.40216 ± 543.261
2025-09-12 19:50:54,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [763.2394, 742.47516, 643.68146, 1276.7362, 775.35614, 1450.4729, 2167.306, 412.37628, 273.3319, 519.0459]
2025-09-12 19:50:54,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [302.0, 287.0, 348.0, 589.0, 363.0, 615.0, 1000.0, 1000.0, 116.0, 237.0]
2025-09-12 19:50:54,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 33/100 (estimated time remaining: 16 hours, 3 minutes, 45 seconds)
2025-09-12 20:02:20,603 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:02:20,606 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 20:04:16,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 619.55090 ± 600.903
2025-09-12 20:04:16,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [592.00006, 301.21844, 2121.9165, 319.58246, 126.28916, 314.98276, 1183.5813, 909.5803, 221.48016, 104.877975]
2025-09-12 20:04:16,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [225.0, 124.0, 1000.0, 168.0, 77.0, 119.0, 1000.0, 1000.0, 102.0, 87.0]
2025-09-12 20:04:16,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 34/100 (estimated time remaining: 15 hours, 32 minutes, 4 seconds)
2025-09-12 20:15:06,709 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:15:06,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 20:18:08,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1270.60168 ± 674.758
2025-09-12 20:18:08,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [770.04065, 500.9275, 1296.9712, 1642.3337, 1627.0546, 2263.7258, 2387.5618, 309.71362, 1161.4612, 746.22723]
2025-09-12 20:18:08,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [356.0, 205.0, 596.0, 642.0, 715.0, 1000.0, 1000.0, 138.0, 1000.0, 321.0]
2025-09-12 20:18:08,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 35/100 (estimated time remaining: 14 hours, 58 minutes, 33 seconds)
2025-09-12 20:28:39,666 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:28:39,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 20:31:48,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1087.21265 ± 779.184
2025-09-12 20:31:48,970 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [421.1306, 2436.7432, 1348.5331, 296.47202, 355.01422, 2175.5718, 1038.597, 71.767845, 1043.7543, 1684.5432]
2025-09-12 20:31:48,970 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [208.0, 1000.0, 1000.0, 1000.0, 148.0, 1000.0, 415.0, 58.0, 518.0, 1000.0]
2025-09-12 20:31:48,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 36/100 (estimated time remaining: 14 hours, 34 minutes, 48 seconds)
2025-09-12 20:42:55,854 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:42:55,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 20:46:16,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1200.79407 ± 769.120
2025-09-12 20:46:16,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [768.38257, 2383.597, 1759.943, 79.87533, 485.88885, 1730.0952, 869.63654, 772.28644, 2412.6514, 745.5838]
2025-09-12 20:46:16,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 767.0, 46.0, 190.0, 1000.0, 1000.0, 271.0, 1000.0, 301.0]
2025-09-12 20:46:16,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 37/100 (estimated time remaining: 14 hours, 34 minutes, 10 seconds)
2025-09-12 20:56:52,264 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:56:52,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 21:01:06,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1872.47815 ± 619.432
2025-09-12 21:01:06,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [2354.2812, 1977.0298, 2002.3064, 1836.2338, 2168.7893, 369.69333, 2405.1182, 1099.9287, 2089.4033, 2421.9968]
2025-09-12 21:01:06,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 870.0, 1000.0, 812.0, 982.0, 188.0, 1000.0, 486.0, 1000.0, 1000.0]
2025-09-12 21:01:06,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (1872.48) for latency ExtremeSparseL4U32
2025-09-12 21:01:06,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 38/100 (estimated time remaining: 14 hours, 44 minutes, 22 seconds)
2025-09-12 21:12:04,791 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:12:04,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 21:15:30,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1408.54370 ± 837.716
2025-09-12 21:15:30,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [277.24075, 699.1915, 2094.014, 426.8758, 1676.4542, 2294.0256, 1580.3024, 1703.5171, 506.0373, 2827.779]
2025-09-12 21:15:30,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [124.0, 476.0, 777.0, 198.0, 633.0, 1000.0, 1000.0, 636.0, 1000.0, 1000.0]
2025-09-12 21:15:30,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 39/100 (estimated time remaining: 14 hours, 43 minutes, 9 seconds)
2025-09-12 21:26:25,363 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:26:25,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 21:30:38,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1895.24182 ± 534.325
2025-09-12 21:30:38,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [2280.4712, 875.4078, 999.59106, 2334.7896, 2186.2163, 2295.0054, 2147.438, 2082.4001, 1471.015, 2280.0837]
2025-09-12 21:30:38,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 448.0, 376.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 694.0, 1000.0]
2025-09-12 21:30:38,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (1895.24) for latency ExtremeSparseL4U32
2025-09-12 21:30:38,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 40/100 (estimated time remaining: 14 hours, 44 minutes, 34 seconds)
2025-09-12 21:41:38,030 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:41:38,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 21:44:40,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1402.94946 ± 860.792
2025-09-12 21:44:40,997 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [2084.634, 1769.2423, 2102.6997, 1116.958, 562.4687, 2447.498, 92.9714, 2482.2632, 170.08391, 1200.676]
2025-09-12 21:44:40,997 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 786.0, 1000.0, 431.0, 205.0, 1000.0, 53.0, 1000.0, 87.0, 494.0]
2025-09-12 21:44:41,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 41/100 (estimated time remaining: 14 hours, 34 minutes, 24 seconds)
2025-09-12 21:56:18,234 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:56:18,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 21:58:26,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 964.99512 ± 807.081
2025-09-12 21:58:26,502 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [295.95776, 357.67422, 1456.3763, 194.12984, 1387.4678, 2846.1929, 1546.9579, 888.51, 423.06186, 253.62271]
2025-09-12 21:58:26,502 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [140.0, 140.0, 1000.0, 142.0, 547.0, 1000.0, 642.0, 396.0, 204.0, 92.0]
2025-09-12 21:58:26,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 42/100 (estimated time remaining: 14 hours, 11 minutes, 32 seconds)
2025-09-12 22:09:39,906 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:09:39,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:12:23,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1307.73572 ± 668.887
2025-09-12 22:12:23,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [129.29398, 2505.1099, 1810.8662, 1826.3608, 1423.533, 1686.6439, 700.5656, 1382.9276, 1023.79126, 588.2661]
2025-09-12 22:12:23,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [116.0, 1000.0, 678.0, 603.0, 579.0, 1000.0, 255.0, 523.0, 397.0, 222.0]
2025-09-12 22:12:23,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 43/100 (estimated time remaining: 13 hours, 47 minutes, 3 seconds)
2025-09-12 22:22:43,255 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:22:43,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:24:35,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 944.12958 ± 838.269
2025-09-12 22:24:35,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [321.9274, 1083.5972, 159.56776, 57.89351, 884.77386, 2488.3945, 2431.3535, 1103.9166, 214.99274, 694.879]
2025-09-12 22:24:35,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [187.0, 463.0, 96.0, 39.0, 312.0, 1000.0, 873.0, 384.0, 105.0, 288.0]
2025-09-12 22:24:35,382 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 44/100 (estimated time remaining: 13 hours, 7 minutes, 36 seconds)
2025-09-12 22:35:17,508 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:35:17,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:37:47,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1181.01758 ± 912.135
2025-09-12 22:37:47,768 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [2516.1018, 116.86161, 2976.2856, 1676.5221, 1060.3845, 341.9036, 1113.6105, 472.6638, 337.66077, 1198.1808]
2025-09-12 22:37:47,768 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [897.0, 63.0, 987.0, 626.0, 433.0, 146.0, 1000.0, 204.0, 145.0, 407.0]
2025-09-12 22:37:47,786 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 45/100 (estimated time remaining: 12 hours, 32 minutes, 3 seconds)
2025-09-12 22:48:45,041 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:48:45,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:52:20,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1644.35193 ± 1059.921
2025-09-12 22:52:20,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [2725.6067, 2801.1843, 2493.372, 367.29724, 109.94866, 2643.4658, 2590.4646, 1340.5918, 371.50882, 1000.079]
2025-09-12 22:52:20,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 66.0, 860.0, 1000.0, 498.0, 120.0, 460.0]
2025-09-12 22:52:20,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 46/100 (estimated time remaining: 12 hours, 24 minutes, 10 seconds)
2025-09-12 23:03:38,302 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:03:38,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:06:22,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1210.84595 ± 1017.894
2025-09-12 23:06:22,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [1256.9913, 26.045494, 1126.8553, 1178.0443, 180.54611, 2757.1665, 2623.617, 2462.1062, 449.56848, 47.519047]
2025-09-12 23:06:22,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [499.0, 24.0, 414.0, 465.0, 77.0, 1000.0, 1000.0, 857.0, 1000.0, 41.0]
2025-09-12 23:06:22,900 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 47/100 (estimated time remaining: 12 hours, 13 minutes, 44 seconds)
2025-09-12 23:17:07,231 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:17:07,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:21:14,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1984.38123 ± 735.506
2025-09-12 23:21:14,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [1825.1694, 2518.1184, 2629.4456, 2280.5925, 2316.9363, 2572.185, 1690.1465, 593.0736, 717.7065, 2700.4375]
2025-09-12 23:21:14,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [746.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 690.0, 277.0, 370.0, 1000.0]
2025-09-12 23:21:14,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (1984.38) for latency ExtremeSparseL4U32
2025-09-12 23:21:14,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 48/100 (estimated time remaining: 12 hours, 9 minutes, 44 seconds)
2025-09-12 23:32:12,130 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:32:12,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:36:35,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2042.15625 ± 729.825
2025-09-12 23:36:35,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [2376.4055, 1754.5396, 2522.9937, 2284.668, 2641.9995, 387.87448, 2825.417, 1553.462, 2705.3657, 1368.8365]
2025-09-12 23:36:35,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 633.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 472.0, 1000.0, 575.0]
2025-09-12 23:36:35,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (2042.16) for latency ExtremeSparseL4U32
2025-09-12 23:36:35,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 49/100 (estimated time remaining: 12 hours, 28 minutes, 51 seconds)
2025-09-12 23:47:30,243 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:47:30,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:49:20,997 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1076.23181 ± 1069.803
2025-09-12 23:49:20,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [957.086, 3135.458, 190.62518, 799.90326, 284.13644, 392.02618, 2909.7256, 83.25628, 343.1837, 1666.9183]
2025-09-12 23:49:20,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [351.0, 905.0, 68.0, 263.0, 118.0, 152.0, 944.0, 45.0, 149.0, 680.0]
2025-09-12 23:49:21,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 50/100 (estimated time remaining: 12 hours, 9 minutes, 50 seconds)
2025-09-13 00:00:28,021 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:00:28,028 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:03:30,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1693.36487 ± 1220.240
2025-09-13 00:03:30,035 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [2821.473, 2910.555, 2765.3962, 433.4297, 1161.0271, 619.13367, 301.27686, 12.716954, 3052.089, 2856.5522]
2025-09-13 00:03:30,035 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 185.0, 434.0, 255.0, 150.0, 21.0, 1000.0, 1000.0]
2025-09-13 00:03:30,059 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 51/100 (estimated time remaining: 11 hours, 51 minutes, 39 seconds)
2025-09-13 00:14:48,386 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:14:48,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:19:32,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2351.01685 ± 522.508
2025-09-13 00:19:32,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [2728.32, 1484.1244, 2206.1396, 2698.779, 2642.2354, 2687.496, 2518.6138, 1256.0746, 2423.4663, 2864.918]
2025-09-13 00:19:32,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 628.0, 866.0, 1000.0]
2025-09-13 00:19:32,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (2351.02) for latency ExtremeSparseL4U32
2025-09-13 00:19:32,945 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 52/100 (estimated time remaining: 11 hours, 57 minutes, 2 seconds)
2025-09-13 00:30:18,165 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:30:18,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:33:04,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1120.21033 ± 992.130
2025-09-13 00:33:04,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [80.124435, 469.1581, 1125.3159, 2306.2856, 873.5339, 2688.2434, 202.15671, 262.17072, 2687.6074, 507.50748]
2025-09-13 00:33:04,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [48.0, 241.0, 1000.0, 830.0, 1000.0, 973.0, 67.0, 98.0, 1000.0, 170.0]
2025-09-13 00:33:04,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 53/100 (estimated time remaining: 11 hours, 29 minutes, 34 seconds)
2025-09-13 00:44:02,036 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:44:02,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:47:30,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1729.04565 ± 1062.684
2025-09-13 00:47:30,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [336.31482, 682.8992, 605.5556, 2962.5786, 283.32748, 2197.3474, 2366.3296, 2833.5713, 2113.9185, 2908.616]
2025-09-13 00:47:30,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [124.0, 1000.0, 263.0, 1000.0, 109.0, 878.0, 757.0, 1000.0, 764.0, 1000.0]
2025-09-13 00:47:30,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 54/100 (estimated time remaining: 11 hours, 6 minutes, 32 seconds)
2025-09-13 00:58:15,038 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:58:15,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:59:55,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 924.73145 ± 832.004
2025-09-13 00:59:55,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [2178.369, 62.98528, 619.05914, 564.8832, 1048.5596, 30.648165, 169.48149, 438.7019, 1790.8743, 2343.7534]
2025-09-13 00:59:55,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [692.0, 42.0, 233.0, 164.0, 369.0, 25.0, 79.0, 168.0, 564.0, 992.0]
2025-09-13 00:59:55,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 55/100 (estimated time remaining: 10 hours, 49 minutes, 20 seconds)
2025-09-13 01:11:41,393 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:11:41,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:14:34,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1735.22729 ± 1117.113
2025-09-13 01:14:34,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [2676.2158, 2956.4744, 1096.1006, 1092.2128, 2897.9512, 1868.2118, 3244.4993, 1365.7283, 77.88467, 76.993256]
2025-09-13 01:14:34,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 362.0, 333.0, 1000.0, 582.0, 1000.0, 405.0, 41.0, 58.0]
2025-09-13 01:14:34,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 56/100 (estimated time remaining: 10 hours, 39 minutes, 37 seconds)
2025-09-13 01:25:26,506 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:25:26,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:28:51,029 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1733.62573 ± 1164.238
2025-09-13 01:28:51,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [637.0993, 306.39703, 2675.5735, 1156.4076, 3045.6572, 2827.8938, 2904.707, 690.67224, 195.70122, 2896.149]
2025-09-13 01:28:51,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [273.0, 132.0, 934.0, 349.0, 924.0, 933.0, 1000.0, 314.0, 1000.0, 1000.0]
2025-09-13 01:28:51,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 57/100 (estimated time remaining: 10 hours, 9 minutes, 51 seconds)
2025-09-13 01:39:21,133 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:39:21,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:43:53,098 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2321.72925 ± 946.216
2025-09-13 01:43:53,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [2578.7144, 2736.3376, 3453.1746, 97.60134, 1667.8816, 3052.218, 2909.887, 1361.7834, 2638.9387, 2720.7563]
2025-09-13 01:43:53,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [898.0, 1000.0, 1000.0, 1000.0, 651.0, 1000.0, 1000.0, 472.0, 920.0, 1000.0]
2025-09-13 01:43:53,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 58/100 (estimated time remaining: 10 hours, 8 minutes, 59 seconds)
2025-09-13 01:55:24,772 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:55:24,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:59:14,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2189.20605 ± 1086.919
2025-09-13 01:59:14,809 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [2874.919, 2930.2483, 166.42299, 2982.8818, 2799.3289, 1122.0846, 2559.281, 441.5642, 2883.0396, 3132.2908]
2025-09-13 01:59:14,809 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 61.0, 1000.0, 959.0, 414.0, 1000.0, 153.0, 1000.0, 1000.0]
2025-09-13 01:59:14,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 59/100 (estimated time remaining: 10 hours, 2 minutes, 38 seconds)
2025-09-13 02:10:06,980 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:10:06,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:14:12,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2159.28394 ± 1261.105
2025-09-13 02:14:12,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [108.310715, 2996.1624, 2994.3494, 2966.9197, 51.923035, 655.08105, 2403.8008, 3178.1646, 3077.08, 3161.0488]
2025-09-13 02:14:12,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 53.0, 211.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:14:12,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 60/100 (estimated time remaining: 10 hours, 9 minutes, 8 seconds)
2025-09-13 02:24:47,548 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:24:47,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:27:08,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1262.71362 ± 869.067
2025-09-13 02:27:08,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [1087.5469, 3537.6519, 689.9706, 874.2811, 597.8496, 910.25073, 1448.2263, 1292.259, 316.82352, 1872.2769]
2025-09-13 02:27:08,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [372.0, 1000.0, 1000.0, 234.0, 210.0, 324.0, 498.0, 426.0, 102.0, 572.0]
2025-09-13 02:27:08,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 61/100 (estimated time remaining: 9 hours, 40 minutes, 34 seconds)
2025-09-13 02:38:23,608 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:38:23,618 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:41:45,680 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1725.56836 ± 1136.977
2025-09-13 02:41:45,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [2929.2402, 590.42084, 642.2365, 3251.364, 2619.3015, 24.168081, 2519.0884, 844.62256, 1047.252, 2787.9888]
2025-09-13 02:41:45,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 255.0, 1000.0, 1000.0, 848.0, 20.0, 818.0, 359.0, 324.0, 1000.0]
2025-09-13 02:41:45,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 62/100 (estimated time remaining: 9 hours, 28 minutes, 42 seconds)
2025-09-13 02:52:13,451 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:52:13,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:54:24,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1270.38745 ± 1030.887
2025-09-13 02:54:24,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [1473.2408, 107.31564, 2503.1682, 2778.5857, 318.2083, 244.62158, 1850.9102, 371.87152, 478.42645, 2577.5266]
2025-09-13 02:54:24,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [466.0, 62.0, 764.0, 1000.0, 134.0, 91.0, 572.0, 121.0, 138.0, 962.0]
2025-09-13 02:54:24,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 63/100 (estimated time remaining: 8 hours, 55 minutes, 58 seconds)
2025-09-13 03:05:15,847 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:05:15,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:08:32,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1739.64355 ± 1373.873
2025-09-13 03:08:32,392 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [3073.2468, 68.13023, 718.43665, 2468.8699, 2877.4822, 3331.5447, 1069.5371, 87.64162, 163.09773, 3538.4497]
2025-09-13 03:08:32,392 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 59.0, 1000.0, 906.0, 1000.0, 1000.0, 306.0, 52.0, 112.0, 1000.0]
2025-09-13 03:08:32,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 64/100 (estimated time remaining: 8 hours, 32 minutes, 46 seconds)
2025-09-13 03:19:54,394 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:19:54,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:23:22,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2037.97192 ± 986.725
2025-09-13 03:23:22,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [164.41646, 3139.3408, 3435.4329, 2216.411, 1086.7407, 1629.926, 1756.3347, 1789.0293, 3363.9246, 1798.1624]
2025-09-13 03:23:22,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [69.0, 1000.0, 1000.0, 763.0, 420.0, 473.0, 571.0, 1000.0, 1000.0, 510.0]
2025-09-13 03:23:22,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 65/100 (estimated time remaining: 8 hours, 17 minutes, 59 seconds)
2025-09-13 03:33:56,370 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:33:56,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:36:43,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1415.64380 ± 1054.753
2025-09-13 03:36:43,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [937.3666, 1734.2999, 624.42883, 299.1193, 113.20869, 247.99727, 2478.0195, 1944.9209, 2649.0063, 3128.0703]
2025-09-13 03:36:43,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [249.0, 527.0, 210.0, 1000.0, 43.0, 121.0, 1000.0, 589.0, 771.0, 1000.0]
2025-09-13 03:36:43,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 66/100 (estimated time remaining: 8 hours, 7 minutes, 6 seconds)
2025-09-13 03:48:02,223 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:48:02,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:51:52,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1832.98828 ± 1114.400
2025-09-13 03:51:52,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [3042.3896, 920.9718, 1288.5507, 893.8978, 2612.601, 339.1293, 368.7992, 3158.0352, 3105.025, 2600.482]
2025-09-13 03:51:52,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 292.0, 479.0, 1000.0, 749.0, 1000.0, 169.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:51:52,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 67/100 (estimated time remaining: 7 hours, 56 minutes, 44 seconds)
2025-09-13 04:03:09,572 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:03:09,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:05:28,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1404.90369 ± 1261.694
2025-09-13 04:05:28,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [1274.2513, 202.04933, 3382.3347, 870.49646, 29.566107, 3023.157, 478.07858, 129.78665, 1448.352, 3210.9646]
2025-09-13 04:05:28,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [373.0, 127.0, 1000.0, 314.0, 45.0, 1000.0, 153.0, 79.0, 474.0, 1000.0]
2025-09-13 04:05:28,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 68/100 (estimated time remaining: 7 hours, 49 minutes)
2025-09-13 04:15:38,478 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:15:38,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:19:40,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1986.17346 ± 1143.117
2025-09-13 04:19:40,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [1956.0656, 3557.81, 1365.2902, 3219.047, 364.17407, 298.0041, 2424.3145, 2112.5293, 1093.4082, 3471.0908]
2025-09-13 04:19:40,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 448.0, 1000.0, 1000.0, 118.0, 1000.0, 1000.0, 401.0, 1000.0]
2025-09-13 04:19:40,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 69/100 (estimated time remaining: 7 hours, 35 minutes, 16 seconds)
2025-09-13 04:31:09,827 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:31:09,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:35:15,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2571.29492 ± 740.959
2025-09-13 04:35:15,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [3129.5354, 1266.1766, 2734.7466, 3031.5596, 2953.25, 1315.3088, 3387.4224, 2549.7898, 2029.8341, 3315.326]
2025-09-13 04:35:15,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 452.0, 1000.0, 1000.0, 834.0, 387.0, 1000.0, 762.0, 592.0, 1000.0]
2025-09-13 04:35:15,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (2571.29) for latency ExtremeSparseL4U32
2025-09-13 04:35:15,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 70/100 (estimated time remaining: 7 hours, 25 minutes, 38 seconds)
2025-09-13 04:45:45,217 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:45:45,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:47:34,684 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1156.93469 ± 1073.348
2025-09-13 04:47:34,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [904.0452, 232.37512, 3369.493, 1273.4899, 2967.3552, 335.2621, 1026.2349, 89.067566, 422.123, 949.90076]
2025-09-13 04:47:34,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [256.0, 97.0, 1000.0, 329.0, 1000.0, 146.0, 314.0, 47.0, 123.0, 326.0]
2025-09-13 04:47:34,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 71/100 (estimated time remaining: 7 hours, 5 minutes, 5 seconds)
2025-09-13 04:58:33,534 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:58:33,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:00:34,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1128.13159 ± 799.844
2025-09-13 05:00:34,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [663.2145, 2372.1777, 1419.8422, 175.0736, 2060.9067, 131.69452, 2011.4443, 352.55493, 609.8966, 1484.5122]
2025-09-13 05:00:34,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [194.0, 659.0, 476.0, 68.0, 682.0, 80.0, 582.0, 108.0, 188.0, 1000.0]
2025-09-13 05:00:34,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 72/100 (estimated time remaining: 6 hours, 38 minutes, 29 seconds)
2025-09-13 05:11:41,829 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:11:41,842 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:16:02,927 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2613.47754 ± 981.649
2025-09-13 05:16:02,928 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [3507.1733, 3027.5837, 3371.4595, 2861.8394, 3036.7046, 280.477, 2748.9702, 3284.4734, 1231.433, 2784.662]
2025-09-13 05:16:02,928 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 995.0, 1000.0, 1000.0, 100.0, 1000.0, 1000.0, 465.0, 1000.0]
2025-09-13 05:16:02,928 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (2613.48) for latency ExtremeSparseL4U32
2025-09-13 05:16:02,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 73/100 (estimated time remaining: 6 hours, 35 minutes, 14 seconds)
2025-09-13 05:27:38,078 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:27:38,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:30:47,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1872.91833 ± 1075.384
2025-09-13 05:30:47,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [1791.8082, 1635.7206, 3270.583, 1536.7861, 3419.3235, 15.542341, 1158.7909, 3393.9697, 1289.7384, 1216.9204]
2025-09-13 05:30:47,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [581.0, 479.0, 1000.0, 418.0, 1000.0, 20.0, 348.0, 1000.0, 427.0, 1000.0]
2025-09-13 05:30:47,293 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 74/100 (estimated time remaining: 6 hours, 23 minutes, 59 seconds)
2025-09-13 05:40:49,331 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:40:49,339 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:45:14,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2677.00171 ± 638.557
2025-09-13 05:45:14,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [2883.45, 1466.0181, 1903.549, 2971.3652, 3293.2017, 3144.5422, 2253.9565, 3389.3127, 2187.8823, 3276.7405]
2025-09-13 05:45:14,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 475.0, 774.0, 1000.0, 1000.0, 1000.0, 789.0, 1000.0, 692.0, 1000.0]
2025-09-13 05:45:14,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (2677.00) for latency ExtremeSparseL4U32
2025-09-13 05:45:14,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 75/100 (estimated time remaining: 6 hours, 3 minutes, 55 seconds)
2025-09-13 05:56:47,704 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:56:47,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:00:10,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2168.56348 ± 1139.667
2025-09-13 06:00:10,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [2734.424, 1489.9874, 1475.4222, 497.42313, 1708.283, 434.96356, 2995.5476, 3641.4514, 3517.2153, 3190.9155]
2025-09-13 06:00:10,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [963.0, 420.0, 465.0, 213.0, 527.0, 132.0, 972.0, 1000.0, 1000.0, 1000.0]
2025-09-13 06:00:10,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 76/100 (estimated time remaining: 6 hours, 3 minutes)
2025-09-13 06:11:16,318 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:11:16,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:14:52,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2290.98413 ± 1000.778
2025-09-13 06:14:52,293 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [2969.25, 3432.6096, 1090.8184, 1441.572, 931.3069, 2408.9062, 3306.588, 3590.8518, 2627.3862, 1110.5526]
2025-09-13 06:14:52,293 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 329.0, 544.0, 289.0, 715.0, 1000.0, 1000.0, 765.0, 382.0]
2025-09-13 06:14:52,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 77/100 (estimated time remaining: 5 hours, 56 minutes, 37 seconds)
2025-09-13 06:26:02,422 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:26:02,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:29:49,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1920.60718 ± 1035.661
2025-09-13 06:29:49,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [82.88319, 1289.1458, 2019.1195, 2174.6375, 2038.5652, 2850.7988, 2893.382, 2887.399, 118.288216, 2851.8518]
2025-09-13 06:29:49,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 395.0, 1000.0, 645.0, 643.0, 853.0, 790.0, 1000.0, 55.0, 1000.0]
2025-09-13 06:29:49,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 78/100 (estimated time remaining: 5 hours, 39 minutes, 20 seconds)
2025-09-13 06:40:20,627 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:40:20,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:44:47,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2917.69238 ± 496.999
2025-09-13 06:44:47,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [2265.9917, 3471.3328, 3488.5476, 1975.9142, 3091.8435, 2556.37, 2955.5803, 3408.0886, 2741.7874, 3221.4695]
2025-09-13 06:44:47,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [715.0, 1000.0, 1000.0, 623.0, 1000.0, 685.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 06:44:47,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (2917.69) for latency ExtremeSparseL4U32
2025-09-13 06:44:47,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 79/100 (estimated time remaining: 5 hours, 25 minutes, 38 seconds)
2025-09-13 06:55:25,620 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:55:25,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:59:15,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2696.74268 ± 1308.498
2025-09-13 06:59:15,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [3594.9275, 563.339, 210.14473, 3535.6162, 3709.6003, 1545.0819, 3294.819, 3204.6584, 3462.658, 3846.5825]
2025-09-13 06:59:15,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 184.0, 90.0, 1000.0, 1000.0, 465.0, 1000.0, 856.0, 1000.0, 1000.0]
2025-09-13 06:59:15,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 80/100 (estimated time remaining: 5 hours, 10 minutes, 52 seconds)
2025-09-13 07:10:20,799 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:10:20,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:14:05,413 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2403.76221 ± 1171.033
2025-09-13 07:14:05,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [3240.8386, 3459.5327, 3509.4558, 2835.79, 127.69339, 3279.7456, 614.6202, 3249.0312, 1728.3254, 1992.5903]
2025-09-13 07:14:05,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 56.0, 1000.0, 245.0, 1000.0, 466.0, 753.0]
2025-09-13 07:14:05,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 81/100 (estimated time remaining: 4 hours, 55 minutes, 38 seconds)
2025-09-13 07:24:47,174 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:24:47,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:29:04,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2803.02319 ± 814.499
2025-09-13 07:29:04,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [3444.8977, 3436.5173, 3603.871, 3164.9202, 1944.6486, 3306.3325, 1148.0944, 3212.318, 1751.3197, 3017.312]
2025-09-13 07:29:04,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 581.0, 1000.0, 426.0, 1000.0, 579.0, 1000.0]
2025-09-13 07:29:04,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 82/100 (estimated time remaining: 4 hours, 42 minutes)
2025-09-13 07:40:05,255 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:40:05,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:43:37,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2008.42151 ± 1328.058
2025-09-13 07:43:37,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [3853.0474, 433.10983, 3467.3901, 1928.3622, 1068.5248, 3449.4277, 3081.4631, 469.83307, 238.63461, 2094.4202]
2025-09-13 07:43:37,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 174.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 186.0, 93.0, 596.0]
2025-09-13 07:43:37,107 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 25 minutes, 40 seconds)
2025-09-13 07:54:55,882 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:54:55,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:57:46,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1794.88501 ± 1198.864
2025-09-13 07:57:46,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [3572.9233, 486.70734, 1784.9498, 2931.9626, 2660.6284, 1541.0142, 3377.4045, 457.1601, 357.2528, 778.84784]
2025-09-13 07:57:46,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 172.0, 520.0, 1000.0, 849.0, 457.0, 1000.0, 196.0, 234.0, 226.0]
2025-09-13 07:57:46,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 8 minutes, 6 seconds)
2025-09-13 08:08:11,625 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:08:11,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:12:39,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2537.69336 ± 1049.016
2025-09-13 08:12:39,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [1480.3492, 2709.9084, 3251.3594, 2944.9895, 3536.2947, 3132.9272, 3481.348, 10.489513, 2925.7446, 1903.5256]
2025-09-13 08:12:39,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 871.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 35.0, 921.0, 1000.0]
2025-09-13 08:12:39,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 54 minutes, 51 seconds)
2025-09-13 08:24:37,589 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:24:37,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:27:22,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1706.83716 ± 1335.363
2025-09-13 08:27:22,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [3380.7822, 496.23615, 3453.0642, 58.793285, 1673.2375, 117.19686, 952.92834, 763.5063, 3144.173, 3028.453]
2025-09-13 08:27:22,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 154.0, 963.0, 38.0, 527.0, 89.0, 314.0, 310.0, 1000.0, 1000.0]
2025-09-13 08:27:22,091 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 39 minutes, 49 seconds)
2025-09-13 08:37:54,542 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:37:54,552 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:41:44,512 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2610.00098 ± 1318.099
2025-09-13 08:41:44,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [3578.6475, 3083.719, 3348.2183, 3545.3809, 3696.7407, 1670.1238, 74.62806, 3128.9565, 3607.3987, 366.1968]
2025-09-13 08:41:44,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 866.0, 1000.0, 1000.0, 1000.0, 533.0, 49.0, 1000.0, 1000.0, 114.0]
2025-09-13 08:41:44,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 23 minutes, 26 seconds)
2025-09-13 08:52:41,600 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:52:41,610 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:56:26,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2412.15039 ± 1078.469
2025-09-13 08:56:26,589 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [3600.1375, 3555.7249, 2379.3506, 3376.7144, 1962.4156, 2561.2917, 268.4018, 1574.707, 1312.6011, 3530.1577]
2025-09-13 08:56:26,589 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 977.0, 1000.0, 1000.0, 583.0, 1000.0, 94.0, 476.0, 389.0, 1000.0]
2025-09-13 08:56:26,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 9 minutes, 20 seconds)
2025-09-13 09:07:15,400 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:07:15,408 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:11:53,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2391.80225 ± 863.109
2025-09-13 09:11:53,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [3561.65, 1697.7446, 2210.6174, 2780.8928, 3386.2234, 3123.0334, 3093.9375, 1655.3315, 1070.4961, 1338.0984]
2025-09-13 09:11:53,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 868.0, 1000.0, 1000.0, 1000.0, 1000.0, 291.0, 1000.0]
2025-09-13 09:11:53,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 57 minutes, 53 seconds)
2025-09-13 09:22:29,816 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:22:29,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:24:50,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1639.83276 ± 1388.194
2025-09-13 09:24:50,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [3470.2112, 396.09494, 3802.6833, 431.86343, 87.94124, 1003.52997, 634.92395, 2422.966, 765.8043, 3382.3096]
2025-09-13 09:24:50,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 117.0, 1000.0, 163.0, 42.0, 268.0, 225.0, 623.0, 317.0, 953.0]
2025-09-13 09:24:50,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 38 minutes, 48 seconds)
2025-09-13 09:36:44,006 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:36:44,015 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:39:32,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1871.41931 ± 1395.924
2025-09-13 09:39:32,218 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [3501.8699, 696.1912, 712.7707, 3542.4648, 3637.6406, 115.515396, 713.28, 3124.3555, 2235.7214, 434.38422]
2025-09-13 09:39:32,218 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 236.0, 221.0, 1000.0, 1000.0, 75.0, 228.0, 1000.0, 654.0, 130.0]
2025-09-13 09:39:32,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 24 minutes, 20 seconds)
2025-09-13 09:49:47,199 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:49:47,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:53:15,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1981.73474 ± 1270.656
2025-09-13 09:53:15,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [3419.6929, 1102.7596, 331.6708, 2448.7317, 3246.3042, 336.77414, 3223.7556, 1893.8127, 407.25867, 3406.5886]
2025-09-13 09:53:15,579 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 362.0, 112.0, 760.0, 1000.0, 101.0, 1000.0, 559.0, 1000.0, 1000.0]
2025-09-13 09:53:15,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 8 minutes, 43 seconds)
2025-09-13 10:04:29,389 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:04:29,398 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:07:18,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1640.44495 ± 1238.896
2025-09-13 10:07:18,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [1012.71924, 2418.646, 1369.1067, 1722.8527, 786.83167, 201.84268, 3621.7915, 1114.7705, 3933.4065, 222.48244]
2025-09-13 10:07:18,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [256.0, 620.0, 1000.0, 1000.0, 225.0, 87.0, 1000.0, 320.0, 1000.0, 78.0]
2025-09-13 10:07:18,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 53 minutes, 23 seconds)
2025-09-13 10:17:49,185 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:17:49,194 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:21:30,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1984.59302 ± 1215.021
2025-09-13 10:21:30,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [620.7279, 448.9061, 1584.4384, 46.053596, 2260.0637, 3175.9263, 3416.7742, 1860.0839, 3337.617, 3095.3398]
2025-09-13 10:21:30,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 138.0, 575.0, 39.0, 634.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 10:21:30,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 37 minutes, 28 seconds)
2025-09-13 10:32:47,834 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:32:47,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:35:42,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1897.56763 ± 1268.677
2025-09-13 10:35:42,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [1232.7906, 3195.601, 2609.4727, 3052.9524, 412.6075, 3663.6167, 275.8541, 921.7003, 3071.17, 539.911]
2025-09-13 10:35:42,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [384.0, 1000.0, 815.0, 1000.0, 133.0, 1000.0, 101.0, 250.0, 1000.0, 158.0]
2025-09-13 10:35:42,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 25 minutes, 2 seconds)
2025-09-13 10:47:11,483 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:47:11,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:51:13,466 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2407.97021 ± 1320.412
2025-09-13 10:51:13,467 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [3722.2195, 3594.4976, 1195.8645, 2125.7932, 3485.2607, 3136.1274, 2970.8074, 292.8246, 3432.879, 123.42777]
2025-09-13 10:51:13,467 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 357.0, 624.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 63.0]
2025-09-13 10:51:13,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 11 minutes, 41 seconds)
2025-09-13 11:01:18,758 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:01:18,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:06:02,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 3128.43042 ± 325.834
2025-09-13 11:06:02,769 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [3096.287, 2856.9795, 3743.273, 3494.684, 3106.1624, 2762.6897, 2618.219, 3396.1829, 3165.787, 3044.0417]
2025-09-13 11:06:02,769 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 827.0, 1000.0, 1000.0, 1000.0, 816.0, 760.0, 1000.0, 1000.0, 1000.0]
2025-09-13 11:06:02,769 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (3128.43) for latency ExtremeSparseL4U32
2025-09-13 11:06:02,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 97/100 (estimated time remaining: 58 minutes, 13 seconds)
2025-09-13 11:17:10,522 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:17:10,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:22:00,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2889.45923 ± 802.118
2025-09-13 11:22:00,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [3424.9407, 3130.8457, 3414.5881, 3124.2605, 3251.557, 3485.4128, 2278.6616, 2826.6616, 706.77454, 3250.889]
2025-09-13 11:22:00,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 683.0, 1000.0, 1000.0, 1000.0]
2025-09-13 11:22:00,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 98/100 (estimated time remaining: 44 minutes, 48 seconds)
2025-09-13 11:32:42,889 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:32:42,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:36:46,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2631.74780 ± 1030.054
2025-09-13 11:36:46,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [2251.1729, 2536.049, 2482.305, 1871.4875, 3613.7268, 3334.7625, 54.572075, 3284.9355, 3378.9753, 3509.492]
2025-09-13 11:36:46,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [661.0, 1000.0, 800.0, 555.0, 1000.0, 1000.0, 31.0, 1000.0, 1000.0, 1000.0]
2025-09-13 11:36:46,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 99/100 (estimated time remaining: 30 minutes, 6 seconds)
2025-09-13 11:48:27,894 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:48:27,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:51:19,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2000.40747 ± 1222.338
2025-09-13 11:51:19,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [209.14043, 2755.4172, 3428.5051, 2132.3801, 6.6878333, 1596.1461, 1222.6866, 1724.4685, 3456.4163, 3472.224]
2025-09-13 11:51:19,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [92.0, 718.0, 1000.0, 551.0, 16.0, 480.0, 450.0, 446.0, 1000.0, 887.0]
2025-09-13 11:51:19,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 100/100 (estimated time remaining: 15 minutes, 7 seconds)
2025-09-13 12:02:09,438 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:02:09,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:06:23,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2502.60181 ± 1016.667
2025-09-13 12:06:23,329 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [3495.8672, 1280.463, 1254.9028, 1271.746, 2274.1807, 3129.1895, 3363.1306, 3784.5354, 1600.7067, 3571.2942]
2025-09-13 12:06:23,329 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 379.0, 1000.0, 387.0, 675.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 12:06:23,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1251 [DEBUG]: Training session finished
