2025-09-12 19:53:28,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc20-ant/ExtremeSparseL4U32-mbpac-highdim-memdelay
2025-09-12 19:53:28,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc20-ant/ExtremeSparseL4U32-mbpac-highdim-memdelay
2025-09-12 19:53:28,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x146f18544d50>}
2025-09-12 19:53:28,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1111 [DEBUG]: using device: cuda
2025-09-12 19:53:28,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1133 [INFO]: Creating new trainer
2025-09-12 19:53:28,613 baseline-mbpac-noiseperc20-ant:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-09-12 19:53:28,613 baseline-mbpac-noiseperc20-ant:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-12 19:53:28,622 baseline-mbpac-noiseperc20-ant:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=27, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=8, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 512, batch_first=True)
)
2025-09-12 19:53:29,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1194 [DEBUG]: Starting training session...
2025-09-12 19:53:29,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 1/100
2025-09-12 20:05:44,538 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:05:44,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 20:07:07,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: -232.87566 ± 331.653
2025-09-12 20:07:07,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-0.33829024, -192.4537, -111.992744, -891.2511, -43.008762, -9.375655, -47.74069, -884.928, -66.032745, -81.63492]
2025-09-12 20:07:07,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [18.0, 172.0, 174.0, 1000.0, 40.0, 13.0, 102.0, 1000.0, 49.0, 69.0]
2025-09-12 20:07:07,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1226 [INFO]: New best (-232.88) for latency ExtremeSparseL4U32
2025-09-12 20:07:07,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 2/100 (estimated time remaining: 22 hours, 29 minutes, 36 seconds)
2025-09-12 20:18:39,030 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:18:39,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 20:18:56,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: -29.95539 ± 35.563
2025-09-12 20:18:56,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-11.659223, -13.577385, -25.617558, -10.494231, -36.450966, -4.796458, -25.122772, -37.143337, -3.7049315, -130.987]
2025-09-12 20:18:56,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [39.0, 18.0, 85.0, 13.0, 56.0, 28.0, 53.0, 89.0, 14.0, 186.0]
2025-09-12 20:18:56,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1226 [INFO]: New best (-29.96) for latency ExtremeSparseL4U32
2025-09-12 20:18:56,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 3/100 (estimated time remaining: 20 hours, 47 minutes, 19 seconds)
2025-09-12 20:29:21,903 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:29:21,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 20:30:12,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: -61.37914 ± 123.779
2025-09-12 20:30:12,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [25.09012, -19.939234, 23.405167, 2.5136864, -417.96213, -71.337456, -46.44189, -23.529655, -82.489174, -3.1009018]
2025-09-12 20:30:12,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [20.0, 41.0, 80.0, 13.0, 1000.0, 199.0, 122.0, 44.0, 119.0, 26.0]
2025-09-12 20:30:12,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 4/100 (estimated time remaining: 19 hours, 46 minutes, 58 seconds)
2025-09-12 20:41:02,359 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:41:02,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 20:42:52,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: -108.92377 ± 163.279
2025-09-12 20:42:52,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-9.52354, -347.11026, 15.757627, -2.559556, -323.96182, -17.679972, -29.139248, -60.980843, 72.40572, -386.44574]
2025-09-12 20:42:52,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [48.0, 1000.0, 47.0, 29.0, 1000.0, 86.0, 128.0, 118.0, 160.0, 1000.0]
2025-09-12 20:42:52,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 5/100 (estimated time remaining: 19 hours, 45 minutes, 18 seconds)
2025-09-12 20:53:49,788 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:53:49,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 20:55:01,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: -79.70665 ± 144.796
2025-09-12 20:55:01,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-356.6048, -368.31372, 8.429974, -3.1951451, -4.79538, 27.969797, -14.935518, -95.62967, -0.68679637, 10.694746]
2025-09-12 20:55:01,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 29.0, 20.0, 24.0, 37.0, 62.0, 126.0, 22.0, 20.0]
2025-09-12 20:55:01,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 6/100 (estimated time remaining: 19 hours, 29 minutes, 1 second)
2025-09-12 21:06:05,889 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:06:05,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 21:06:39,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: -21.42112 ± 26.474
2025-09-12 21:06:39,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-0.33746988, -23.187284, -36.785908, -13.427136, -4.5961795, -81.589554, -7.664979, -41.77429, -25.1434, 20.295008]
2025-09-12 21:06:39,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [25.0, 180.0, 68.0, 213.0, 110.0, 273.0, 32.0, 94.0, 35.0, 69.0]
2025-09-12 21:06:39,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1226 [INFO]: New best (-21.42) for latency ExtremeSparseL4U32
2025-09-12 21:06:39,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 7/100 (estimated time remaining: 18 hours, 39 minutes, 5 seconds)
2025-09-12 21:18:09,742 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:18:09,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 21:19:07,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: -66.80070 ± 121.212
2025-09-12 21:19:07,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-16.729918, -74.10131, -16.821016, -6.507323, -1.9035395, -4.419997, -81.0291, -18.415506, -421.69534, -26.384003]
2025-09-12 21:19:07,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [202.0, 112.0, 47.0, 43.0, 92.0, 41.0, 203.0, 67.0, 1000.0, 70.0]
2025-09-12 21:19:07,703 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 8/100 (estimated time remaining: 18 hours, 39 minutes, 20 seconds)
2025-09-12 21:29:35,516 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:29:35,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 21:30:59,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: -110.53746 ± 173.028
2025-09-12 21:30:59,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-519.27826, -63.782417, -7.9814553, -41.867496, -43.14194, 0.010537555, -377.2658, -41.10748, 1.4297899, -12.390189]
2025-09-12 21:30:59,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 188.0, 70.0, 133.0, 122.0, 65.0, 1000.0, 115.0, 36.0, 87.0]
2025-09-12 21:30:59,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 9/100 (estimated time remaining: 18 hours, 38 minutes, 35 seconds)
2025-09-12 21:42:09,894 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:42:09,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 21:43:26,211 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: -87.90022 ± 154.433
2025-09-12 21:43:26,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-30.32429, -4.7229958, -22.909266, -0.5749883, -11.699653, -375.42114, -3.9803832, -414.35223, 18.931744, -33.949078]
2025-09-12 21:43:26,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [111.0, 18.0, 154.0, 19.0, 14.0, 1000.0, 22.0, 1000.0, 86.0, 70.0]
2025-09-12 21:43:26,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 10/100 (estimated time remaining: 18 hours, 22 minutes, 7 seconds)
2025-09-12 21:54:19,984 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:54:19,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 21:55:49,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: -80.30558 ± 144.072
2025-09-12 21:55:49,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-19.390903, -42.707092, -319.93768, -403.39975, -11.178849, 39.969788, -43.618298, 26.479065, -17.146618, -12.125519]
2025-09-12 21:55:49,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [106.0, 61.0, 1000.0, 1000.0, 156.0, 87.0, 250.0, 48.0, 121.0, 47.0]
2025-09-12 21:55:49,171 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 11/100 (estimated time remaining: 18 hours, 14 minutes, 22 seconds)
2025-09-12 22:06:58,153 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:06:58,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:07:22,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: -9.91746 ± 21.734
2025-09-12 22:07:22,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-7.6085606, -0.29252115, 2.6248853, 1.0310845, 10.546106, -71.13762, -19.66164, -3.2512264, -7.61975, -3.8053677]
2025-09-12 22:07:22,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [59.0, 47.0, 101.0, 115.0, 16.0, 126.0, 82.0, 151.0, 71.0, 18.0]
2025-09-12 22:07:22,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1226 [INFO]: New best (-9.92) for latency ExtremeSparseL4U32
2025-09-12 22:07:22,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 12/100 (estimated time remaining: 18 hours, 46 seconds)
2025-09-12 22:18:50,730 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:18:50,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:19:03,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: -15.15724 ± 15.375
2025-09-12 22:19:03,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-18.148136, -10.266045, -13.17591, 7.4329586, -36.632504, -46.920876, -17.104136, -11.809738, -4.693164, -0.2548441]
2025-09-12 22:19:03,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [47.0, 22.0, 30.0, 40.0, 83.0, 47.0, 25.0, 29.0, 78.0, 30.0]
2025-09-12 22:19:03,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 13/100 (estimated time remaining: 17 hours, 34 minutes, 50 seconds)
2025-09-12 22:29:45,781 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:29:45,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:31:01,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: -57.44767 ± 85.286
2025-09-12 22:31:01,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-23.774597, -16.024166, -16.771173, -20.713615, -6.9135094, -22.534025, 2.8907335, -198.06267, -19.526459, -253.0472]
2025-09-12 22:31:01,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [82.0, 58.0, 70.0, 42.0, 57.0, 53.0, 46.0, 1000.0, 98.0, 1000.0]
2025-09-12 22:31:01,185 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 14/100 (estimated time remaining: 17 hours, 24 minutes, 24 seconds)
2025-09-12 22:41:46,623 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:41:46,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:42:02,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: -2.62436 ± 17.160
2025-09-12 22:42:02,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-32.173317, 34.194984, 11.497927, 0.93728065, -14.311207, -15.971526, -9.955746, 2.0089808, 6.183585, -8.654594]
2025-09-12 22:42:02,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [69.0, 72.0, 109.0, 78.0, 44.0, 26.0, 63.0, 16.0, 13.0, 18.0]
2025-09-12 22:42:02,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1226 [INFO]: New best (-2.62) for latency ExtremeSparseL4U32
2025-09-12 22:42:02,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 15/100 (estimated time remaining: 16 hours, 47 minutes, 52 seconds)
2025-09-12 22:53:28,735 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:53:28,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:53:51,324 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: -14.07047 ± 29.472
2025-09-12 22:53:51,324 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [1.3607694, -5.696874, 1.689681, 19.262074, -7.335123, -14.860358, -17.019562, -7.5100465, -97.25789, -13.337323]
2025-09-12 22:53:51,324 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [23.0, 26.0, 15.0, 63.0, 140.0, 56.0, 73.0, 42.0, 244.0, 57.0]
2025-09-12 22:53:51,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 16/100 (estimated time remaining: 16 hours, 26 minutes, 36 seconds)
2025-09-12 23:04:26,902 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:04:26,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:05:11,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: -4.74047 ± 21.479
2025-09-12 23:05:11,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-5.9168863, 45.244083, -1.8319023, -7.1217084, -5.9554257, -30.334616, -16.949362, 15.959705, -34.11389, -6.3847194]
2025-09-12 23:05:11,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [113.0, 70.0, 13.0, 37.0, 16.0, 126.0, 57.0, 30.0, 1000.0, 31.0]
2025-09-12 23:05:11,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 17/100 (estimated time remaining: 16 hours, 11 minutes, 28 seconds)
2025-09-12 23:17:13,867 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:17:13,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:17:38,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 16.73523 ± 48.109
2025-09-12 23:17:38,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [35.413403, -23.34368, 15.780329, 103.84615, 9.845057, 6.3415055, 75.975, -13.689591, 33.818073, -76.63399]
2025-09-12 23:17:38,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [48.0, 93.0, 20.0, 165.0, 24.0, 28.0, 58.0, 25.0, 171.0, 193.0]
2025-09-12 23:17:38,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1226 [INFO]: New best (16.74) for latency ExtremeSparseL4U32
2025-09-12 23:17:38,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 18/100 (estimated time remaining: 16 hours, 12 minutes, 26 seconds)
2025-09-12 23:27:42,380 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:27:42,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:28:27,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 21.05066 ± 38.689
2025-09-12 23:28:27,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-10.303224, 114.84958, 52.232227, 46.626137, 21.806797, 13.916285, -18.058485, -4.1759286, -12.342789, 5.9560328]
2025-09-12 23:28:27,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [43.0, 1000.0, 115.0, 61.0, 74.0, 73.0, 34.0, 52.0, 28.0, 40.0]
2025-09-12 23:28:27,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1226 [INFO]: New best (21.05) for latency ExtremeSparseL4U32
2025-09-12 23:28:27,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 19/100 (estimated time remaining: 15 hours, 42 minutes, 7 seconds)
2025-09-12 23:39:30,796 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:39:30,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:40:23,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 13.52799 ± 31.909
2025-09-12 23:40:23,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [4.0833383, -27.26266, -6.2439337, 24.415438, 6.651808, 6.1156397, 23.376478, 94.7139, 26.781446, -17.351547]
2025-09-12 23:40:23,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [16.0, 134.0, 19.0, 59.0, 83.0, 167.0, 44.0, 1000.0, 80.0, 152.0]
2025-09-12 23:40:23,820 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 20/100 (estimated time remaining: 15 hours, 45 minutes, 27 seconds)
2025-09-12 23:51:16,358 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:51:16,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:51:30,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 11.62106 ± 22.008
2025-09-12 23:51:30,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-0.2338281, -7.6289043, -2.2498274, 13.671394, -15.718874, 36.552025, 61.269463, 7.2002387, 23.550577, -0.20165087]
2025-09-12 23:51:30,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [24.0, 25.0, 21.0, 31.0, 23.0, 48.0, 143.0, 21.0, 86.0, 58.0]
2025-09-12 23:51:30,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 21/100 (estimated time remaining: 15 hours, 22 minutes, 33 seconds)
2025-09-13 00:03:21,316 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:03:21,339 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:03:41,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: -0.72372 ± 31.083
2025-09-13 00:03:41,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-83.37627, 6.3311133, -1.3194865, -5.6248116, 8.476169, 35.694035, 13.761939, 29.630981, -14.631055, 3.8202147]
2025-09-13 00:03:41,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [119.0, 59.0, 21.0, 96.0, 62.0, 35.0, 64.0, 38.0, 112.0, 46.0]
2025-09-13 00:03:41,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 22/100 (estimated time remaining: 15 hours, 24 minutes, 11 seconds)
2025-09-13 00:14:42,850 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:14:42,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:15:57,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 32.22119 ± 55.866
2025-09-13 00:15:57,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [150.51039, -13.580491, 8.018812, 12.04102, 131.12218, 25.722677, 3.3648968, -19.385502, 13.948607, 10.449325]
2025-09-13 00:15:57,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 21.0, 59.0, 64.0, 1000.0, 29.0, 44.0, 72.0, 145.0, 44.0]
2025-09-13 00:15:57,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1226 [INFO]: New best (32.22) for latency ExtremeSparseL4U32
2025-09-13 00:15:57,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 23/100 (estimated time remaining: 15 hours, 9 minutes, 39 seconds)
2025-09-13 00:26:11,835 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:26:11,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:26:49,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 9.81833 ± 15.233
2025-09-13 00:26:49,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-0.50437975, 46.274208, 12.672735, 1.0405803, 11.892798, 8.941755, 26.468615, -7.2302227, 1.0420629, -2.4148433]
2025-09-13 00:26:49,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [16.0, 1000.0, 17.0, 15.0, 64.0, 17.0, 45.0, 39.0, 27.0, 17.0]
2025-09-13 00:26:49,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 24/100 (estimated time remaining: 14 hours, 58 minutes, 47 seconds)
2025-09-13 00:37:46,452 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:37:46,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:38:04,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 14.94983 ± 32.135
2025-09-13 00:38:04,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [45.495335, 15.065319, -0.08957685, -15.595328, -51.095066, 14.78832, 49.11473, 14.256908, 10.7703705, 66.78733]
2025-09-13 00:38:04,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [50.0, 37.0, 14.0, 44.0, 103.0, 40.0, 68.0, 23.0, 30.0, 161.0]
2025-09-13 00:38:04,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 25/100 (estimated time remaining: 14 hours, 36 minutes, 35 seconds)
2025-09-13 00:49:40,341 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:49:40,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:50:22,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 9.93332 ± 28.633
2025-09-13 00:50:22,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-5.837635, -21.335768, 39.81304, -17.389242, 7.809906, -5.9124312, 80.18068, 15.099794, 7.421233, -0.5163544]
2025-09-13 00:50:22,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [52.0, 45.0, 87.0, 59.0, 40.0, 34.0, 1000.0, 29.0, 20.0, 11.0]
2025-09-13 00:50:22,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 26/100 (estimated time remaining: 14 hours, 42 minutes, 54 seconds)
2025-09-13 01:01:42,667 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:01:42,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:02:23,794 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 18.99365 ± 19.605
2025-09-13 01:02:23,794 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-4.389091, 45.692158, -1.247096, 46.34778, 36.729923, 10.279743, -5.642181, 19.985186, 35.562267, 6.6178217]
2025-09-13 01:02:23,794 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [12.0, 1000.0, 50.0, 73.0, 40.0, 14.0, 17.0, 32.0, 63.0, 27.0]
2025-09-13 01:02:23,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 27/100 (estimated time remaining: 14 hours, 28 minutes, 51 seconds)
2025-09-13 01:12:30,558 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:12:30,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:12:42,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: -7.14958 ± 14.610
2025-09-13 01:12:42,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-0.8133421, -35.67095, -14.466542, -23.04553, -6.922, 12.570962, 2.9532816, -15.805467, -2.9135172, 12.617325]
2025-09-13 01:12:42,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [29.0, 62.0, 29.0, 83.0, 29.0, 48.0, 32.0, 45.0, 13.0, 41.0]
2025-09-13 01:12:42,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 28/100 (estimated time remaining: 13 hours, 48 minutes, 42 seconds)
2025-09-13 01:23:42,424 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:23:42,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:24:00,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 7.46125 ± 25.578
2025-09-13 01:24:00,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-33.035038, 18.510786, 6.9959164, 12.979901, 12.105149, 42.336246, -44.75486, 8.324857, 33.58324, 17.566303]
2025-09-13 01:24:00,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [128.0, 45.0, 18.0, 22.0, 48.0, 81.0, 123.0, 82.0, 28.0, 22.0]
2025-09-13 01:24:00,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 29/100 (estimated time remaining: 13 hours, 43 minutes, 25 seconds)
2025-09-13 01:35:17,207 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:35:17,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:36:39,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 15.59273 ± 34.895
2025-09-13 01:36:39,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-63.76613, 75.45571, -10.608733, 20.5124, 38.05119, 8.553347, 45.796635, 19.985565, 16.293215, 5.654115]
2025-09-13 01:36:39,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [193.0, 1000.0, 78.0, 33.0, 94.0, 110.0, 1000.0, 32.0, 51.0, 65.0]
2025-09-13 01:36:39,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 30/100 (estimated time remaining: 13 hours, 51 minutes, 54 seconds)
2025-09-13 01:47:28,879 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:47:28,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:48:14,436 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 20.71431 ± 30.505
2025-09-13 01:48:14,436 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [6.5277357, 6.855649, 0.13468295, 13.658333, 99.48997, -13.296652, 11.210254, 11.187695, 20.73493, 50.6405]
2025-09-13 01:48:14,436 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [79.0, 21.0, 48.0, 22.0, 1000.0, 33.0, 25.0, 58.0, 48.0, 149.0]
2025-09-13 01:48:14,448 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 31/100 (estimated time remaining: 13 hours, 30 minutes, 6 seconds)
2025-09-13 01:59:31,730 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:59:31,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:00:13,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 14.34163 ± 24.940
2025-09-13 02:00:13,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [2.6099627, 75.35122, 29.01994, -3.155338, 8.130742, 8.131881, 11.095736, 34.722855, -7.617634, -14.873012]
2025-09-13 02:00:13,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [36.0, 1000.0, 38.0, 21.0, 20.0, 52.0, 31.0, 79.0, 60.0, 53.0]
2025-09-13 02:00:13,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 32/100 (estimated time remaining: 13 hours, 18 minutes, 3 seconds)
2025-09-13 02:10:56,616 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:10:56,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:11:41,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 28.66671 ± 35.480
2025-09-13 02:11:41,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [82.05399, 1.5020491, 19.020975, 2.8886113, 106.71233, 8.993742, 11.45371, -6.1857147, 41.279747, 18.947622]
2025-09-13 02:11:41,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [183.0, 15.0, 57.0, 15.0, 1000.0, 29.0, 19.0, 16.0, 51.0, 96.0]
2025-09-13 02:11:41,512 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 33/100 (estimated time remaining: 13 hours, 22 minutes, 5 seconds)
2025-09-13 02:23:26,575 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:23:26,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:24:09,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 22.79495 ± 28.624
2025-09-13 02:24:09,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-25.547993, 75.308365, 7.2322555, 36.216736, 20.609337, 41.2444, 56.146713, -5.1603856, 19.796045, 2.1039758]
2025-09-13 02:24:09,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [86.0, 1000.0, 26.0, 40.0, 24.0, 54.0, 71.0, 43.0, 42.0, 16.0]
2025-09-13 02:24:09,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 34/100 (estimated time remaining: 13 hours, 26 minutes)
2025-09-13 02:34:30,545 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:34:30,552 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:35:14,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 29.88710 ± 37.209
2025-09-13 02:35:14,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [42.94602, -14.169258, 36.27683, 53.847473, -1.3794098, 44.914112, -3.4983034, 21.313383, 118.45256, 0.16758387]
2025-09-13 02:35:14,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [53.0, 20.0, 35.0, 80.0, 53.0, 85.0, 20.0, 58.0, 1000.0, 25.0]
2025-09-13 02:35:14,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 35/100 (estimated time remaining: 12 hours, 53 minutes, 20 seconds)
2025-09-13 02:47:19,922 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:47:19,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:48:15,964 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 23.50591 ± 32.546
2025-09-13 02:48:15,964 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [66.448456, 15.866087, 14.539689, 28.874603, 59.292763, 0.20440309, 14.755051, 74.271286, -33.51643, -5.676822]
2025-09-13 02:48:15,964 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [99.0, 62.0, 228.0, 31.0, 55.0, 34.0, 53.0, 1000.0, 254.0, 58.0]
2025-09-13 02:48:15,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 36/100 (estimated time remaining: 13 hours, 19 seconds)
2025-09-13 02:58:13,578 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:58:13,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:58:58,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 21.49328 ± 23.775
2025-09-13 02:58:58,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-2.673034, 72.87172, 13.74646, -4.0767694, 14.515639, 54.329426, 23.02766, 11.517294, 30.366695, 1.307701]
2025-09-13 02:58:58,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [21.0, 82.0, 25.0, 15.0, 16.0, 148.0, 63.0, 26.0, 1000.0, 56.0]
2025-09-13 02:58:58,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 37/100 (estimated time remaining: 12 hours, 31 minutes, 52 seconds)
2025-09-13 03:10:50,443 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:10:50,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:11:37,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 23.08327 ± 22.849
2025-09-13 03:11:37,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [74.21082, 0.3003944, 45.731686, 23.093817, 21.707073, 37.136135, -4.022523, 11.740546, 20.308598, 0.6261217]
2025-09-13 03:11:37,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 12.0, 114.0, 25.0, 108.0, 74.0, 56.0, 16.0, 45.0, 115.0]
2025-09-13 03:11:37,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 38/100 (estimated time remaining: 12 hours, 35 minutes, 7 seconds)
2025-09-13 03:21:48,063 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:21:48,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:22:04,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 24.63515 ± 22.744
2025-09-13 03:22:04,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [33.541332, 16.807827, 20.694836, -21.872906, 24.833933, 58.001408, -1.2414091, 56.499092, 31.454342, 27.633087]
2025-09-13 03:22:04,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [42.0, 20.0, 33.0, 24.0, 27.0, 50.0, 73.0, 49.0, 45.0, 200.0]
2025-09-13 03:22:04,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 39/100 (estimated time remaining: 11 hours, 58 minutes, 11 seconds)
2025-09-13 03:33:26,285 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:33:26,293 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:33:42,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 13.54050 ± 27.671
2025-09-13 03:33:42,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [75.06154, 40.723267, 13.269095, 7.065771, -1.1009153, 1.9662296, -24.256414, 15.156728, 27.426401, -19.906733]
2025-09-13 03:33:42,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [66.0, 37.0, 63.0, 40.0, 26.0, 18.0, 174.0, 30.0, 45.0, 45.0]
2025-09-13 03:33:42,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 40/100 (estimated time remaining: 11 hours, 53 minutes, 21 seconds)
2025-09-13 03:44:21,288 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:44:21,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:44:40,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 16.83523 ± 15.809
2025-09-13 03:44:40,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [9.505289, 3.744412, 41.17723, 9.218368, 26.055674, 30.577993, 1.6725483, 41.96434, -3.3026328, 7.739036]
2025-09-13 03:44:40,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [30.0, 16.0, 93.0, 102.0, 47.0, 34.0, 22.0, 56.0, 87.0, 128.0]
2025-09-13 03:44:40,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 41/100 (estimated time remaining: 11 hours, 16 minutes, 48 seconds)
2025-09-13 03:55:43,398 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:55:43,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:56:04,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 20.87546 ± 18.442
2025-09-13 03:56:04,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [38.871906, 27.34815, 29.997087, 3.3198187, 34.266315, -18.85782, 17.087921, 9.685341, 18.912039, 48.123814]
2025-09-13 03:56:04,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [63.0, 82.0, 34.0, 24.0, 53.0, 300.0, 38.0, 44.0, 24.0, 44.0]
2025-09-13 03:56:04,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 42/100 (estimated time remaining: 11 hours, 13 minutes, 52 seconds)
2025-09-13 04:07:06,776 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:07:06,783 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:07:47,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 29.57188 ± 26.947
2025-09-13 04:07:47,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [42.046955, 49.117733, 15.452657, 8.517105, 6.4118967, 92.24405, 8.037157, -1.5995903, 42.638325, 32.852528]
2025-09-13 04:07:47,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [68.0, 73.0, 27.0, 16.0, 19.0, 1000.0, 25.0, 15.0, 31.0, 71.0]
2025-09-13 04:07:47,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 43/100 (estimated time remaining: 10 hours, 51 minutes, 35 seconds)
2025-09-13 04:19:42,003 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:19:42,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:20:55,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 31.78525 ± 35.220
2025-09-13 04:20:55,245 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-20.704433, 24.675508, 41.411263, 18.096977, 20.626434, 14.620261, 108.047806, 81.69242, 12.621684, 16.764585]
2025-09-13 04:20:55,245 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [67.0, 75.0, 39.0, 41.0, 30.0, 54.0, 1000.0, 1000.0, 45.0, 40.0]
2025-09-13 04:20:55,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 44/100 (estimated time remaining: 11 hours, 10 minutes, 47 seconds)
2025-09-13 04:31:20,712 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:31:20,722 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:32:05,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 21.27951 ± 25.465
2025-09-13 04:32:05,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [64.31563, 8.280867, 68.04615, 24.208403, -10.766532, 6.6059494, -5.94578, 31.96385, 8.896477, 17.19006]
2025-09-13 04:32:05,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [37.0, 18.0, 1000.0, 127.0, 30.0, 23.0, 101.0, 35.0, 55.0, 27.0]
2025-09-13 04:32:05,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 45/100 (estimated time remaining: 10 hours, 53 minutes, 53 seconds)
2025-09-13 04:42:46,709 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:42:46,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:43:25,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 19.50717 ± 20.602
2025-09-13 04:43:25,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [8.684758, 20.757826, 5.3327756, 37.38873, 19.574387, 60.26761, -3.1136305, 18.561747, 39.94267, -12.325228]
2025-09-13 04:43:25,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [14.0, 39.0, 33.0, 72.0, 17.0, 1000.0, 24.0, 24.0, 35.0, 22.0]
2025-09-13 04:43:25,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 46/100 (estimated time remaining: 10 hours, 46 minutes, 22 seconds)
2025-09-13 04:55:01,352 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:55:01,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:55:17,978 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 26.27712 ± 21.288
2025-09-13 04:55:17,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [48.885757, 29.842583, -7.417076, 47.87244, 19.256567, 12.11518, 66.61103, 19.605211, 4.858566, 21.14095]
2025-09-13 04:55:17,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [91.0, 45.0, 40.0, 44.0, 86.0, 19.0, 89.0, 43.0, 20.0, 69.0]
2025-09-13 04:55:17,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 47/100 (estimated time remaining: 10 hours, 39 minutes, 37 seconds)
2025-09-13 05:05:57,576 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:05:57,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:07:08,997 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 42.15426 ± 53.826
2025-09-13 05:07:08,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [7.8864, 106.60555, 42.523758, 142.94316, 0.7275129, 3.8505747, 9.979698, 113.482735, 2.267591, -8.724355]
2025-09-13 05:07:08,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [15.0, 1000.0, 68.0, 1000.0, 23.0, 16.0, 44.0, 175.0, 10.0, 26.0]
2025-09-13 05:07:08,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1226 [INFO]: New best (42.15) for latency ExtremeSparseL4U32
2025-09-13 05:07:09,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 48/100 (estimated time remaining: 10 hours, 29 minutes, 10 seconds)
2025-09-13 05:18:52,267 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:18:52,276 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:19:40,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 22.64885 ± 27.101
2025-09-13 05:19:40,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [27.67236, 2.0610573, 23.752146, 37.089558, 46.328506, 84.52269, 13.497996, -8.957528, -10.01995, 10.541673]
2025-09-13 05:19:40,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [48.0, 28.0, 17.0, 50.0, 86.0, 196.0, 33.0, 1000.0, 43.0, 83.0]
2025-09-13 05:19:40,666 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 49/100 (estimated time remaining: 10 hours, 11 minutes, 4 seconds)
2025-09-13 05:29:46,753 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:29:46,759 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:30:08,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 25.43255 ± 34.066
2025-09-13 05:30:08,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [27.70471, 20.317373, 38.821827, 26.05563, 53.98931, 0.8940177, -29.044384, 6.571974, 4.6983786, 104.31665]
2025-09-13 05:30:08,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [34.0, 25.0, 88.0, 24.0, 150.0, 36.0, 170.0, 58.0, 15.0, 120.0]
2025-09-13 05:30:08,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 50/100 (estimated time remaining: 9 hours, 52 minutes, 4 seconds)
2025-09-13 05:41:20,285 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:41:20,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:41:31,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 16.12106 ± 20.365
2025-09-13 05:41:31,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [11.125308, 22.194798, 0.10969932, 69.33437, -3.7768476, 7.364312, 14.044322, 5.2760267, 31.87027, 3.6683397]
2025-09-13 05:41:31,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [21.0, 74.0, 20.0, 93.0, 48.0, 19.0, 18.0, 15.0, 32.0, 25.0]
2025-09-13 05:41:31,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 51/100 (estimated time remaining: 9 hours, 40 minutes, 54 seconds)
2025-09-13 05:52:41,600 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:52:41,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:53:06,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 41.20355 ± 32.996
2025-09-13 05:53:06,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [40.78419, 23.930214, 28.194168, 6.738807, 35.717796, 54.814064, 124.863846, 30.315084, 63.12937, 3.548007]
2025-09-13 05:53:06,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [153.0, 47.0, 38.0, 50.0, 50.0, 124.0, 179.0, 106.0, 69.0, 14.0]
2025-09-13 05:53:06,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 52/100 (estimated time remaining: 9 hours, 26 minutes, 36 seconds)
2025-09-13 06:04:51,094 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:04:51,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:05:10,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 29.63007 ± 26.578
2025-09-13 06:05:10,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [42.90114, 33.84854, 11.139632, 18.751032, 81.73932, -19.151672, 45.525436, 4.2976413, 49.9887, 27.260897]
2025-09-13 06:05:10,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [86.0, 60.0, 38.0, 26.0, 142.0, 45.0, 97.0, 75.0, 25.0, 60.0]
2025-09-13 06:05:10,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 53/100 (estimated time remaining: 9 hours, 17 minutes, 4 seconds)
2025-09-13 06:15:16,310 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:15:16,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:16:45,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 76.22379 ± 39.889
2025-09-13 06:16:45,198 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [87.30334, 59.182564, 37.845814, 45.28038, 110.103615, 107.9538, 80.198654, 162.94034, 42.618813, 28.810638]
2025-09-13 06:16:45,198 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [180.0, 43.0, 229.0, 1000.0, 91.0, 1000.0, 142.0, 105.0, 44.0, 93.0]
2025-09-13 06:16:45,198 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1226 [INFO]: New best (76.22) for latency ExtremeSparseL4U32
2025-09-13 06:16:45,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 54/100 (estimated time remaining: 8 hours, 56 minutes, 30 seconds)
2025-09-13 06:28:39,492 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:28:39,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:29:52,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 39.20694 ± 54.725
2025-09-13 06:29:52,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [14.182121, -26.17081, 16.17785, 7.735327, 14.233718, 147.6594, 26.89965, 53.98335, 1.4775215, 135.89133]
2025-09-13 06:29:52,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [23.0, 79.0, 61.0, 15.0, 35.0, 1000.0, 61.0, 135.0, 12.0, 1000.0]
2025-09-13 06:29:52,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 55/100 (estimated time remaining: 9 hours, 9 minutes, 37 seconds)
2025-09-13 06:40:55,179 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:40:55,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:42:37,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 39.45375 ± 35.486
2025-09-13 06:42:37,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [48.041557, 58.113316, 7.1780367, 83.17954, 14.646831, 6.7733717, 9.1462, -4.391039, 99.58487, 72.26484]
2025-09-13 06:42:37,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 79.0, 27.0, 1000.0, 32.0, 19.0, 15.0, 41.0, 166.0, 1000.0]
2025-09-13 06:42:37,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 56/100 (estimated time remaining: 9 hours, 9 minutes, 54 seconds)
2025-09-13 06:52:56,646 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:52:56,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:53:46,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 51.62349 ± 33.408
2025-09-13 06:53:46,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [46.890293, 55.93392, 24.71977, 12.42971, 40.244358, 101.990105, 115.75113, 29.2504, 17.312792, 71.712456]
2025-09-13 06:53:46,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [47.0, 31.0, 48.0, 25.0, 53.0, 265.0, 1000.0, 32.0, 29.0, 89.0]
2025-09-13 06:53:46,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 57/100 (estimated time remaining: 8 hours, 53 minutes, 48 seconds)
2025-09-13 07:04:48,289 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:04:48,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:05:04,007 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 42.58288 ± 29.060
2025-09-13 07:05:04,007 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [17.970808, 16.849064, 10.449896, 73.56716, 61.70954, 46.588303, 64.2028, 98.422615, 20.475292, 15.593297]
2025-09-13 07:05:04,007 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [20.0, 31.0, 18.0, 71.0, 64.0, 44.0, 49.0, 142.0, 41.0, 48.0]
2025-09-13 07:05:04,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 58/100 (estimated time remaining: 8 hours, 35 minutes, 2 seconds)
2025-09-13 07:16:16,740 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:16:16,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:17:59,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 49.92327 ± 40.872
2025-09-13 07:17:59,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [42.98037, 51.74168, 7.3651605, 14.312897, 75.84037, 24.34254, 86.53944, 11.766727, 37.942013, 146.40157]
2025-09-13 07:17:59,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [64.0, 1000.0, 26.0, 47.0, 1000.0, 50.0, 108.0, 16.0, 43.0, 1000.0]
2025-09-13 07:17:59,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 59/100 (estimated time remaining: 8 hours, 34 minutes, 23 seconds)
2025-09-13 07:28:35,649 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:28:35,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:28:57,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 50.54972 ± 47.356
2025-09-13 07:28:57,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [70.18352, 2.34478, 166.74806, 27.539238, 80.602715, 57.45117, -7.8598914, 52.912525, 15.928307, 39.64682]
2025-09-13 07:28:57,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [42.0, 40.0, 211.0, 35.0, 114.0, 90.0, 17.0, 113.0, 35.0, 36.0]
2025-09-13 07:28:57,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 60/100 (estimated time remaining: 8 hours, 4 minutes, 26 seconds)
2025-09-13 07:40:54,534 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:40:54,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:42:04,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 53.09827 ± 44.930
2025-09-13 07:42:04,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [64.76239, 10.4798565, 80.205795, -1.3206918, -10.623362, 89.73643, 49.98674, 39.54533, 147.33954, 60.870655]
2025-09-13 07:42:04,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [64.0, 35.0, 40.0, 43.0, 22.0, 1000.0, 65.0, 41.0, 1000.0, 36.0]
2025-09-13 07:42:04,905 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 61/100 (estimated time remaining: 7 hours, 55 minutes, 40 seconds)
2025-09-13 07:52:09,994 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:52:10,014 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:53:26,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 37.05320 ± 43.387
2025-09-13 07:53:26,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [138.17375, 31.016193, 13.01736, 71.77875, 73.4706, 28.929115, -8.335967, -7.381791, 1.275772, 28.58827]
2025-09-13 07:53:26,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 67.0, 16.0, 116.0, 1000.0, 46.0, 150.0, 45.0, 31.0, 78.0]
2025-09-13 07:53:26,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 62/100 (estimated time remaining: 7 hours, 45 minutes, 24 seconds)
2025-09-13 08:04:28,388 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:04:28,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:05:13,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 47.32405 ± 40.921
2025-09-13 08:05:13,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [142.26756, 57.150238, 8.707538, 22.835516, 31.756662, 35.415474, 53.472584, -12.117927, 46.58163, 87.17128]
2025-09-13 08:05:13,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 51.0, 27.0, 56.0, 39.0, 24.0, 87.0, 19.0, 61.0, 102.0]
2025-09-13 08:05:13,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 63/100 (estimated time remaining: 7 hours, 37 minutes, 11 seconds)
2025-09-13 08:16:51,880 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:16:51,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:17:05,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 35.60076 ± 18.208
2025-09-13 08:17:05,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [21.512823, 22.590834, 38.86263, 55.833157, 62.130245, 38.425053, 7.3762727, 55.266438, 11.435849, 42.57433]
2025-09-13 08:17:05,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [41.0, 25.0, 25.0, 49.0, 61.0, 81.0, 39.0, 30.0, 21.0, 74.0]
2025-09-13 08:17:05,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 64/100 (estimated time remaining: 7 hours, 17 minutes, 22 seconds)
2025-09-13 08:28:19,278 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:28:19,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:28:35,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 16.15254 ± 21.165
2025-09-13 08:28:35,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [17.715588, 28.716513, 11.78884, 62.425365, 3.2542622, 6.804439, 11.284503, 32.110863, 11.316377, -23.891356]
2025-09-13 08:28:35,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [40.0, 60.0, 104.0, 67.0, 37.0, 51.0, 21.0, 30.0, 16.0, 105.0]
2025-09-13 08:28:35,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 65/100 (estimated time remaining: 7 hours, 9 minutes, 22 seconds)
2025-09-13 08:38:53,164 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:38:53,171 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:39:16,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 45.62598 ± 35.347
2025-09-13 08:39:16,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [19.795279, 45.81388, 115.839386, 41.815197, 102.39625, -0.25863534, 17.506092, 57.145973, 25.19831, 31.008099]
2025-09-13 08:39:16,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [41.0, 133.0, 140.0, 173.0, 67.0, 23.0, 20.0, 63.0, 52.0, 66.0]
2025-09-13 08:39:16,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 66/100 (estimated time remaining: 6 hours, 40 minutes, 20 seconds)
2025-09-13 08:50:18,066 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:50:18,072 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:50:49,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 67.19706 ± 49.755
2025-09-13 08:50:49,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [23.091509, 66.04318, 148.51712, 4.986901, 44.90005, 132.45082, 127.16232, 48.18278, 69.50695, 7.128911]
2025-09-13 08:50:49,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [156.0, 82.0, 148.0, 78.0, 113.0, 196.0, 120.0, 44.0, 77.0, 37.0]
2025-09-13 08:50:49,545 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 67/100 (estimated time remaining: 6 hours, 30 minutes, 11 seconds)
2025-09-13 09:01:52,827 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:01:52,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:03:03,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 41.30565 ± 39.778
2025-09-13 09:03:03,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [45.987194, 35.360043, 20.106695, 29.027563, 0.8518031, 4.116685, 15.443669, 29.724104, 108.505844, 123.93294]
2025-09-13 09:03:03,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [46.0, 36.0, 26.0, 90.0, 10.0, 19.0, 22.0, 31.0, 1000.0, 1000.0]
2025-09-13 09:03:03,072 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 68/100 (estimated time remaining: 6 hours, 21 minutes, 39 seconds)
2025-09-13 09:14:03,956 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:14:03,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:14:51,095 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 84.58040 ± 97.402
2025-09-13 09:14:51,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [23.291464, 32.552906, 127.87137, 45.936474, 357.7243, 61.23234, 31.069387, 72.046165, 2.0494447, 92.03008]
2025-09-13 09:14:51,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [50.0, 39.0, 109.0, 308.0, 630.0, 51.0, 73.0, 90.0, 19.0, 193.0]
2025-09-13 09:14:51,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1226 [INFO]: New best (84.58) for latency ExtremeSparseL4U32
2025-09-13 09:14:51,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 69/100 (estimated time remaining: 6 hours, 9 minutes, 39 seconds)
2025-09-13 09:26:22,593 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:26:22,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:27:18,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 57.48304 ± 58.986
2025-09-13 09:27:18,071 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [81.90114, 85.45578, 14.737031, 47.736885, 7.4692135, 160.6263, -8.161726, 17.940502, 8.997189, 158.1281]
2025-09-13 09:27:18,071 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [53.0, 1000.0, 78.0, 93.0, 24.0, 149.0, 14.0, 67.0, 160.0, 157.0]
2025-09-13 09:27:18,107 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 70/100 (estimated time remaining: 6 hours, 3 minutes, 59 seconds)
2025-09-13 09:38:20,849 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:38:20,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:39:17,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 74.16309 ± 66.560
2025-09-13 09:39:17,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [70.16164, 53.111454, 64.712204, 98.66377, 19.67504, 33.208717, 11.220245, 257.82983, 47.869236, 85.17876]
2025-09-13 09:39:17,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [91.0, 113.0, 74.0, 1000.0, 56.0, 81.0, 17.0, 230.0, 36.0, 142.0]
2025-09-13 09:39:17,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 71/100 (estimated time remaining: 6 hours, 8 seconds)
2025-09-13 09:49:50,257 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:49:50,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:50:48,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 64.65646 ± 45.798
2025-09-13 09:50:48,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [9.956206, 68.81913, 59.72899, 3.3337073, 72.03342, 16.35657, 82.987305, 53.578915, 137.11913, 142.6512]
2025-09-13 09:50:48,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [156.0, 122.0, 1000.0, 140.0, 71.0, 28.0, 87.0, 62.0, 130.0, 150.0]
2025-09-13 09:50:48,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 72/100 (estimated time remaining: 5 hours, 47 minutes, 56 seconds)
2025-09-13 10:02:25,712 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:02:25,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:04:14,753 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 59.01455 ± 61.967
2025-09-13 10:04:14,754 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [113.46188, -2.873479, 63.457146, 14.601875, -0.24182282, 19.01363, 204.4051, 23.669294, 50.465626, 104.18623]
2025-09-13 10:04:14,754 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 17.0, 72.0, 18.0, 40.0, 31.0, 1000.0, 97.0, 1000.0, 256.0]
2025-09-13 10:04:14,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 73/100 (estimated time remaining: 5 hours, 42 minutes, 41 seconds)
2025-09-13 10:15:37,058 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:15:37,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:17:00,256 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 74.89397 ± 74.937
2025-09-13 10:17:00,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-22.76699, 10.244279, 76.68135, 52.860073, 60.78398, 276.34744, 67.01853, 72.866295, 54.134655, 100.77006]
2025-09-13 10:17:00,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [102.0, 16.0, 1000.0, 50.0, 39.0, 263.0, 67.0, 74.0, 127.0, 1000.0]
2025-09-13 10:17:00,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 74/100 (estimated time remaining: 5 hours, 35 minutes, 37 seconds)
2025-09-13 10:27:54,338 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:27:54,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:28:52,329 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 65.04866 ± 39.203
2025-09-13 10:28:52,329 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [99.25399, 74.60156, 56.45777, 56.86605, 89.52763, 8.520524, 55.15345, 16.308926, 150.14207, 43.65464]
2025-09-13 10:28:52,329 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 129.0, 102.0, 94.0, 162.0, 25.0, 91.0, 54.0, 224.0, 67.0]
2025-09-13 10:28:52,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 75/100 (estimated time remaining: 5 hours, 20 minutes, 10 seconds)
2025-09-13 10:39:05,598 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:39:05,606 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:39:53,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 55.28192 ± 47.616
2025-09-13 10:39:53,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [44.37924, 33.637478, 0.3155607, 14.9390545, 125.042274, 70.13329, 157.0601, 55.749363, 36.967335, 14.595537]
2025-09-13 10:39:53,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [77.0, 141.0, 18.0, 36.0, 118.0, 92.0, 1000.0, 66.0, 39.0, 28.0]
2025-09-13 10:39:53,927 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 76/100 (estimated time remaining: 5 hours, 3 minutes)
2025-09-13 10:50:48,714 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:50:48,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:51:16,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 69.81223 ± 64.950
2025-09-13 10:51:16,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [12.660129, 18.407618, 69.94966, 246.03773, 15.529725, 62.619747, 67.12101, 106.84579, 52.29727, 46.65362]
2025-09-13 10:51:16,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [37.0, 36.0, 73.0, 241.0, 32.0, 108.0, 160.0, 104.0, 56.0, 64.0]
2025-09-13 10:51:16,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 77/100 (estimated time remaining: 4 hours, 50 minutes, 10 seconds)
2025-09-13 11:02:15,430 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:02:15,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:02:44,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 70.87914 ± 38.567
2025-09-13 11:02:44,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [80.26123, 19.476162, 56.254612, 56.091885, 26.181658, 62.3667, 142.57257, 135.72058, 54.313885, 75.55215]
2025-09-13 11:02:44,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [96.0, 25.0, 58.0, 86.0, 31.0, 82.0, 137.0, 277.0, 108.0, 50.0]
2025-09-13 11:02:44,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 78/100 (estimated time remaining: 4 hours, 29 minutes, 5 seconds)
2025-09-13 11:14:02,447 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:14:02,455 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:14:20,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 60.49667 ± 32.107
2025-09-13 11:14:20,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [46.7799, 77.74637, 121.75248, 35.856117, 42.29793, 68.74274, 2.1874447, 54.593586, 54.582596, 100.427635]
2025-09-13 11:14:20,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [23.0, 91.0, 125.0, 39.0, 37.0, 77.0, 14.0, 35.0, 75.0, 76.0]
2025-09-13 11:14:20,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 79/100 (estimated time remaining: 4 hours, 12 minutes, 15 seconds)
2025-09-13 11:25:39,122 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:25:39,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:26:26,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 52.77781 ± 43.038
2025-09-13 11:26:26,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [28.280853, 59.283684, 41.568737, 81.46112, 102.109116, 20.164387, 38.881523, -3.7710357, 14.429734, 145.36998]
2025-09-13 11:26:26,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [19.0, 56.0, 51.0, 71.0, 1000.0, 67.0, 50.0, 39.0, 45.0, 138.0]
2025-09-13 11:26:26,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 80/100 (estimated time remaining: 4 hours, 1 minute, 45 seconds)
2025-09-13 11:36:58,975 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:36:58,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:37:47,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 62.44241 ± 48.140
2025-09-13 11:37:47,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [62.333004, -12.271625, 30.114029, 158.17284, 94.34619, 58.94225, 102.54724, 92.24964, 12.658844, 25.331741]
2025-09-13 11:37:47,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [51.0, 18.0, 44.0, 124.0, 141.0, 48.0, 1000.0, 71.0, 17.0, 56.0]
2025-09-13 11:37:47,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 81/100 (estimated time remaining: 3 hours, 51 minutes, 34 seconds)
2025-09-13 11:48:58,170 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:48:58,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:49:11,004 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 37.25235 ± 36.581
2025-09-13 11:49:11,004 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [48.59831, 38.98908, 28.667019, 26.65073, 7.1712837, 35.65362, 137.10349, 36.716732, -5.615714, 18.588959]
2025-09-13 11:49:11,005 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [32.0, 34.0, 28.0, 53.0, 14.0, 29.0, 135.0, 50.0, 15.0, 26.0]
2025-09-13 11:49:11,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 82/100 (estimated time remaining: 3 hours, 40 minutes, 4 seconds)
2025-09-13 12:01:05,104 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:01:05,113 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:01:30,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 58.82218 ± 75.769
2025-09-13 12:01:30,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [51.785145, 51.886032, 270.53308, 2.2966764, -0.80666167, 9.479562, 81.258156, 6.538211, 63.30784, 51.943787]
2025-09-13 12:01:30,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [79.0, 75.0, 264.0, 26.0, 35.0, 16.0, 107.0, 64.0, 149.0, 39.0]
2025-09-13 12:01:30,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 83/100 (estimated time remaining: 3 hours, 31 minutes, 34 seconds)
2025-09-13 12:12:01,061 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:12:01,071 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:12:46,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 42.47062 ± 33.389
2025-09-13 12:12:46,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [25.217852, 78.40425, 24.931765, 42.24171, 53.650715, 119.824135, -3.84367, 21.125132, 43.928394, 19.225931]
2025-09-13 12:12:46,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [18.0, 110.0, 32.0, 70.0, 39.0, 1000.0, 38.0, 47.0, 49.0, 68.0]
2025-09-13 12:12:46,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 84/100 (estimated time remaining: 3 hours, 18 minutes, 41 seconds)
2025-09-13 12:24:10,747 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:24:10,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:25:25,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 50.48364 ± 38.368
2025-09-13 12:25:25,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [8.185679, 14.848521, 34.116417, 27.25411, 130.54173, 83.12373, 20.820639, 98.60476, 36.898643, 50.442135]
2025-09-13 12:25:25,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [23.0, 51.0, 1000.0, 25.0, 163.0, 63.0, 25.0, 75.0, 1000.0, 46.0]
2025-09-13 12:25:25,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 8 minutes, 46 seconds)
2025-09-13 12:36:22,121 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:36:22,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:37:46,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 59.50747 ± 52.156
2025-09-13 12:37:46,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [64.9777, 18.239534, 43.778297, 159.12859, 35.87414, 14.107811, 155.81143, 43.72967, 55.975567, 3.4519176]
2025-09-13 12:37:46,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 158.0, 53.0, 1000.0, 63.0, 25.0, 120.0, 109.0, 167.0, 23.0]
2025-09-13 12:37:46,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 86/100 (estimated time remaining: 2 hours, 59 minutes, 55 seconds)
2025-09-13 12:48:03,232 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:48:03,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:48:52,136 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 50.66995 ± 49.460
2025-09-13 12:48:52,136 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [165.51292, 93.345634, 94.426285, 28.147507, 50.169415, 9.358704, 22.153357, 25.854908, -5.2775908, 23.008394]
2025-09-13 12:48:52,136 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [205.0, 65.0, 1000.0, 28.0, 89.0, 32.0, 46.0, 84.0, 10.0, 39.0]
2025-09-13 12:48:52,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 87/100 (estimated time remaining: 2 hours, 47 minutes, 7 seconds)
2025-09-13 12:59:56,322 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:59:56,329 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:00:31,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 83.07515 ± 58.432
2025-09-13 13:00:31,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [206.84555, 124.090034, 21.64113, 61.517467, 101.5946, 62.24646, 20.723103, 58.887142, 24.84359, 148.36243]
2025-09-13 13:00:31,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [341.0, 216.0, 34.0, 47.0, 144.0, 49.0, 30.0, 135.0, 37.0, 109.0]
2025-09-13 13:00:31,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 88/100 (estimated time remaining: 2 hours, 33 minutes, 25 seconds)
2025-09-13 13:11:32,142 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:11:32,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:12:47,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 82.76912 ± 71.286
2025-09-13 13:12:47,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [10.75229, 52.433605, 235.6842, 19.045227, 68.56122, 59.346085, 21.884941, 44.54945, 165.25584, 150.1783]
2025-09-13 13:12:47,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [15.0, 117.0, 1000.0, 22.0, 70.0, 40.0, 15.0, 35.0, 197.0, 1000.0]
2025-09-13 13:12:47,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 24 minutes, 2 seconds)
2025-09-13 13:23:51,668 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:23:51,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:24:09,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 43.66350 ± 18.525
2025-09-13 13:24:09,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [61.92265, 31.014238, 62.033623, 40.130215, 11.334151, 68.51305, 22.79718, 60.8339, 30.30012, 47.755882]
2025-09-13 13:24:09,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [186.0, 21.0, 49.0, 40.0, 27.0, 50.0, 27.0, 73.0, 34.0, 84.0]
2025-09-13 13:24:09,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 9 minutes, 12 seconds)
2025-09-13 13:35:06,970 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:35:06,978 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:35:31,598 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 69.76777 ± 84.158
2025-09-13 13:35:31,598 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [95.33618, 60.396255, 3.9932423, 0.12519431, 80.11274, 13.942421, 152.17189, 31.566204, 275.23105, -15.197597]
2025-09-13 13:35:31,598 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [164.0, 83.0, 36.0, 9.0, 75.0, 27.0, 144.0, 52.0, 191.0, 44.0]
2025-09-13 13:35:31,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 91/100 (estimated time remaining: 1 hour, 55 minutes, 30 seconds)
2025-09-13 13:47:35,372 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:47:35,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:48:12,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 89.16062 ± 78.783
2025-09-13 13:48:12,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [255.10077, 42.625996, 57.701073, 74.8629, 18.147545, 18.425632, 185.37367, 103.300644, -5.37118, 141.43916]
2025-09-13 13:48:12,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [268.0, 61.0, 133.0, 117.0, 17.0, 47.0, 194.0, 117.0, 14.0, 251.0]
2025-09-13 13:48:12,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1226 [INFO]: New best (89.16) for latency ExtremeSparseL4U32
2025-09-13 13:48:12,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 92/100 (estimated time remaining: 1 hour, 46 minutes, 49 seconds)
2025-09-13 13:58:17,381 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:58:17,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:00:10,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 83.96467 ± 57.415
2025-09-13 14:00:10,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [10.391321, -1.6519715, 109.11983, 65.58634, 75.59193, 148.0392, 136.8566, 56.014526, 186.63748, 53.061443]
2025-09-13 14:00:10,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [70.0, 14.0, 207.0, 65.0, 134.0, 116.0, 1000.0, 1000.0, 1000.0, 62.0]
2025-09-13 14:00:10,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 35 minutes, 26 seconds)
2025-09-13 14:11:15,065 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:11:15,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:12:40,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 138.69473 ± 94.207
2025-09-13 14:12:40,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [278.08905, 291.16245, 28.072903, 186.39867, 66.87407, 102.08756, 47.11719, 109.16869, 227.8807, 50.096027]
2025-09-13 14:12:40,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 45.0, 103.0, 109.0, 129.0, 56.0, 74.0, 205.0, 36.0]
2025-09-13 14:12:40,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1226 [INFO]: New best (138.69) for latency ExtremeSparseL4U32
2025-09-13 14:12:40,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 23 minutes, 50 seconds)
2025-09-13 14:23:39,654 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:23:39,661 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:24:02,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 62.56618 ± 54.091
2025-09-13 14:24:02,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [62.936634, 23.508766, 80.083496, 13.483434, 10.429564, 17.152346, 77.877975, 119.8793, 188.21817, 32.09214]
2025-09-13 14:24:02,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [62.0, 28.0, 176.0, 20.0, 38.0, 26.0, 78.0, 117.0, 138.0, 53.0]
2025-09-13 14:24:02,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 11 minutes, 51 seconds)
2025-09-13 14:36:04,551 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:36:04,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:37:31,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 50.48820 ± 56.423
2025-09-13 14:37:31,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [42.75102, 129.28227, 13.075556, -49.967815, 17.552664, -8.060707, 130.84334, 102.67605, 58.332577, 68.397095]
2025-09-13 14:37:31,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [61.0, 314.0, 50.0, 114.0, 47.0, 67.0, 1000.0, 1000.0, 111.0, 56.0]
2025-09-13 14:37:31,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 1 minute, 59 seconds)
2025-09-13 14:48:20,535 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:48:20,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:49:25,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 99.37060 ± 98.359
2025-09-13 14:49:25,131 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [5.4391, 4.550866, 140.20763, 61.266624, 162.88535, 348.51547, 61.863262, 27.239105, 52.470623, 129.26794]
2025-09-13 14:49:25,131 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [62.0, 34.0, 1000.0, 87.0, 258.0, 372.0, 66.0, 61.0, 79.0, 140.0]
2025-09-13 14:49:25,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 97/100 (estimated time remaining: 48 minutes, 57 seconds)
2025-09-13 14:59:45,548 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:59:45,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:01:00,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 77.67265 ± 87.070
2025-09-13 15:01:00,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [47.278625, 96.24782, -2.624893, 12.460143, 129.74255, 62.45583, 80.67488, 17.04211, 22.930288, 310.51904]
2025-09-13 15:01:00,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [81.0, 117.0, 33.0, 31.0, 1000.0, 42.0, 78.0, 24.0, 49.0, 1000.0]
2025-09-13 15:01:00,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 98/100 (estimated time remaining: 36 minutes, 30 seconds)
2025-09-13 15:12:00,770 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:12:00,779 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:12:55,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 74.51064 ± 66.721
2025-09-13 15:12:55,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [21.276314, 149.52597, 5.643564, -13.282535, 77.46971, 36.699978, 137.16768, 15.86101, 134.24538, 180.49934]
2025-09-13 15:12:55,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [40.0, 277.0, 64.0, 40.0, 67.0, 22.0, 1000.0, 15.0, 164.0, 134.0]
2025-09-13 15:12:55,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 99/100 (estimated time remaining: 24 minutes, 5 seconds)
2025-09-13 15:24:28,341 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:24:28,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:25:31,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 112.26482 ± 85.556
2025-09-13 15:25:31,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [280.36792, 179.25732, 56.461273, 154.6612, 12.388872, 158.97435, -8.853269, 121.621025, 143.11188, 24.657661]
2025-09-13 15:25:31,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [256.0, 143.0, 40.0, 181.0, 59.0, 146.0, 44.0, 1000.0, 159.0, 34.0]
2025-09-13 15:25:31,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 100/100 (estimated time remaining: 12 minutes, 17 seconds)
2025-09-13 15:36:34,058 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:36:34,068 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:37:54,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 94.11903 ± 110.830
2025-09-13 15:37:54,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [9.485967, 57.08055, 118.50533, 105.71241, 407.07718, 35.45026, 44.607903, 43.177353, 10.979199, 109.11424]
2025-09-13 15:37:54,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [26.0, 58.0, 1000.0, 84.0, 314.0, 1000.0, 39.0, 29.0, 16.0, 76.0]
2025-09-13 15:37:54,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1251 [DEBUG]: Training session finished
