2025-09-12 19:53:39,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc25-ant/ExtremeSparseL4U32-mbpac-highdim-memdelay
2025-09-12 19:53:39,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc25-ant/ExtremeSparseL4U32-mbpac-highdim-memdelay
2025-09-12 19:53:39,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x14c2ec09c410>}
2025-09-12 19:53:39,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1111 [DEBUG]: using device: cuda
2025-09-12 19:53:39,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1133 [INFO]: Creating new trainer
2025-09-12 19:53:39,456 baseline-mbpac-noiseperc25-ant:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-09-12 19:53:39,456 baseline-mbpac-noiseperc25-ant:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-12 19:53:39,466 baseline-mbpac-noiseperc25-ant:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=27, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=8, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 512, batch_first=True)
)
2025-09-12 19:53:41,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1194 [DEBUG]: Starting training session...
2025-09-12 19:53:41,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 1/100
2025-09-12 20:06:01,014 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:06:01,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 20:06:27,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -78.36535 ± 66.565
2025-09-12 20:06:27,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-70.270775, -35.865845, -70.613495, -146.43965, -16.73406, -46.037636, -188.95146, -184.96951, -27.166067, 3.395037]
2025-09-12 20:06:27,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [81.0, 31.0, 67.0, 130.0, 56.0, 29.0, 172.0, 173.0, 100.0, 43.0]
2025-09-12 20:06:27,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1226 [INFO]: New best (-78.37) for latency ExtremeSparseL4U32
2025-09-12 20:06:27,579 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 2/100 (estimated time remaining: 21 hours, 4 minutes, 34 seconds)
2025-09-12 20:17:15,568 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:17:15,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 20:18:33,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -162.89693 ± 250.269
2025-09-12 20:18:33,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-41.994915, -112.33733, -8.306592, -1.2168223, -8.312316, -667.7643, -5.33065, -89.84028, -44.905926, -648.96014]
2025-09-12 20:18:33,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [29.0, 83.0, 103.0, 29.0, 16.0, 1000.0, 30.0, 123.0, 104.0, 1000.0]
2025-09-12 20:18:33,639 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 3/100 (estimated time remaining: 20 hours, 18 minutes, 51 seconds)
2025-09-12 20:29:15,394 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:29:15,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 20:29:32,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -38.55223 ± 35.465
2025-09-12 20:29:32,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-35.879784, -52.590996, -94.753845, -20.668589, -20.074558, -14.101029, -28.644226, -11.203392, 3.836888, -111.44271]
2025-09-12 20:29:32,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [43.0, 91.0, 108.0, 33.0, 28.0, 67.0, 23.0, 43.0, 25.0, 101.0]
2025-09-12 20:29:32,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1226 [INFO]: New best (-38.55) for latency ExtremeSparseL4U32
2025-09-12 20:29:32,916 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 4/100 (estimated time remaining: 19 hours, 19 minutes, 33 seconds)
2025-09-12 20:40:40,672 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:40:40,673 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 20:41:24,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -99.17451 ± 196.187
2025-09-12 20:41:24,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [4.6395116, -14.8126745, -60.988033, -32.02522, -1.7777854, -72.94646, -39.820534, -682.4977, -19.65069, -71.86558]
2025-09-12 20:41:24,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [38.0, 51.0, 75.0, 32.0, 36.0, 98.0, 36.0, 1000.0, 36.0, 71.0]
2025-09-12 20:41:24,956 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 5/100 (estimated time remaining: 19 hours, 5 minutes, 30 seconds)
2025-09-12 20:52:23,986 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:52:23,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 20:52:36,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -19.55091 ± 22.389
2025-09-12 20:52:36,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-34.4225, -78.12326, -18.117764, -6.291915, -4.8032813, 3.0442924, -24.406775, -13.2758255, -19.358265, 0.24614781]
2025-09-12 20:52:36,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [71.0, 126.0, 18.0, 23.0, 12.0, 33.0, 50.0, 27.0, 28.0, 14.0]
2025-09-12 20:52:36,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1226 [INFO]: New best (-19.55) for latency ExtremeSparseL4U32
2025-09-12 20:52:36,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 6/100 (estimated time remaining: 18 hours, 39 minutes, 22 seconds)
2025-09-12 21:04:04,186 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:04:04,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 21:04:20,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -44.39095 ± 71.673
2025-09-12 21:04:20,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-239.06967, -5.3851175, -12.14232, -3.0670404, -3.397944, -92.763985, 6.0470347, -17.64769, -7.8590837, -68.62374]
2025-09-12 21:04:20,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [142.0, 16.0, 14.0, 13.0, 28.0, 107.0, 19.0, 25.0, 18.0, 144.0]
2025-09-12 21:04:20,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 7/100 (estimated time remaining: 18 hours, 8 minutes, 1 second)
2025-09-12 21:15:24,072 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:15:24,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 21:16:32,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -159.80936 ± 278.682
2025-09-12 21:16:32,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-16.517668, -11.351251, -719.62555, -7.9851317, -712.13165, -13.634804, -2.5449483, -4.65218, -41.720528, -67.929756]
2025-09-12 21:16:32,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [14.0, 23.0, 1000.0, 38.0, 1000.0, 41.0, 34.0, 12.0, 58.0, 56.0]
2025-09-12 21:16:32,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 8/100 (estimated time remaining: 17 hours, 58 minutes, 35 seconds)
2025-09-12 21:27:45,304 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:27:45,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 21:28:52,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -149.98335 ± 278.490
2025-09-12 21:28:52,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-702.7942, -10.901481, -4.3250294, -9.790387, -710.2168, -12.884578, -6.2342634, 4.533509, -41.126278, -6.093992]
2025-09-12 21:28:52,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 34.0, 15.0, 14.0, 1000.0, 54.0, 15.0, 43.0, 49.0, 15.0]
2025-09-12 21:28:52,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 9/100 (estimated time remaining: 18 hours, 11 minutes, 33 seconds)
2025-09-12 21:38:52,540 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:38:52,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 21:40:10,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -198.17149 ± 295.708
2025-09-12 21:40:10,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [0.27586606, -830.2939, -24.335798, -29.766169, -53.991024, -167.77296, -25.08089, -82.287476, -35.981804, -732.48083]
2025-09-12 21:40:10,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [23.0, 1000.0, 57.0, 44.0, 60.0, 158.0, 35.0, 183.0, 36.0, 1000.0]
2025-09-12 21:40:10,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 10/100 (estimated time remaining: 17 hours, 49 minutes, 18 seconds)
2025-09-12 21:51:04,300 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:51:04,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 21:51:49,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -102.81038 ± 217.232
2025-09-12 21:51:49,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-25.409225, -36.62711, -15.000879, 5.6262445, -4.3971047, -80.06957, -99.39301, -21.636232, -747.36804, -3.82887]
2025-09-12 21:51:49,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [38.0, 93.0, 55.0, 29.0, 16.0, 57.0, 155.0, 52.0, 1000.0, 24.0]
2025-09-12 21:51:49,517 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 11/100 (estimated time remaining: 17 hours, 46 minutes, 2 seconds)
2025-09-12 22:03:06,299 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:03:06,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:03:14,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -16.91803 ± 23.139
2025-09-12 22:03:14,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-0.46014953, -81.36908, -20.822094, -6.3268294, -27.025585, -11.1650505, 4.6312423, -11.200602, -8.198507, -7.243615]
2025-09-12 22:03:14,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [15.0, 91.0, 12.0, 28.0, 45.0, 14.0, 20.0, 15.0, 16.0, 12.0]
2025-09-12 22:03:14,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1226 [INFO]: New best (-16.92) for latency ExtremeSparseL4U32
2025-09-12 22:03:14,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 12/100 (estimated time remaining: 17 hours, 28 minutes, 32 seconds)
2025-09-12 22:13:50,559 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:13:50,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:14:28,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -67.86816 ± 163.590
2025-09-12 22:14:28,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-7.8162713, -40.129696, -9.064985, -17.761204, -12.603125, -557.6569, -11.583548, -12.580963, 3.3086154, -12.793494]
2025-09-12 22:14:28,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [33.0, 64.0, 12.0, 21.0, 53.0, 1000.0, 12.0, 11.0, 24.0, 14.0]
2025-09-12 22:14:28,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 13/100 (estimated time remaining: 16 hours, 59 minutes, 30 seconds)
2025-09-12 22:25:26,112 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:25:26,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:25:32,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -10.01762 ± 8.779
2025-09-12 22:25:32,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-6.5527716, -3.4569151, -5.7011294, 3.4726446, -30.00737, -9.258106, -13.739432, -10.266144, -5.6133423, -19.053644]
2025-09-12 22:25:32,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [20.0, 17.0, 14.0, 14.0, 44.0, 24.0, 12.0, 13.0, 14.0, 33.0]
2025-09-12 22:25:32,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1226 [INFO]: New best (-10.02) for latency ExtremeSparseL4U32
2025-09-12 22:25:32,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 14/100 (estimated time remaining: 16 hours, 25 minutes, 56 seconds)
2025-09-12 22:36:20,566 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:36:20,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:36:27,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -11.44477 ± 11.454
2025-09-12 22:36:27,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [1.6512321, -0.5081243, -10.421458, -4.1069736, -36.49493, -8.773901, -18.248888, -10.266615, -25.343786, -1.9342006]
2025-09-12 22:36:27,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [51.0, 12.0, 11.0, 11.0, 35.0, 14.0, 46.0, 24.0, 30.0, 16.0]
2025-09-12 22:36:27,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 15/100 (estimated time remaining: 16 hours, 8 minutes, 18 seconds)
2025-09-12 22:47:21,592 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:47:21,593 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:47:55,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -59.41507 ± 157.411
2025-09-12 22:47:55,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-1.6177909, -8.122673, -4.5646014, -531.5365, -7.446996, -8.464225, -2.1996095, -5.9707217, -11.039546, -13.18806]
2025-09-12 22:47:55,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [12.0, 11.0, 11.0, 1000.0, 11.0, 12.0, 18.0, 51.0, 11.0, 13.0]
2025-09-12 22:47:55,960 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 16/100 (estimated time remaining: 15 hours, 53 minutes, 49 seconds)
2025-09-12 22:58:49,073 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:58:49,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:58:55,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -0.19391 ± 9.071
2025-09-12 22:58:55,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-2.9042277, -4.635284, -2.0972884, -4.5000815, -2.6237762, -3.2366505, -8.189452, -1.4324967, 1.6128063, 26.067327]
2025-09-12 22:58:55,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [13.0, 12.0, 14.0, 11.0, 41.0, 12.0, 12.0, 14.0, 46.0, 43.0]
2025-09-12 22:58:55,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1226 [INFO]: New best (-0.19) for latency ExtremeSparseL4U32
2025-09-12 22:58:55,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 17/100 (estimated time remaining: 15 hours, 35 minutes, 33 seconds)
2025-09-12 23:09:49,425 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:09:49,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:09:56,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -8.32036 ± 10.705
2025-09-12 23:09:56,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-23.133219, -12.502681, 0.5893398, -8.860759, -27.681484, 7.334171, -1.4006552, -15.622601, -1.1145947, -0.81112945]
2025-09-12 23:09:56,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [63.0, 13.0, 16.0, 11.0, 38.0, 17.0, 44.0, 19.0, 12.0, 12.0]
2025-09-12 23:09:56,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 18/100 (estimated time remaining: 15 hours, 20 minutes, 48 seconds)
2025-09-12 23:20:46,117 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:20:46,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:21:22,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -57.48064 ± 133.581
2025-09-12 23:21:22,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-4.14052, -47.000393, -3.5407476, -0.19192211, -10.85127, -26.937157, -453.58087, 11.2247095, 9.622888, -49.411148]
2025-09-12 23:21:22,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [12.0, 35.0, 12.0, 13.0, 12.0, 29.0, 1000.0, 16.0, 55.0, 40.0]
2025-09-12 23:21:22,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 19/100 (estimated time remaining: 15 hours, 15 minutes, 53 seconds)
2025-09-12 23:33:09,721 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:33:09,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:33:14,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -1.72351 ± 3.627
2025-09-12 23:33:14,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [1.284623, 2.3512857, 0.27890602, -7.099093, -2.30487, -3.810098, -4.20262, 4.992308, -5.6120906, -3.113415]
2025-09-12 23:33:14,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [12.0, 11.0, 11.0, 14.0, 14.0, 13.0, 12.0, 38.0, 11.0, 12.0]
2025-09-12 23:33:14,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 20/100 (estimated time remaining: 15 hours, 19 minutes, 40 seconds)
2025-09-12 23:44:01,943 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:44:01,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:44:09,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -2.56080 ± 18.534
2025-09-12 23:44:09,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-5.9296703, -11.839088, -14.452534, -8.207675, -10.513909, -2.5721686, 52.265633, -6.8916607, -8.57364, -8.893285]
2025-09-12 23:44:09,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [22.0, 39.0, 45.0, 11.0, 13.0, 12.0, 61.0, 14.0, 12.0, 12.0]
2025-09-12 23:44:09,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 21/100 (estimated time remaining: 14 hours, 59 minutes, 33 seconds)
2025-09-12 23:54:10,786 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:54:10,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:54:19,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -9.99737 ± 13.408
2025-09-12 23:54:19,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-45.258102, -2.174966, -1.1033837, -4.461134, -16.656729, -16.375662, 4.3945384, -5.6279616, -1.5769979, -11.133305]
2025-09-12 23:54:19,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [101.0, 11.0, 14.0, 22.0, 48.0, 30.0, 16.0, 13.0, 12.0, 14.0]
2025-09-12 23:54:19,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 22/100 (estimated time remaining: 14 hours, 35 minutes, 12 seconds)
2025-09-13 00:05:44,968 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:05:44,978 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:06:20,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -39.19769 ± 93.414
2025-09-13 00:06:20,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-12.553395, -12.587762, 2.828436, -317.7364, 3.4601562, 2.7374272, -6.723515, -2.1487648, -30.274569, -18.978502]
2025-09-13 00:06:20,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [12.0, 12.0, 13.0, 1000.0, 11.0, 11.0, 11.0, 18.0, 87.0, 28.0]
2025-09-13 00:06:20,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 23/100 (estimated time remaining: 14 hours, 39 minutes, 45 seconds)
2025-09-13 00:16:31,885 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:16:31,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:16:40,877 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -10.14593 ± 13.039
2025-09-13 00:16:40,877 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-10.801661, -3.1209118, 6.3967786, -43.668037, -3.4781144, -12.536224, -1.8736588, -4.277159, -19.944702, -8.155586]
2025-09-13 00:16:40,877 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [15.0, 11.0, 31.0, 57.0, 55.0, 19.0, 11.0, 64.0, 25.0, 12.0]
2025-09-13 00:16:40,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 24/100 (estimated time remaining: 14 hours, 11 minutes, 35 seconds)
2025-09-13 00:27:30,304 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:27:30,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:27:35,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -10.07699 ± 10.403
2025-09-13 00:27:35,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [9.154094, -0.53225857, -7.915613, -3.9639845, -12.126228, -15.35111, -13.306068, -16.379482, -8.067917, -32.28136]
2025-09-13 00:27:35,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [13.0, 11.0, 12.0, 16.0, 11.0, 23.0, 12.0, 21.0, 11.0, 49.0]
2025-09-13 00:27:35,747 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 25/100 (estimated time remaining: 13 hours, 46 minutes, 16 seconds)
2025-09-13 00:38:25,456 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:38:25,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:38:33,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -5.80902 ± 11.316
2025-09-13 00:38:33,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [2.879984, 11.177608, -12.209243, -6.3624063, 11.97057, -26.86394, -8.76079, -11.850451, -4.0458703, -14.025661]
2025-09-13 00:38:33,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [16.0, 79.0, 12.0, 11.0, 26.0, 57.0, 11.0, 11.0, 11.0, 28.0]
2025-09-13 00:38:33,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 26/100 (estimated time remaining: 13 hours, 36 minutes, 2 seconds)
2025-09-13 00:49:51,131 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:49:51,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:49:55,610 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -4.65935 ± 6.641
2025-09-13 00:49:55,611 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-15.763145, -4.2697654, 1.9837499, -0.7009751, -11.046024, 7.335622, 0.05810209, -7.626674, -11.09579, -5.4686046]
2025-09-13 00:49:55,611 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [11.0, 26.0, 15.0, 12.0, 11.0, 27.0, 13.0, 11.0, 12.0, 11.0]
2025-09-13 00:49:55,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 27/100 (estimated time remaining: 13 hours, 42 minutes, 58 seconds)
2025-09-13 01:01:03,635 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:01:03,638 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:01:07,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -8.74456 ± 7.275
2025-09-13 01:01:07,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-11.113576, -14.192861, -23.422983, -13.6876, -8.822274, -3.3265257, -4.452345, -6.3216066, 5.353191, -7.458971]
2025-09-13 01:01:07,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [15.0, 12.0, 16.0, 11.0, 12.0, 12.0, 14.0, 11.0, 12.0, 13.0]
2025-09-13 01:01:07,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 28/100 (estimated time remaining: 13 hours, 19 minutes, 49 seconds)
2025-09-13 01:11:10,132 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:11:10,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:11:17,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -11.99310 ± 14.093
2025-09-13 01:11:17,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-10.822751, -15.313305, -3.6255605, -4.4383583, 6.567726, -34.23631, -41.633633, -5.959015, -5.9187293, -4.551036]
2025-09-13 01:11:17,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [11.0, 28.0, 13.0, 11.0, 33.0, 68.0, 39.0, 12.0, 12.0, 11.0]
2025-09-13 01:11:17,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 29/100 (estimated time remaining: 13 hours, 6 minutes, 22 seconds)
2025-09-13 01:22:05,252 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:22:05,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:22:11,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -5.56941 ± 10.280
2025-09-13 01:22:11,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [4.799463, -11.391551, -10.457106, 0.31752393, -29.505215, -1.26142, -2.198473, -13.479702, 0.73886275, 6.7434864]
2025-09-13 01:22:11,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [12.0, 12.0, 27.0, 13.0, 24.0, 12.0, 74.0, 12.0, 11.0, 12.0]
2025-09-13 01:22:11,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 30/100 (estimated time remaining: 12 hours, 55 minutes, 17 seconds)
2025-09-13 01:33:00,211 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:33:00,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:33:04,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -3.21455 ± 7.351
2025-09-13 01:33:04,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [1.5972815, 1.2967236, -17.002346, 3.5108771, -3.7456715, -11.569077, -9.930438, -1.8806249, 8.812042, -3.2343016]
2025-09-13 01:33:04,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [13.0, 12.0, 11.0, 12.0, 12.0, 11.0, 12.0, 13.0, 35.0, 19.0]
2025-09-13 01:33:04,704 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 31/100 (estimated time remaining: 12 hours, 43 minutes, 17 seconds)
2025-09-13 01:43:56,246 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:43:56,247 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:44:04,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -8.26209 ± 20.003
2025-09-13 01:44:04,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [22.645304, -14.396372, -1.4567643, 0.83306646, -2.716671, -4.6150074, -0.9581666, -11.733548, -9.341045, -60.881676]
2025-09-13 01:44:04,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [27.0, 32.0, 13.0, 12.0, 12.0, 13.0, 11.0, 13.0, 45.0, 102.0]
2025-09-13 01:44:04,598 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 32/100 (estimated time remaining: 12 hours, 27 minutes, 15 seconds)
2025-09-13 01:54:50,777 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:54:50,783 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:54:56,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -2.93434 ± 11.915
2025-09-13 01:54:56,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [13.180944, -7.8411894, -31.521036, 2.787465, 0.8493856, -2.9251, 10.538295, -6.530162, -10.075141, 2.1931658]
2025-09-13 01:54:56,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [16.0, 13.0, 52.0, 17.0, 11.0, 12.0, 13.0, 39.0, 13.0, 11.0]
2025-09-13 01:54:56,692 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 33/100 (estimated time remaining: 12 hours, 11 minutes, 57 seconds)
2025-09-13 02:05:45,642 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:05:45,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:05:52,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -3.02035 ± 12.111
2025-09-13 02:05:52,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [15.986747, -6.351794, -25.826132, 12.386269, -5.4847465, -8.769449, -16.673878, 6.9508967, 0.44655684, -2.8679278]
2025-09-13 02:05:52,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [20.0, 27.0, 59.0, 20.0, 12.0, 11.0, 30.0, 31.0, 11.0, 12.0]
2025-09-13 02:05:52,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 34/100 (estimated time remaining: 12 hours, 11 minutes, 26 seconds)
2025-09-13 02:16:42,381 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:16:42,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:17:18,719 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -27.38379 ± 71.865
2025-09-13 02:17:18,719 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-241.92897, -4.1771717, -4.199871, 2.395301, -3.7785006, -1.8308383, 10.834432, -17.203176, -11.774882, -2.174183]
2025-09-13 02:17:18,719 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 12.0, 12.0, 14.0, 12.0, 12.0, 44.0, 81.0, 11.0, 12.0]
2025-09-13 02:17:18,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 35/100 (estimated time remaining: 12 hours, 7 minutes, 33 seconds)
2025-09-13 02:28:36,589 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:28:36,592 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:28:43,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 0.87290 ± 13.579
2025-09-13 02:28:43,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-9.014858, -2.6943555, -8.756033, 1.6694009, 1.500583, -13.720737, -5.05218, -1.2009892, 8.714998, 37.28313]
2025-09-13 02:28:43,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [43.0, 14.0, 13.0, 12.0, 14.0, 13.0, 11.0, 11.0, 23.0, 63.0]
2025-09-13 02:28:43,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1226 [INFO]: New best (0.87) for latency ExtremeSparseL4U32
2025-09-13 02:28:43,098 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 36/100 (estimated time remaining: 12 hours, 3 minutes, 19 seconds)
2025-09-13 02:39:08,436 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:39:08,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:39:13,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -5.61592 ± 8.570
2025-09-13 02:39:13,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-18.582968, -8.332734, 4.9003863, 8.633941, -6.119688, -16.452904, -3.180216, -9.581988, 3.3055303, -10.748565]
2025-09-13 02:39:13,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [38.0, 12.0, 13.0, 17.0, 19.0, 24.0, 12.0, 12.0, 12.0, 14.0]
2025-09-13 02:39:13,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 37/100 (estimated time remaining: 11 hours, 45 minutes, 55 seconds)
2025-09-13 02:50:06,848 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:50:06,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:50:10,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -5.57777 ± 7.086
2025-09-13 02:50:10,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [1.9240445, -18.888948, 0.52352864, -14.780835, 1.7912376, 1.2393192, -4.8978157, -9.188788, -10.606804, -2.8926826]
2025-09-13 02:50:10,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [13.0, 13.0, 12.0, 15.0, 13.0, 12.0, 13.0, 12.0, 11.0, 22.0]
2025-09-13 02:50:10,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 38/100 (estimated time remaining: 11 hours, 35 minutes, 59 seconds)
2025-09-13 03:01:33,158 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:01:33,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:01:37,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -5.33992 ± 14.724
2025-09-13 03:01:37,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-46.099014, 1.8273622, -1.4687699, 7.682419, 8.190253, 2.8413846, -9.18271, -5.7540436, -7.3574266, -4.078643]
2025-09-13 03:01:37,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [38.0, 13.0, 12.0, 15.0, 14.0, 12.0, 12.0, 11.0, 15.0, 13.0]
2025-09-13 03:01:37,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 39/100 (estimated time remaining: 11 hours, 31 minutes, 20 seconds)
2025-09-13 03:12:00,825 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:12:00,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:12:04,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -4.49373 ± 5.629
2025-09-13 03:12:04,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-8.745058, -6.2380366, -11.97929, -8.753125, -0.49526277, -3.617261, -2.7296915, 5.838638, 2.7695422, -10.987804]
2025-09-13 03:12:04,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [12.0, 14.0, 13.0, 13.0, 14.0, 24.0, 14.0, 12.0, 11.0, 12.0]
2025-09-13 03:12:04,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 40/100 (estimated time remaining: 11 hours, 8 minutes, 12 seconds)
2025-09-13 03:23:58,961 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:23:58,964 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:24:06,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -3.05204 ± 19.223
2025-09-13 03:24:06,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [15.230961, -12.557098, 1.1072283, 18.347795, -2.654296, -10.603994, 9.708365, -51.901745, -7.2392178, 10.041578]
2025-09-13 03:24:06,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [34.0, 11.0, 23.0, 34.0, 11.0, 22.0, 18.0, 48.0, 15.0, 21.0]
2025-09-13 03:24:06,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 41/100 (estimated time remaining: 11 hours, 4 minutes, 36 seconds)
2025-09-13 03:33:58,145 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:33:58,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:34:03,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -1.05996 ± 7.662
2025-09-13 03:34:03,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-16.444908, -3.2796507, -12.274535, -4.6661935, 5.907355, 3.8228734, 0.8457064, 3.4039896, 3.8210616, 8.264657]
2025-09-13 03:34:03,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [15.0, 12.0, 54.0, 13.0, 14.0, 12.0, 13.0, 12.0, 21.0, 20.0]
2025-09-13 03:34:03,722 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 42/100 (estimated time remaining: 10 hours, 47 minutes, 3 seconds)
2025-09-13 03:44:53,425 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:44:53,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:44:58,680 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 6.32709 ± 7.875
2025-09-13 03:44:58,680 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [2.075531, 5.2612433, -7.142167, -1.2957174, 6.35314, 14.159122, 19.040033, 3.5910995, 17.476694, 3.7518933]
2025-09-13 03:44:58,680 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [12.0, 24.0, 15.0, 31.0, 11.0, 18.0, 24.0, 13.0, 16.0, 12.0]
2025-09-13 03:44:58,680 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1226 [INFO]: New best (6.33) for latency ExtremeSparseL4U32
2025-09-13 03:44:58,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 43/100 (estimated time remaining: 10 hours, 35 minutes, 38 seconds)
2025-09-13 03:55:49,137 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:55:49,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:55:58,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 0.10062 ± 13.740
2025-09-13 03:55:58,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-16.466442, 19.016846, -2.4148543, -27.25023, -9.379231, 8.050692, -1.1433269, 13.98594, 5.2051477, 11.401673]
2025-09-13 03:55:58,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [49.0, 28.0, 19.0, 79.0, 36.0, 11.0, 13.0, 38.0, 12.0, 13.0]
2025-09-13 03:55:58,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 44/100 (estimated time remaining: 10 hours, 19 minutes, 27 seconds)
2025-09-13 04:07:30,766 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:07:30,769 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:07:36,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 3.16719 ± 9.594
2025-09-13 04:07:36,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-9.25272, -2.4292216, 1.6601131, 7.700609, 0.5027553, -10.413743, 9.92928, -1.3848798, 13.628659, 21.731085]
2025-09-13 04:07:36,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [13.0, 12.0, 12.0, 31.0, 13.0, 12.0, 19.0, 13.0, 30.0, 26.0]
2025-09-13 04:07:36,209 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 45/100 (estimated time remaining: 10 hours, 21 minutes, 49 seconds)
2025-09-13 04:17:46,798 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:17:46,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:17:52,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 2.38327 ± 7.741
2025-09-13 04:17:52,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [3.2157845, 15.548913, -1.7534652, 12.4340925, -5.034941, -7.429371, 4.5556855, -3.1331618, -5.0417304, 10.47094]
2025-09-13 04:17:52,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [24.0, 23.0, 12.0, 28.0, 12.0, 15.0, 11.0, 12.0, 19.0, 30.0]
2025-09-13 04:17:52,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 46/100 (estimated time remaining: 9 hours, 51 minutes, 29 seconds)
2025-09-13 04:29:14,079 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:29:14,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:30:18,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -62.26381 ± 113.004
2025-09-13 04:30:18,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-226.63303, -19.363096, 0.0823406, -1.5901945, -41.136482, -6.8987684, -1.9123918, -335.95334, -0.0938795, 10.8607855]
2025-09-13 04:30:18,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 20.0, 12.0, 22.0, 35.0, 13.0, 17.0, 1000.0, 13.0, 24.0]
2025-09-13 04:30:18,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 47/100 (estimated time remaining: 10 hours, 7 minutes, 23 seconds)
2025-09-13 04:40:36,494 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:40:36,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:40:46,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 0.61522 ± 24.221
2025-09-13 04:40:46,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-5.7523856, 10.793719, -4.7471743, 6.4123864, -0.58475184, 46.264767, -43.61592, 12.514322, 18.347141, -33.47993]
2025-09-13 04:40:46,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [14.0, 32.0, 11.0, 14.0, 42.0, 30.0, 51.0, 39.0, 48.0, 39.0]
2025-09-13 04:40:46,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 48/100 (estimated time remaining: 9 hours, 51 minutes, 24 seconds)
2025-09-13 04:51:59,453 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:51:59,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:52:07,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 1.11411 ± 9.084
2025-09-13 04:52:07,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [0.059865322, 1.7098101, -0.1208571, 11.28972, -17.180769, 4.6133523, 17.721498, -8.643735, 1.3519622, 0.34020597]
2025-09-13 04:52:07,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [16.0, 13.0, 13.0, 40.0, 34.0, 14.0, 40.0, 13.0, 16.0, 57.0]
2025-09-13 04:52:07,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 49/100 (estimated time remaining: 9 hours, 43 minutes, 57 seconds)
2025-09-13 05:02:43,942 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:02:43,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:02:50,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 2.12715 ± 6.985
2025-09-13 05:02:50,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [4.654853, 6.216328, 13.149714, -0.6459994, -1.1985972, 13.232509, 2.2260582, -9.474101, -4.9580684, -1.9312357]
2025-09-13 05:02:50,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [22.0, 36.0, 38.0, 11.0, 12.0, 19.0, 35.0, 19.0, 13.0, 24.0]
2025-09-13 05:02:50,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 50/100 (estimated time remaining: 9 hours, 23 minutes, 29 seconds)
2025-09-13 05:13:40,939 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:13:40,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:13:50,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 2.65926 ± 14.720
2025-09-13 05:13:50,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [26.26868, 25.041212, 10.396875, -14.146808, 4.028526, -4.7413774, -13.681767, -17.681223, 7.982744, 3.1257598]
2025-09-13 05:13:50,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [31.0, 36.0, 26.0, 75.0, 13.0, 16.0, 40.0, 27.0, 41.0, 18.0]
2025-09-13 05:13:50,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 51/100 (estimated time remaining: 9 hours, 19 minutes, 40 seconds)
2025-09-13 05:24:40,324 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:24:40,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:24:45,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -4.19908 ± 5.628
2025-09-13 05:24:45,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [3.4384196, -8.751127, -6.653192, -0.7655671, -8.253926, -15.726176, -2.3001347, 2.96086, -5.4883294, -0.4516365]
2025-09-13 05:24:45,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [12.0, 28.0, 13.0, 24.0, 22.0, 15.0, 11.0, 23.0, 15.0, 11.0]
2025-09-13 05:24:45,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 52/100 (estimated time remaining: 8 hours, 53 minutes, 39 seconds)
2025-09-13 05:35:40,780 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:35:40,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:35:45,747 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 0.60635 ± 6.926
2025-09-13 05:35:45,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [3.0222666, -8.252339, 9.384592, 8.020103, -4.7792864, -10.9672575, 9.879222, 2.8019621, -0.9845009, -2.061221]
2025-09-13 05:35:45,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [25.0, 12.0, 26.0, 14.0, 12.0, 17.0, 13.0, 12.0, 25.0, 12.0]
2025-09-13 05:35:45,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 53/100 (estimated time remaining: 8 hours, 47 minutes, 54 seconds)
2025-09-13 05:46:32,582 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:46:32,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:46:39,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 6.91788 ± 9.778
2025-09-13 05:46:39,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [12.299717, 16.137205, 1.7738367, 27.113062, 2.2998178, -3.309024, 15.095107, 3.2441876, -5.483138, 0.008020156]
2025-09-13 05:46:39,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [58.0, 14.0, 22.0, 43.0, 12.0, 30.0, 15.0, 17.0, 15.0, 12.0]
2025-09-13 05:46:39,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1226 [INFO]: New best (6.92) for latency ExtremeSparseL4U32
2025-09-13 05:46:39,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 54/100 (estimated time remaining: 8 hours, 32 minutes, 42 seconds)
2025-09-13 05:58:00,486 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:58:00,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:58:07,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 1.96038 ± 12.986
2025-09-13 05:58:07,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-0.58057815, -20.424915, 7.0634713, 19.477655, 9.575167, 3.4507544, -12.769245, -10.02394, 22.56453, 1.2708707]
2025-09-13 05:58:07,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [50.0, 26.0, 13.0, 34.0, 25.0, 12.0, 12.0, 28.0, 23.0, 12.0]
2025-09-13 05:58:07,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 55/100 (estimated time remaining: 8 hours, 28 minutes, 33 seconds)
2025-09-13 06:08:31,440 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:08:31,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:08:41,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -3.26632 ± 33.785
2025-09-13 06:08:41,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-12.519279, -88.85192, 8.554663, 54.69787, 2.1841245, -3.1670532, 10.915989, 1.2736825, 8.3560505, -14.107331]
2025-09-13 06:08:41,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [26.0, 106.0, 14.0, 42.0, 14.0, 30.0, 12.0, 14.0, 14.0, 60.0]
2025-09-13 06:08:41,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 56/100 (estimated time remaining: 8 hours, 13 minutes, 37 seconds)
2025-09-13 06:20:26,819 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:20:26,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:20:36,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -5.41191 ± 15.797
2025-09-13 06:20:36,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-7.0363355, -0.80757225, 27.441952, -7.311722, -0.66801965, -14.911577, -1.2581558, -34.98296, 6.4406414, -21.025387]
2025-09-13 06:20:36,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [38.0, 27.0, 71.0, 13.0, 14.0, 44.0, 15.0, 65.0, 14.0, 26.0]
2025-09-13 06:20:36,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 57/100 (estimated time remaining: 8 hours, 11 minutes, 28 seconds)
2025-09-13 06:31:21,108 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:31:21,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:31:26,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -4.26803 ± 10.523
2025-09-13 06:31:26,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-14.453124, -22.008942, -12.917526, 0.4928148, -10.805895, 9.42994, 0.05065652, -5.575937, -0.73758, 13.845301]
2025-09-13 06:31:26,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [17.0, 15.0, 22.0, 12.0, 20.0, 23.0, 12.0, 19.0, 13.0, 26.0]
2025-09-13 06:31:26,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 58/100 (estimated time remaining: 7 hours, 58 minutes, 50 seconds)
2025-09-13 06:41:43,378 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:41:43,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:41:49,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 6.78525 ± 9.017
2025-09-13 06:41:49,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-1.6617038, 19.05389, 24.405874, 2.159707, 6.83386, -1.7287039, 14.2280655, -4.119919, 4.8341866, 3.8472319]
2025-09-13 06:41:49,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [27.0, 23.0, 38.0, 22.0, 21.0, 12.0, 14.0, 12.0, 14.0, 13.0]
2025-09-13 06:41:49,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 59/100 (estimated time remaining: 7 hours, 43 minutes, 19 seconds)
2025-09-13 06:52:27,883 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:52:27,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:52:41,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 4.41150 ± 28.435
2025-09-13 06:52:41,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-21.970198, -6.5596867, -27.93469, 6.890565, 7.9152255, -13.90722, 80.890114, 3.868099, 8.202353, 6.7204204]
2025-09-13 06:52:41,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [86.0, 41.0, 89.0, 32.0, 20.0, 50.0, 86.0, 12.0, 13.0, 12.0]
2025-09-13 06:52:41,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 60/100 (estimated time remaining: 7 hours, 27 minutes, 23 seconds)
2025-09-13 07:03:33,112 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:03:33,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:03:41,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 3.84199 ± 14.486
2025-09-13 07:03:41,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [37.610565, 1.3538129, 12.77734, 0.7914774, 0.1613716, -13.530943, -8.585848, 3.519658, 16.077564, -11.755139]
2025-09-13 07:03:41,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [43.0, 13.0, 14.0, 13.0, 22.0, 30.0, 17.0, 13.0, 24.0, 99.0]
2025-09-13 07:03:41,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 61/100 (estimated time remaining: 7 hours, 20 minutes, 2 seconds)
2025-09-13 07:15:21,216 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:15:21,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:15:27,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 5.75174 ± 8.893
2025-09-13 07:15:27,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [10.959595, 20.941597, 9.636717, 13.443865, -13.004248, -2.8556361, 0.5296887, 4.056941, 6.9716964, 6.8371577]
2025-09-13 07:15:27,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [14.0, 37.0, 15.0, 37.0, 16.0, 23.0, 13.0, 13.0, 14.0, 13.0]
2025-09-13 07:15:27,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 62/100 (estimated time remaining: 7 hours, 7 minutes, 47 seconds)
2025-09-13 07:25:37,452 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:25:37,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:25:44,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 1.48538 ± 16.147
2025-09-13 07:25:44,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-0.025184155, 13.834159, -2.3445342, 19.366053, -6.2452683, 19.35318, -11.181626, 4.043327, -36.088326, 14.142063]
2025-09-13 07:25:44,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [12.0, 17.0, 14.0, 18.0, 30.0, 40.0, 12.0, 16.0, 21.0, 42.0]
2025-09-13 07:25:44,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 63/100 (estimated time remaining: 6 hours, 52 minutes, 38 seconds)
2025-09-13 07:36:35,114 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:36:35,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:36:45,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 0.70558 ± 18.035
2025-09-13 07:36:45,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-7.2895417, 2.208181, 6.9235706, -34.093872, -23.101633, 4.1785607, 7.1731753, 35.504528, 6.170625, 9.382199]
2025-09-13 07:36:45,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [13.0, 50.0, 42.0, 95.0, 51.0, 19.0, 27.0, 27.0, 15.0, 13.0]
2025-09-13 07:36:45,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 64/100 (estimated time remaining: 6 hours, 46 minutes, 34 seconds)
2025-09-13 07:47:33,025 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:47:33,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:47:42,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 19.17502 ± 14.173
2025-09-13 07:47:42,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [16.15888, 4.3203373, 19.932516, 31.118378, 32.03662, 20.521612, -6.0724993, 42.40458, 4.1121545, 27.217606]
2025-09-13 07:47:42,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [16.0, 12.0, 20.0, 28.0, 122.0, 19.0, 16.0, 43.0, 15.0, 23.0]
2025-09-13 07:47:42,339 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1226 [INFO]: New best (19.18) for latency ExtremeSparseL4U32
2025-09-13 07:47:42,382 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 65/100 (estimated time remaining: 6 hours, 36 minutes, 8 seconds)
2025-09-13 07:58:32,977 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:58:32,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:58:45,295 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 12.11948 ± 16.291
2025-09-13 07:58:45,295 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [26.297064, 44.25859, -3.3812127, -3.4878013, 10.306319, 1.504084, 24.620775, 24.85544, -9.280511, 5.5020866]
2025-09-13 07:58:45,295 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [49.0, 123.0, 32.0, 14.0, 17.0, 13.0, 66.0, 25.0, 50.0, 28.0]
2025-09-13 07:58:45,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 66/100 (estimated time remaining: 6 hours, 25 minutes, 25 seconds)
2025-09-13 08:09:35,441 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:09:35,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:09:40,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 5.65616 ± 9.201
2025-09-13 08:09:40,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [15.936007, 3.3612123, -3.849284, -1.2125291, 6.365196, -6.7895875, 17.76484, 21.291988, -1.516304, 5.210083]
2025-09-13 08:09:40,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [16.0, 12.0, 14.0, 17.0, 14.0, 14.0, 19.0, 28.0, 26.0, 15.0]
2025-09-13 08:09:40,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 67/100 (estimated time remaining: 6 hours, 8 minutes, 44 seconds)
2025-09-13 08:20:34,206 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:20:34,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:20:43,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 6.66683 ± 22.701
2025-09-13 08:20:43,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-9.935577, -17.882345, -1.0958453, 4.586483, 1.3678854, 60.360565, -13.224738, 31.1612, 17.946928, -6.6162133]
2025-09-13 08:20:43,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [18.0, 27.0, 12.0, 25.0, 12.0, 88.0, 15.0, 30.0, 26.0, 43.0]
2025-09-13 08:20:43,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 68/100 (estimated time remaining: 6 hours, 2 minutes, 52 seconds)
2025-09-13 08:31:32,927 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:31:32,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:31:44,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 9.44512 ± 18.059
2025-09-13 08:31:44,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [9.712784, -13.493117, 3.9447632, 36.499912, -8.217911, -1.2263378, 47.9559, 8.069222, 1.5725913, 9.63336]
2025-09-13 08:31:44,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [40.0, 24.0, 14.0, 62.0, 13.0, 20.0, 66.0, 127.0, 27.0, 13.0]
2025-09-13 08:31:44,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 69/100 (estimated time remaining: 5 hours, 51 minutes, 54 seconds)
2025-09-13 08:42:34,453 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:42:34,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:42:44,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 3.49206 ± 24.371
2025-09-13 08:42:44,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-11.663436, 9.467403, 14.327779, 15.0869665, 48.765957, -0.917904, -50.028095, 19.437683, -11.762554, 2.2068315]
2025-09-13 08:42:44,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [15.0, 15.0, 62.0, 14.0, 31.0, 38.0, 85.0, 26.0, 20.0, 16.0]
2025-09-13 08:42:44,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 70/100 (estimated time remaining: 5 hours, 41 minutes, 10 seconds)
2025-09-13 08:53:38,255 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:53:38,256 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:53:46,599 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -5.15782 ± 19.922
2025-09-13 08:53:46,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-23.489586, 3.3006868, 4.555466, -58.009003, 5.7791667, 8.132485, 6.9052753, -7.805655, 0.20335919, 8.849598]
2025-09-13 08:53:46,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [49.0, 13.0, 14.0, 59.0, 17.0, 12.0, 54.0, 29.0, 22.0, 14.0]
2025-09-13 08:53:46,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 71/100 (estimated time remaining: 5 hours, 30 minutes, 7 seconds)
2025-09-13 09:05:22,375 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:05:22,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:05:28,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 8.09341 ± 12.788
2025-09-13 09:05:28,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-5.005834, -3.6105304, 6.5063863, 27.92224, 2.8977146, 0.10468684, -5.5791526, 29.476534, 21.97455, 6.2474933]
2025-09-13 09:05:28,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [27.0, 38.0, 18.0, 31.0, 13.0, 14.0, 16.0, 28.0, 16.0, 13.0]
2025-09-13 09:05:28,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 72/100 (estimated time remaining: 5 hours, 23 minutes, 38 seconds)
2025-09-13 09:15:39,223 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:15:39,230 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:15:47,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 12.80811 ± 13.091
2025-09-13 09:15:47,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [25.294529, -1.2999337, 1.5943967, -4.562397, 13.61268, 23.893364, -4.6073246, 17.54749, 27.330536, 29.277777]
2025-09-13 09:15:47,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [33.0, 17.0, 14.0, 15.0, 37.0, 31.0, 13.0, 46.0, 39.0, 46.0]
2025-09-13 09:15:47,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 73/100 (estimated time remaining: 5 hours, 8 minutes, 26 seconds)
2025-09-13 09:26:35,668 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:26:35,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:26:47,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 12.14682 ± 22.723
2025-09-13 09:26:47,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [19.037254, -40.7555, 3.4046533, -4.3765073, 21.271395, 42.270252, 41.779, 9.352217, 8.513189, 20.97229]
2025-09-13 09:26:47,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [24.0, 134.0, 13.0, 30.0, 24.0, 38.0, 53.0, 14.0, 30.0, 45.0]
2025-09-13 09:26:47,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 74/100 (estimated time remaining: 4 hours, 57 minutes, 15 seconds)
2025-09-13 09:37:39,819 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:37:39,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:37:48,774 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 7.00557 ± 21.263
2025-09-13 09:37:48,774 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-2.1544905, 26.042332, 7.8974075, 1.7533618, -24.021187, -12.2065325, 42.993828, -18.212408, 13.799213, 34.16419]
2025-09-13 09:37:48,774 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [13.0, 30.0, 76.0, 29.0, 24.0, 11.0, 36.0, 35.0, 18.0, 25.0]
2025-09-13 09:37:48,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 75/100 (estimated time remaining: 4 hours, 46 minutes, 24 seconds)
2025-09-13 09:48:38,519 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:48:38,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:48:49,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 10.29122 ± 16.193
2025-09-13 09:48:49,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-8.6595955, 8.147134, 7.643358, 20.278511, 6.613977, -4.0204782, 30.009321, 45.747093, -3.9503222, 1.1031942]
2025-09-13 09:48:49,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [33.0, 12.0, 13.0, 52.0, 14.0, 26.0, 28.0, 101.0, 19.0, 58.0]
2025-09-13 09:48:49,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 76/100 (estimated time remaining: 4 hours, 35 minutes, 12 seconds)
2025-09-13 09:59:39,339 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:59:39,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:59:47,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 6.65521 ± 16.171
2025-09-13 09:59:47,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [43.30817, -0.2509263, 9.468056, -11.676161, -5.482315, 8.823503, -14.107925, 4.805573, 23.74121, 7.922902]
2025-09-13 09:59:47,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [40.0, 13.0, 27.0, 43.0, 38.0, 14.0, 32.0, 15.0, 30.0, 14.0]
2025-09-13 09:59:47,425 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 77/100 (estimated time remaining: 4 hours, 20 minutes, 41 seconds)
2025-09-13 10:10:38,432 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:10:38,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:10:46,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 11.30292 ± 19.411
2025-09-13 10:10:46,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [4.3523197, -8.678817, 16.813835, 26.576082, 21.197155, 49.07679, -2.379496, -24.060644, 20.795694, 9.336323]
2025-09-13 10:10:46,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [14.0, 13.0, 27.0, 36.0, 23.0, 42.0, 19.0, 47.0, 22.0, 15.0]
2025-09-13 10:10:46,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 78/100 (estimated time remaining: 4 hours, 12 minutes, 52 seconds)
2025-09-13 10:21:40,163 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:21:40,173 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:22:16,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -17.79314 ± 90.867
2025-09-13 10:22:16,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [5.9057727, 2.8394158, -10.325495, -3.7447104, -282.66675, -2.6386514, -0.7961342, 32.710323, 66.16635, 14.618427]
2025-09-13 10:22:16,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [40.0, 13.0, 20.0, 12.0, 1000.0, 12.0, 12.0, 41.0, 43.0, 16.0]
2025-09-13 10:22:16,304 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 79/100 (estimated time remaining: 4 hours, 4 minutes, 5 seconds)
2025-09-13 10:33:09,054 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:33:09,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:33:20,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 28.22737 ± 44.688
2025-09-13 10:33:20,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-9.193048, 46.62962, 5.9873624, 20.481556, -8.658383, 5.1253047, 153.01785, 16.644026, 31.907095, 20.332317]
2025-09-13 10:33:20,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [13.0, 46.0, 22.0, 32.0, 31.0, 13.0, 158.0, 18.0, 26.0, 25.0]
2025-09-13 10:33:20,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1226 [INFO]: New best (28.23) for latency ExtremeSparseL4U32
2025-09-13 10:33:20,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 80/100 (estimated time remaining: 3 hours, 53 minutes, 12 seconds)
2025-09-13 10:44:12,470 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:44:12,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:44:19,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 6.96310 ± 11.313
2025-09-13 10:44:19,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [29.737103, -0.13021997, -0.5813957, 1.238988, 4.213611, 20.716885, 15.905713, -11.235777, 6.4170675, 3.3490436]
2025-09-13 10:44:19,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [26.0, 13.0, 49.0, 12.0, 18.0, 26.0, 17.0, 31.0, 26.0, 18.0]
2025-09-13 10:44:19,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 81/100 (estimated time remaining: 3 hours, 42 minutes, 1 second)
2025-09-13 10:55:17,824 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:55:17,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:55:29,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 25.83382 ± 34.714
2025-09-13 10:55:29,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-10.638391, 30.326035, 25.871677, 51.12598, 39.962734, -7.311201, 10.055985, 2.6678002, 3.8723607, 112.40524]
2025-09-13 10:55:29,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [59.0, 37.0, 36.0, 30.0, 61.0, 13.0, 29.0, 35.0, 13.0, 64.0]
2025-09-13 10:55:29,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 82/100 (estimated time remaining: 3 hours, 31 minutes, 38 seconds)
2025-09-13 11:06:21,772 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:06:21,774 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:06:33,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 21.33786 ± 24.483
2025-09-13 11:06:33,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-4.393715, 61.173874, 52.607105, 18.57242, 36.269268, 15.436713, -13.675274, 17.965006, 38.970142, -9.546892]
2025-09-13 11:06:33,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [13.0, 49.0, 41.0, 27.0, 56.0, 95.0, 24.0, 18.0, 35.0, 15.0]
2025-09-13 11:06:33,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 83/100 (estimated time remaining: 3 hours, 20 minutes, 49 seconds)
2025-09-13 11:17:19,275 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:17:19,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:17:30,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 20.21299 ± 42.990
2025-09-13 11:17:30,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-43.077526, 18.49171, -0.77565044, -5.3236556, -3.111909, 101.493774, -4.2891383, 90.991325, 6.6462016, 41.08477]
2025-09-13 11:17:30,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [25.0, 39.0, 14.0, 22.0, 16.0, 94.0, 12.0, 100.0, 27.0, 35.0]
2025-09-13 11:17:30,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 84/100 (estimated time remaining: 3 hours, 7 minutes, 49 seconds)
2025-09-13 11:28:23,827 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:28:23,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:28:31,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 15.30331 ± 13.527
2025-09-13 11:28:31,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [21.761747, -3.6307492, 11.734011, 15.456062, 34.82314, 38.110126, 10.272056, -3.7347088, 6.214062, 22.02731]
2025-09-13 11:28:31,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [38.0, 13.0, 21.0, 44.0, 29.0, 38.0, 15.0, 24.0, 14.0, 30.0]
2025-09-13 11:28:31,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 85/100 (estimated time remaining: 2 hours, 56 minutes, 36 seconds)
2025-09-13 11:40:05,281 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:40:05,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:40:16,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 20.63215 ± 25.094
2025-09-13 11:40:16,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [71.96106, -0.38006523, -21.438356, 7.9843016, 40.480762, 37.105103, 9.034362, 38.54755, 12.955618, 10.071222]
2025-09-13 11:40:16,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [35.0, 14.0, 21.0, 38.0, 51.0, 51.0, 41.0, 27.0, 86.0, 21.0]
2025-09-13 11:40:16,948 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 86/100 (estimated time remaining: 2 hours, 47 minutes, 52 seconds)
2025-09-13 11:51:34,321 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:51:34,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:51:44,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 21.93110 ± 25.548
2025-09-13 11:51:44,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [4.253365, 35.4984, 16.909555, 43.563793, 13.086563, -0.5198575, 18.865091, 12.239917, -8.951894, 84.36606]
2025-09-13 11:51:44,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [14.0, 35.0, 26.0, 31.0, 78.0, 14.0, 32.0, 25.0, 13.0, 67.0]
2025-09-13 11:51:44,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 87/100 (estimated time remaining: 2 hours, 37 minutes, 30 seconds)
2025-09-13 12:01:32,112 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:01:32,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:02:12,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -2.27022 ± 80.912
2025-09-13 12:02:12,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [21.696144, -237.92712, 20.11589, 23.544035, 6.63975, 47.93257, 6.275755, 3.83992, 69.577774, 15.60309]
2025-09-13 12:02:12,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [41.0, 1000.0, 45.0, 26.0, 27.0, 49.0, 27.0, 16.0, 70.0, 43.0]
2025-09-13 12:02:12,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 88/100 (estimated time remaining: 2 hours, 24 minutes, 42 seconds)
2025-09-13 12:13:05,439 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:13:05,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:13:17,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 29.26202 ± 32.363
2025-09-13 12:13:17,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-4.3668027, 17.581898, 1.1045674, 79.0996, 34.956738, -8.166779, 54.632076, 16.29937, 87.90188, 13.577643]
2025-09-13 12:13:17,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [52.0, 36.0, 16.0, 64.0, 42.0, 41.0, 34.0, 15.0, 72.0, 29.0]
2025-09-13 12:13:17,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1226 [INFO]: New best (29.26) for latency ExtremeSparseL4U32
2025-09-13 12:13:17,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 13 minutes, 51 seconds)
2025-09-13 12:24:09,909 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:24:09,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:24:17,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 24.09431 ± 23.017
2025-09-13 12:24:17,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [38.30852, 18.131647, 19.26339, 0.9351455, 81.22636, 0.20648961, 25.23744, 30.62682, -0.59165263, 27.598925]
2025-09-13 12:24:17,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [24.0, 19.0, 18.0, 15.0, 39.0, 14.0, 29.0, 42.0, 17.0, 25.0]
2025-09-13 12:24:17,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 2 minutes, 40 seconds)
2025-09-13 12:35:22,792 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:35:22,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:35:36,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -10.21503 ± 55.056
2025-09-13 12:35:36,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [12.431711, 8.158609, -9.565178, 12.474511, 70.350655, -150.1806, -0.06144143, 1.0216497, 10.431744, -57.21193]
2025-09-13 12:35:36,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [47.0, 18.0, 15.0, 33.0, 53.0, 124.0, 20.0, 28.0, 35.0, 99.0]
2025-09-13 12:35:36,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 91/100 (estimated time remaining: 1 hour, 50 minutes, 39 seconds)
2025-09-13 12:46:47,540 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:46:47,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:46:54,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 10.05131 ± 14.805
2025-09-13 12:46:54,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-12.065505, 16.895756, 6.5803976, 1.906948, 38.19762, 30.278625, -1.355822, 17.736479, 4.9936337, -2.6550448]
2025-09-13 12:46:54,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [12.0, 35.0, 25.0, 29.0, 29.0, 35.0, 14.0, 23.0, 17.0, 22.0]
2025-09-13 12:46:54,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 92/100 (estimated time remaining: 1 hour, 39 minutes, 18 seconds)
2025-09-13 12:57:20,286 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:57:20,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:57:27,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 9.30262 ± 12.662
2025-09-13 12:57:27,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [8.568534, 28.793785, 9.608972, -14.328493, 28.903008, 18.794943, 6.4370294, 4.42646, -2.0154612, 3.8374064]
2025-09-13 12:57:27,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [20.0, 30.0, 23.0, 24.0, 28.0, 25.0, 15.0, 28.0, 17.0, 13.0]
2025-09-13 12:57:27,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 28 minutes, 23 seconds)
2025-09-13 13:08:19,267 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:08:19,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:08:27,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 13.74113 ± 22.868
2025-09-13 13:08:27,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [29.22692, -8.651418, -0.5496744, 33.437405, -1.5109091, -1.6248072, 33.344933, -23.153028, 53.63262, 23.259233]
2025-09-13 13:08:27,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [68.0, 12.0, 21.0, 25.0, 14.0, 15.0, 29.0, 31.0, 54.0, 18.0]
2025-09-13 13:08:27,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 17 minutes, 14 seconds)
2025-09-13 13:19:17,792 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:19:17,794 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:19:29,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 16.53168 ± 23.270
2025-09-13 13:19:29,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-1.1603588, 23.047682, 71.52571, -21.016502, 17.817984, 15.288736, 1.0454873, 34.604504, 14.914597, 9.24897]
2025-09-13 13:19:29,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [14.0, 18.0, 66.0, 161.0, 16.0, 17.0, 14.0, 28.0, 35.0, 27.0]
2025-09-13 13:19:29,665 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 6 minutes, 14 seconds)
2025-09-13 13:30:26,370 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:30:26,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:30:35,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 12.72900 ± 24.783
2025-09-13 13:30:35,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [61.378693, 1.4900632, 7.417584, 58.42504, 1.8258033, 2.4803605, 12.3064, -3.634418, 3.8938134, -18.293362]
2025-09-13 13:30:35,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [69.0, 38.0, 19.0, 56.0, 17.0, 14.0, 21.0, 15.0, 35.0, 22.0]
2025-09-13 13:30:35,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 96/100 (estimated time remaining: 54 minutes, 58 seconds)
2025-09-13 13:41:27,632 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:41:27,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:41:34,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 13.82981 ± 12.241
2025-09-13 13:41:34,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [26.708996, 0.6091775, 23.094944, 8.67018, 40.804626, 16.151825, 10.469332, 3.3073137, 6.0285745, 2.4531145]
2025-09-13 13:41:34,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [29.0, 21.0, 27.0, 15.0, 31.0, 33.0, 25.0, 16.0, 14.0, 13.0]
2025-09-13 13:41:34,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 97/100 (estimated time remaining: 43 minutes, 43 seconds)
2025-09-13 13:52:36,266 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:52:36,274 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:52:44,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 14.79620 ± 24.344
2025-09-13 13:52:44,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [3.6008947, 4.640997, 53.12981, 7.078528, 3.7913153, -5.596135, -9.395929, -8.766104, 54.71861, 44.76003]
2025-09-13 13:52:44,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [28.0, 17.0, 39.0, 15.0, 26.0, 25.0, 12.0, 24.0, 36.0, 48.0]
2025-09-13 13:52:44,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 98/100 (estimated time remaining: 33 minutes, 10 seconds)
2025-09-13 14:04:15,624 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:04:15,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:04:23,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 18.03489 ± 17.322
2025-09-13 14:04:23,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [32.289696, 26.323664, 36.212574, 0.87484795, 34.77005, 5.8714433, 9.822146, -7.617404, 42.510113, -0.70826226]
2025-09-13 14:04:23,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [35.0, 47.0, 49.0, 13.0, 24.0, 21.0, 20.0, 12.0, 40.0, 16.0]
2025-09-13 14:04:23,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 99/100 (estimated time remaining: 22 minutes, 22 seconds)
2025-09-13 14:14:33,576 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:14:33,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:14:46,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 24.84319 ± 28.139
2025-09-13 14:14:46,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-8.67328, 56.98408, 55.921577, 78.70296, 3.1321702, 1.7698379, 12.367989, 28.42459, -1.8522621, 21.65421]
2025-09-13 14:14:46,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [13.0, 57.0, 82.0, 50.0, 24.0, 14.0, 30.0, 119.0, 14.0, 38.0]
2025-09-13 14:14:46,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 100/100 (estimated time remaining: 11 minutes, 3 seconds)
2025-09-13 14:26:44,594 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:26:44,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:26:54,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 17.78609 ± 45.681
2025-09-13 14:26:54,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-26.817871, -17.874628, -15.12355, 137.36436, 31.64159, 21.240202, -12.311658, 9.236205, 3.6880865, 46.81811]
2025-09-13 14:26:54,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [25.0, 30.0, 12.0, 79.0, 65.0, 23.0, 15.0, 19.0, 15.0, 48.0]
2025-09-13 14:26:54,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1251 [DEBUG]: Training session finished
