2025-09-12 14:15:53,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc10-ant/ExtremeSparseL4U32-mbpac-highdim-memdelay
2025-09-12 14:15:53,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc10-ant/ExtremeSparseL4U32-mbpac-highdim-memdelay
2025-09-12 14:15:53,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x147b98810d50>}
2025-09-12 14:15:53,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1111 [DEBUG]: using device: cuda
2025-09-12 14:15:53,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1133 [INFO]: Creating new trainer
2025-09-12 14:15:53,161 baseline-mbpac-noiseperc10-ant:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-09-12 14:15:53,162 baseline-mbpac-noiseperc10-ant:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-12 14:15:53,171 baseline-mbpac-noiseperc10-ant:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=27, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=8, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 512, batch_first=True)
)
2025-09-12 14:15:54,282 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1194 [DEBUG]: Starting training session...
2025-09-12 14:15:54,283 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 1/100
2025-09-12 14:27:25,468 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:27:25,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 14:28:16,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: -114.09328 ± 177.201
2025-09-12 14:28:16,722 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [-25.549032, -33.766533, -182.37164, -29.339249, -84.865974, -622.5604, -4.5586004, -19.080187, -27.859184, -110.98208]
2025-09-12 14:28:16,722 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [46.0, 59.0, 266.0, 45.0, 92.0, 1000.0, 18.0, 34.0, 54.0, 119.0]
2025-09-12 14:28:16,722 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (-114.09) for latency ExtremeSparseL4U32
2025-09-12 14:28:16,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 2/100 (estimated time remaining: 20 hours, 25 minutes, 1 second)
2025-09-12 14:39:14,752 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:39:14,759 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 14:40:08,971 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: -9.40934 ± 23.359
2025-09-12 14:40:08,971 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [-3.8338144, 5.389103, 19.588188, -29.609188, -46.90507, -9.03986, -4.4996576, -44.10162, -8.027135, 26.945614]
2025-09-12 14:40:08,971 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [18.0, 58.0, 79.0, 61.0, 1000.0, 235.0, 151.0, 58.0, 46.0, 125.0]
2025-09-12 14:40:08,971 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (-9.41) for latency ExtremeSparseL4U32
2025-09-12 14:40:08,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 3/100 (estimated time remaining: 19 hours, 47 minutes, 59 seconds)
2025-09-12 14:51:23,392 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:51:23,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 14:54:00,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 46.23591 ± 67.053
2025-09-12 14:54:00,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [-7.7575407, 52.985138, 240.3876, 29.887486, 16.713873, 7.7147245, 15.668448, 46.678555, 19.462748, 40.61804]
2025-09-12 14:54:00,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [155.0, 329.0, 1000.0, 148.0, 279.0, 1000.0, 156.0, 1000.0, 935.0, 261.0]
2025-09-12 14:54:00,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (46.24) for latency ExtremeSparseL4U32
2025-09-12 14:54:00,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 4/100 (estimated time remaining: 20 hours, 31 minutes, 47 seconds)
2025-09-12 15:04:28,240 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:04:28,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 15:06:48,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 50.07684 ± 39.279
2025-09-12 15:06:48,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [33.819427, 18.102257, 80.82917, 21.631184, 23.717129, 20.781166, 133.07999, 106.70373, 28.452126, 33.65221]
2025-09-12 15:06:48,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [99.0, 1000.0, 920.0, 1000.0, 64.0, 103.0, 1000.0, 215.0, 135.0, 123.0]
2025-09-12 15:06:48,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (50.08) for latency ExtremeSparseL4U32
2025-09-12 15:06:48,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 5/100 (estimated time remaining: 20 hours, 21 minutes, 46 seconds)
2025-09-12 15:17:41,261 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:17:41,267 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 15:20:03,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 52.76960 ± 40.128
2025-09-12 15:20:03,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [0.66963166, 17.024105, 41.337646, 48.33913, 26.44846, 150.35297, 89.250534, 57.857315, 35.05365, 61.362568]
2025-09-12 15:20:03,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [134.0, 176.0, 107.0, 1000.0, 62.0, 1000.0, 1000.0, 1000.0, 134.0, 194.0]
2025-09-12 15:20:03,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (52.77) for latency ExtremeSparseL4U32
2025-09-12 15:20:03,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 6/100 (estimated time remaining: 20 hours, 18 minutes, 46 seconds)
2025-09-12 15:31:34,117 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:31:34,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 15:34:50,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 142.29976 ± 103.416
2025-09-12 15:34:50,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [182.86133, 19.08615, 254.21791, 290.8388, 26.436083, 4.0964103, 124.89344, 134.40218, 286.58093, 99.58435]
2025-09-12 15:34:50,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 185.0, 1000.0, 893.0, 262.0, 35.0, 1000.0, 1000.0, 1000.0, 286.0]
2025-09-12 15:34:50,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (142.30) for latency ExtremeSparseL4U32
2025-09-12 15:34:50,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 7/100 (estimated time remaining: 20 hours, 51 minutes, 24 seconds)
2025-09-12 15:45:33,763 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:45:33,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 15:48:45,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 209.28511 ± 165.772
2025-09-12 15:48:45,680 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [28.05238, 18.698784, 290.14694, 480.2921, 328.58954, -2.4713533, 398.95706, 288.0958, 209.37753, 53.1121]
2025-09-12 15:48:45,680 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [174.0, 188.0, 1000.0, 1000.0, 1000.0, 32.0, 1000.0, 1000.0, 1000.0, 81.0]
2025-09-12 15:48:45,680 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (209.29) for latency ExtremeSparseL4U32
2025-09-12 15:48:45,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 8/100 (estimated time remaining: 21 hours, 16 minutes, 11 seconds)
2025-09-12 15:59:28,980 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:59:28,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 16:02:10,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 165.16745 ± 139.192
2025-09-12 16:02:10,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [429.4446, 280.2313, 177.2667, 18.604424, 234.60823, -3.9170837, 145.62912, 50.106594, 304.67078, 15.029735]
2025-09-12 16:02:10,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 640.0, 44.0, 934.0, 118.0, 412.0, 146.0, 1000.0, 191.0]
2025-09-12 16:02:10,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 9/100 (estimated time remaining: 20 hours, 54 minutes, 20 seconds)
2025-09-12 16:13:14,392 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:13:14,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 16:16:05,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 225.96199 ± 182.629
2025-09-12 16:16:05,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [402.95947, 22.603058, 13.040514, 454.10898, 198.45438, 383.36563, 318.4382, 430.36874, 42.799004, -6.5181327]
2025-09-12 16:16:05,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 52.0, 30.0, 1000.0, 694.0, 1000.0, 829.0, 1000.0, 82.0, 60.0]
2025-09-12 16:16:05,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (225.96) for latency ExtremeSparseL4U32
2025-09-12 16:16:05,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 10/100 (estimated time remaining: 21 hours, 47 seconds)
2025-09-12 16:27:08,235 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:27:08,239 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 16:30:05,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 279.40033 ± 170.635
2025-09-12 16:30:05,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [84.141396, 134.46562, 478.31134, 377.90714, 439.47998, 117.7398, 77.563896, 487.39362, 450.61026, 146.39017]
2025-09-12 16:30:05,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [134.0, 306.0, 1000.0, 1000.0, 1000.0, 192.0, 148.0, 1000.0, 1000.0, 221.0]
2025-09-12 16:30:05,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (279.40) for latency ExtremeSparseL4U32
2025-09-12 16:30:05,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 11/100 (estimated time remaining: 21 hours, 40 seconds)
2025-09-12 16:40:38,964 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:40:38,970 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 16:43:08,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 262.53748 ± 214.219
2025-09-12 16:43:08,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [478.7593, 53.940674, 16.293015, 571.7572, 512.7984, 86.3452, 247.46791, 33.423714, 137.60481, 486.98447]
2025-09-12 16:43:08,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 145.0, 38.0, 1000.0, 1000.0, 147.0, 317.0, 90.0, 362.0, 1000.0]
2025-09-12 16:43:08,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 12/100 (estimated time remaining: 20 hours, 15 minutes, 45 seconds)
2025-09-12 16:54:01,041 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:54:01,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 16:57:04,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 339.97464 ± 214.714
2025-09-12 16:57:04,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [132.29024, 390.54422, 523.2598, 422.2729, 35.406925, 548.3046, 33.500805, 635.5602, 169.7392, 508.86743]
2025-09-12 16:57:04,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [186.0, 578.0, 1000.0, 1000.0, 88.0, 1000.0, 49.0, 1000.0, 341.0, 1000.0]
2025-09-12 16:57:04,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (339.97) for latency ExtremeSparseL4U32
2025-09-12 16:57:04,598 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 13/100 (estimated time remaining: 20 hours, 2 minutes, 20 seconds)
2025-09-12 17:08:43,172 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:08:43,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 17:11:14,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 286.57123 ± 269.632
2025-09-12 17:11:14,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [73.69963, 502.2611, 2.872928, 3.5681942, 522.0171, 13.703004, 673.0845, 13.819042, 562.5582, 498.12903]
2025-09-12 17:11:14,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [128.0, 1000.0, 16.0, 16.0, 1000.0, 42.0, 890.0, 43.0, 1000.0, 1000.0]
2025-09-12 17:11:14,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 14/100 (estimated time remaining: 20 hours, 1 minute, 55 seconds)
2025-09-12 17:21:31,044 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:21:31,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 17:23:23,625 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 205.88925 ± 142.824
2025-09-12 17:23:23,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [540.01416, 94.554276, 387.52374, 14.870413, 125.502914, 176.67906, 186.69876, 165.01512, 175.0572, 192.97693]
2025-09-12 17:23:23,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 107.0, 1000.0, 63.0, 138.0, 227.0, 274.0, 284.0, 316.0, 297.0]
2025-09-12 17:23:23,661 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 15/100 (estimated time remaining: 19 hours, 17 minutes, 41 seconds)
2025-09-12 17:35:18,594 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:35:18,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 17:37:13,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 275.90051 ± 160.127
2025-09-12 17:37:13,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [137.23201, 172.18576, 507.73132, 256.22498, 529.4637, 408.3164, 154.76671, 17.126156, 353.28827, 222.66985]
2025-09-12 17:37:13,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [155.0, 208.0, 1000.0, 287.0, 768.0, 498.0, 231.0, 29.0, 439.0, 285.0]
2025-09-12 17:37:13,539 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 16/100 (estimated time remaining: 19 hours, 1 minute, 19 seconds)
2025-09-12 17:48:09,790 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:48:09,805 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 17:50:08,268 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 344.68881 ± 220.616
2025-09-12 17:50:08,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [385.98004, 270.91885, 176.00731, 72.754395, 465.4752, 751.3225, 325.90176, 109.747765, 691.16364, 197.61664]
2025-09-12 17:50:08,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [391.0, 367.0, 228.0, 125.0, 513.0, 900.0, 461.0, 123.0, 623.0, 197.0]
2025-09-12 17:50:08,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (344.69) for latency ExtremeSparseL4U32
2025-09-12 17:50:08,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 17/100 (estimated time remaining: 18 hours, 45 minutes, 30 seconds)
2025-09-12 18:01:08,268 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:01:08,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 18:03:24,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 337.34180 ± 259.644
2025-09-12 18:03:24,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [64.938065, 217.80037, 170.77547, 654.6273, 199.09328, 605.26337, 56.90883, 721.1961, 606.29266, 76.522385]
2025-09-12 18:03:24,512 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [83.0, 293.0, 196.0, 696.0, 216.0, 1000.0, 75.0, 1000.0, 1000.0, 51.0]
2025-09-12 18:03:24,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 18/100 (estimated time remaining: 18 hours, 21 minutes, 6 seconds)
2025-09-12 18:13:50,104 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:13:50,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 18:17:27,532 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 413.97202 ± 231.041
2025-09-12 18:17:27,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [37.26191, 60.9791, 634.6114, 543.1013, 497.5768, 100.94186, 579.72925, 511.22412, 582.64056, 591.6537]
2025-09-12 18:17:27,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [79.0, 113.0, 1000.0, 1000.0, 1000.0, 155.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:17:27,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (413.97) for latency ExtremeSparseL4U32
2025-09-12 18:17:27,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 19/100 (estimated time remaining: 18 hours, 5 minutes, 51 seconds)
2025-09-12 18:28:05,947 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:28:05,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 18:30:30,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 324.32245 ± 274.682
2025-09-12 18:30:30,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [102.917114, 126.37914, 931.2206, 520.4996, 550.01685, 265.11612, 72.39515, 485.83234, 24.465853, 164.3816]
2025-09-12 18:30:30,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [133.0, 105.0, 1000.0, 1000.0, 1000.0, 289.0, 91.0, 1000.0, 25.0, 156.0]
2025-09-12 18:30:30,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 20/100 (estimated time remaining: 18 hours, 7 minutes, 20 seconds)
2025-09-12 18:42:18,907 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:42:18,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 18:45:00,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 488.51001 ± 294.200
2025-09-12 18:45:00,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [882.0297, 431.73303, 1051.2, 614.22614, 86.30516, 334.20105, 439.78784, 542.21716, 61.921207, 441.4783]
2025-09-12 18:45:00,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [913.0, 407.0, 1000.0, 604.0, 168.0, 361.0, 378.0, 444.0, 83.0, 1000.0]
2025-09-12 18:45:00,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (488.51) for latency ExtremeSparseL4U32
2025-09-12 18:45:00,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 21/100 (estimated time remaining: 18 hours, 4 minutes, 32 seconds)
2025-09-12 18:55:17,155 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:55:17,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 18:57:56,618 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 422.57260 ± 404.810
2025-09-12 18:57:56,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [104.27809, 65.0761, 64.03169, 1108.0544, 1086.7113, 64.20554, 12.394161, 521.3252, 579.6318, 620.01776]
2025-09-12 18:57:56,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [116.0, 57.0, 76.0, 1000.0, 1000.0, 84.0, 20.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:57:56,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 22/100 (estimated time remaining: 17 hours, 51 minutes, 19 seconds)
2025-09-12 19:08:50,344 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:08:50,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 19:12:06,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 444.95111 ± 237.828
2025-09-12 19:12:06,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [152.55743, 101.40539, 439.36316, 510.91806, 529.87396, 201.3626, 311.09528, 627.9484, 736.9947, 837.9921]
2025-09-12 19:12:06,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [136.0, 120.0, 1000.0, 1000.0, 1000.0, 167.0, 295.0, 1000.0, 726.0, 1000.0]
2025-09-12 19:12:06,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 23/100 (estimated time remaining: 17 hours, 51 minutes, 38 seconds)
2025-09-12 19:23:34,802 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:23:34,805 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 19:26:17,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 502.72583 ± 311.546
2025-09-12 19:26:17,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [595.7755, 572.9684, 316.68076, 445.81287, 336.46423, 162.40369, 1110.0725, 996.4516, 368.90167, 121.7266]
2025-09-12 19:26:17,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 249.0, 419.0, 285.0, 158.0, 1000.0, 959.0, 261.0, 136.0]
2025-09-12 19:26:17,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (502.73) for latency ExtremeSparseL4U32
2025-09-12 19:26:17,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 24/100 (estimated time remaining: 17 hours, 39 minutes, 55 seconds)
2025-09-12 19:37:13,720 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:37:13,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 19:41:01,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 609.98572 ± 292.009
2025-09-12 19:41:01,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [942.2736, 582.6745, 874.39764, 92.88056, 230.3265, 679.7313, 526.0425, 479.51157, 1090.6798, 601.339]
2025-09-12 19:41:01,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [808.0, 1000.0, 1000.0, 130.0, 201.0, 1000.0, 1000.0, 451.0, 1000.0, 1000.0]
2025-09-12 19:41:01,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (609.99) for latency ExtremeSparseL4U32
2025-09-12 19:41:01,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 25/100 (estimated time remaining: 17 hours, 51 minutes, 48 seconds)
2025-09-12 19:52:33,944 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:52:33,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 19:55:25,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 470.27499 ± 258.291
2025-09-12 19:55:25,479 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [650.6199, 442.82498, 95.359314, 590.22986, 647.64935, 733.86, 500.98642, 176.44296, 811.7467, 53.03073]
2025-09-12 19:55:25,480 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [535.0, 359.0, 78.0, 1000.0, 1000.0, 564.0, 1000.0, 231.0, 851.0, 43.0]
2025-09-12 19:55:25,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 26/100 (estimated time remaining: 17 hours, 36 minutes, 13 seconds)
2025-09-12 20:05:45,900 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:05:45,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 20:07:48,981 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 370.56354 ± 251.271
2025-09-12 20:07:48,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [183.8056, 17.180841, 587.40875, 522.8759, 341.34845, 869.45245, 179.28458, 205.25214, 192.08105, 606.9455]
2025-09-12 20:07:48,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [190.0, 20.0, 479.0, 1000.0, 247.0, 1000.0, 200.0, 171.0, 221.0, 613.0]
2025-09-12 20:07:48,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 27/100 (estimated time remaining: 17 hours, 14 minutes, 6 seconds)
2025-09-12 20:19:24,204 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:19:24,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 20:22:44,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 615.57458 ± 275.177
2025-09-12 20:22:44,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [30.99673, 877.7674, 719.85394, 236.27579, 572.6131, 963.3479, 747.79285, 499.86978, 742.78107, 764.4474]
2025-09-12 20:22:44,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [35.0, 1000.0, 1000.0, 155.0, 1000.0, 1000.0, 1000.0, 401.0, 627.0, 628.0]
2025-09-12 20:22:44,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (615.57) for latency ExtremeSparseL4U32
2025-09-12 20:22:44,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 28/100 (estimated time remaining: 17 hours, 11 minutes, 13 seconds)
2025-09-12 20:33:01,142 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:33:01,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 20:36:12,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 558.65686 ± 324.491
2025-09-12 20:36:12,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [600.3587, 889.4938, 237.73154, 861.9603, 46.65687, 215.06236, 505.65112, 1160.6714, 543.621, 525.3612]
2025-09-12 20:36:12,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 730.0, 167.0, 1000.0, 55.0, 137.0, 421.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:36:12,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 29/100 (estimated time remaining: 16 hours, 46 minutes, 51 seconds)
2025-09-12 20:46:36,542 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:46:36,545 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 20:50:22,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 642.53625 ± 305.656
2025-09-12 20:50:22,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [70.84819, 1086.1313, 602.8701, 394.9438, 621.29474, 951.72723, 538.0868, 1069.3304, 685.89325, 404.23724]
2025-09-12 20:50:22,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [77.0, 1000.0, 1000.0, 362.0, 1000.0, 836.0, 1000.0, 1000.0, 1000.0, 395.0]
2025-09-12 20:50:22,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (642.54) for latency ExtremeSparseL4U32
2025-09-12 20:50:22,638 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 30/100 (estimated time remaining: 16 hours, 24 minutes, 46 seconds)
2025-09-12 21:01:05,894 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:01:05,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 21:04:29,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 665.35504 ± 381.521
2025-09-12 21:04:29,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [746.57947, 370.4815, 861.28, 1200.2478, 712.86115, 1213.5338, 561.5459, 113.097565, 833.6265, 40.29621]
2025-09-12 21:04:29,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 269.0, 604.0, 1000.0, 1000.0, 1000.0, 1000.0, 103.0, 1000.0, 29.0]
2025-09-12 21:04:29,425 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (665.36) for latency ExtremeSparseL4U32
2025-09-12 21:04:29,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 31/100 (estimated time remaining: 16 hours, 6 minutes, 55 seconds)
2025-09-12 21:16:01,672 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:16:01,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 21:19:19,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 625.53363 ± 323.888
2025-09-12 21:19:19,719 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [311.3404, 241.24464, 1143.4521, 605.3948, 557.6415, 396.71954, 460.58643, 624.52313, 1296.182, 618.25214]
2025-09-12 21:19:19,719 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [240.0, 173.0, 1000.0, 1000.0, 449.0, 314.0, 1000.0, 516.0, 1000.0, 1000.0]
2025-09-12 21:19:19,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 32/100 (estimated time remaining: 16 hours, 26 minutes, 52 seconds)
2025-09-12 21:29:41,809 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:29:41,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 21:32:40,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 637.62708 ± 345.102
2025-09-12 21:32:40,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [546.40796, 577.1132, 208.41594, 467.4726, 1221.0627, 527.83527, 738.2639, 443.20038, 325.00174, 1321.4972]
2025-09-12 21:32:40,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [421.0, 1000.0, 159.0, 346.0, 1000.0, 374.0, 540.0, 1000.0, 238.0, 1000.0]
2025-09-12 21:32:40,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 33/100 (estimated time remaining: 15 hours, 51 minutes, 4 seconds)
2025-09-12 21:43:02,908 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:43:02,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 21:45:17,066 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 470.26920 ± 372.259
2025-09-12 21:45:17,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [831.0986, 450.1454, 304.56967, 316.33658, 646.0621, 5.139817, 220.1865, 1364.3666, 392.43314, 172.35374]
2025-09-12 21:45:17,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [514.0, 1000.0, 216.0, 189.0, 1000.0, 18.0, 148.0, 1000.0, 320.0, 141.0]
2025-09-12 21:45:17,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 34/100 (estimated time remaining: 15 hours, 25 minutes, 38 seconds)
2025-09-12 21:56:55,947 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:56:55,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:00:16,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 665.74915 ± 332.671
2025-09-12 22:00:16,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [533.3861, 1175.3839, 702.79956, 390.49747, 372.14203, 1056.1151, 620.42773, 546.2744, 1127.6616, 132.80313]
2025-09-12 22:00:16,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [402.0, 1000.0, 555.0, 292.0, 1000.0, 1000.0, 479.0, 1000.0, 1000.0, 111.0]
2025-09-12 22:00:16,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (665.75) for latency ExtremeSparseL4U32
2025-09-12 22:00:16,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 35/100 (estimated time remaining: 15 hours, 22 minutes, 43 seconds)
2025-09-12 22:10:14,986 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:10:14,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:13:31,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 645.72729 ± 418.280
2025-09-12 22:13:31,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [639.3934, 360.53595, 481.1575, 228.06111, 503.08966, 1057.0565, 414.52072, 1377.8074, 113.7376, 1281.913]
2025-09-12 22:13:31,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [434.0, 245.0, 1000.0, 156.0, 1000.0, 801.0, 1000.0, 1000.0, 93.0, 974.0]
2025-09-12 22:13:31,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 36/100 (estimated time remaining: 14 hours, 57 minutes, 25 seconds)
2025-09-12 22:24:05,393 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:24:05,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:27:59,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 904.82336 ± 466.269
2025-09-12 22:27:59,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [1342.1016, 1236.5426, 1558.2166, 1193.5696, 815.03754, 96.393425, 824.53265, 429.28796, 309.76373, 1242.7883]
2025-09-12 22:27:59,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [954.0, 1000.0, 1000.0, 1000.0, 642.0, 67.0, 1000.0, 1000.0, 267.0, 1000.0]
2025-09-12 22:27:59,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (904.82) for latency ExtremeSparseL4U32
2025-09-12 22:27:59,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 37/100 (estimated time remaining: 14 hours, 38 minutes, 47 seconds)
2025-09-12 22:38:49,857 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:38:49,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:41:38,283 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 526.17700 ± 440.465
2025-09-12 22:41:38,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [1375.4672, 415.52478, 88.90673, 13.46415, 826.71576, 580.20844, 73.53096, 1141.3528, 252.5439, 494.05515]
2025-09-12 22:41:38,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 53.0, 17.0, 518.0, 1000.0, 76.0, 789.0, 237.0, 1000.0]
2025-09-12 22:41:38,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 38/100 (estimated time remaining: 14 hours, 29 minutes, 1 second)
2025-09-12 22:52:33,542 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:52:33,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:54:20,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 501.12094 ± 347.206
2025-09-12 22:54:20,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [1094.9241, 202.3606, 1042.453, 605.3047, 117.5372, 75.33777, 163.60277, 539.2809, 615.3927, 555.01556]
2025-09-12 22:54:20,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [740.0, 207.0, 714.0, 345.0, 111.0, 65.0, 122.0, 359.0, 578.0, 394.0]
2025-09-12 22:54:20,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 39/100 (estimated time remaining: 14 hours, 16 minutes, 17 seconds)
2025-09-12 23:05:51,224 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:05:51,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:08:40,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 663.57812 ± 476.052
2025-09-12 23:08:40,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [129.59805, 1316.5936, 680.0614, 560.2648, 336.9725, 349.22598, 1394.956, 1309.267, 65.78754, 493.05414]
2025-09-12 23:08:40,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [104.0, 1000.0, 419.0, 483.0, 1000.0, 259.0, 1000.0, 1000.0, 74.0, 376.0]
2025-09-12 23:08:40,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 40/100 (estimated time remaining: 13 hours, 54 minutes, 24 seconds)
2025-09-12 23:19:21,845 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:19:21,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:22:02,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 566.10480 ± 314.324
2025-09-12 23:22:02,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [526.6321, 643.37933, 568.03394, 516.431, 1168.8618, 461.6801, 301.8068, 1038.5002, 33.178413, 402.54416]
2025-09-12 23:22:02,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [355.0, 1000.0, 471.0, 354.0, 1000.0, 340.0, 189.0, 706.0, 23.0, 1000.0]
2025-09-12 23:22:02,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 41/100 (estimated time remaining: 13 hours, 42 minutes, 14 seconds)
2025-09-12 23:32:57,375 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:32:57,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:35:53,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 547.94055 ± 343.372
2025-09-12 23:35:53,488 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [356.44528, 1043.0234, 906.84705, 1067.7955, 433.44092, 260.74884, 148.19415, 633.29755, 566.6301, 62.982353]
2025-09-12 23:35:53,488 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 342.0, 169.0, 125.0, 1000.0, 334.0, 65.0]
2025-09-12 23:35:53,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 42/100 (estimated time remaining: 13 hours, 21 minutes, 18 seconds)
2025-09-12 23:45:42,707 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:45:42,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:48:56,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 795.55017 ± 474.644
2025-09-12 23:48:56,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [1377.7189, 654.5611, 1152.2809, 299.15237, 232.11032, 1627.7048, 550.57007, 701.17303, 1135.575, 224.65466]
2025-09-12 23:48:56,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 468.0, 747.0, 188.0, 201.0, 962.0, 1000.0, 1000.0, 1000.0, 168.0]
2025-09-12 23:48:56,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 43/100 (estimated time remaining: 13 hours, 46 seconds)
2025-09-13 00:00:41,914 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:00:41,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:03:42,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 751.95117 ± 409.867
2025-09-13 00:03:42,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [157.87946, 1033.4263, 822.76434, 204.2294, 358.68594, 1496.1996, 1209.9731, 866.0498, 650.0994, 720.204]
2025-09-13 00:03:42,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [94.0, 764.0, 535.0, 128.0, 227.0, 1000.0, 1000.0, 1000.0, 505.0, 1000.0]
2025-09-13 00:03:42,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 44/100 (estimated time remaining: 13 hours, 10 minutes, 49 seconds)
2025-09-13 00:13:34,469 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:13:34,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:15:43,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 611.88263 ± 449.694
2025-09-13 00:15:43,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [365.68146, 1028.0933, 100.2276, 312.0492, 528.50354, 65.00146, 1269.9274, 701.05164, 348.72037, 1399.5706]
2025-09-13 00:15:43,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [252.0, 701.0, 89.0, 223.0, 342.0, 65.0, 900.0, 515.0, 273.0, 1000.0]
2025-09-13 00:15:43,436 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 45/100 (estimated time remaining: 12 hours, 30 minutes, 56 seconds)
2025-09-13 00:26:36,950 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:26:36,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:28:19,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 331.20883 ± 209.853
2025-09-13 00:28:19,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [165.17122, 238.42072, 215.01544, 607.3382, 19.438898, 163.62169, 609.4069, 273.4148, 361.11835, 659.14197]
2025-09-13 00:28:19,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [95.0, 167.0, 167.0, 1000.0, 40.0, 145.0, 1000.0, 231.0, 253.0, 465.0]
2025-09-13 00:28:19,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 46/100 (estimated time remaining: 12 hours, 9 minutes, 7 seconds)
2025-09-13 00:38:56,347 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:38:56,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:42:27,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 762.23499 ± 426.319
2025-09-13 00:42:27,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [397.61105, 1005.1596, 564.99963, 652.06805, 436.67456, 172.58621, 1535.0167, 1428.7665, 521.30524, 908.1625]
2025-09-13 00:42:27,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 739.0, 368.0, 475.0, 1000.0, 132.0, 1000.0, 900.0, 1000.0, 640.0]
2025-09-13 00:42:27,360 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 47/100 (estimated time remaining: 11 hours, 58 minutes, 53 seconds)
2025-09-13 00:53:18,374 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:53:18,383 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:56:54,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1114.27637 ± 550.717
2025-09-13 00:56:54,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [435.77045, 1733.983, 1656.7794, 482.69632, 1484.8782, 1698.5739, 1378.7922, 525.74335, 381.0668, 1364.4806]
2025-09-13 00:56:54,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [331.0, 1000.0, 1000.0, 356.0, 1000.0, 1000.0, 1000.0, 374.0, 300.0, 1000.0]
2025-09-13 00:56:54,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (1114.28) for latency ExtremeSparseL4U32
2025-09-13 00:56:54,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 48/100 (estimated time remaining: 12 hours, 26 seconds)
2025-09-13 01:08:11,765 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:08:11,774 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:12:04,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1052.59009 ± 544.620
2025-09-13 01:12:04,715 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [1579.4496, 1649.7921, 1267.7366, 189.26764, 878.33875, 1018.0987, 250.48848, 1445.9935, 1715.21, 531.5245]
2025-09-13 01:12:04,715 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 160.0, 559.0, 1000.0, 161.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:12:04,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 49/100 (estimated time remaining: 11 hours, 51 minutes, 1 second)
2025-09-13 01:22:58,665 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:22:58,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:27:06,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1078.19800 ± 408.248
2025-09-13 01:27:06,004 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [1621.0156, 1344.467, 1140.6685, 337.15347, 491.73376, 718.3279, 1516.9998, 1350.7006, 1164.7412, 1096.1732]
2025-09-13 01:27:06,004 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 264.0, 1000.0, 452.0, 1000.0, 1000.0, 760.0, 1000.0]
2025-09-13 01:27:06,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 50/100 (estimated time remaining: 12 hours, 8 minutes, 2 seconds)
2025-09-13 01:37:39,073 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:37:39,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:40:19,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 712.81433 ± 545.401
2025-09-13 01:40:19,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [969.8303, 507.1977, 1461.0244, 1655.5013, 325.90558, 1248.1915, 474.91522, 71.4988, 238.51517, 175.56296]
2025-09-13 01:40:19,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [677.0, 314.0, 1000.0, 1000.0, 1000.0, 739.0, 320.0, 94.0, 173.0, 169.0]
2025-09-13 01:40:19,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 51/100 (estimated time remaining: 11 hours, 59 minutes, 58 seconds)
2025-09-13 01:50:28,195 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:50:28,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:53:35,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 866.57489 ± 561.178
2025-09-13 01:53:35,268 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [854.5295, 349.056, 344.59094, 159.45525, 1590.1744, 604.9685, 675.84216, 1810.2976, 668.3467, 1608.488]
2025-09-13 01:53:35,268 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [551.0, 242.0, 210.0, 144.0, 1000.0, 330.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:53:35,279 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 52/100 (estimated time remaining: 11 hours, 37 minutes, 5 seconds)
2025-09-13 02:05:18,015 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:05:18,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:07:42,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 755.23010 ± 547.807
2025-09-13 02:07:42,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [1033.6273, 644.8755, 213.5198, 237.54088, 110.33693, 709.265, 1608.8088, 198.19896, 1215.8055, 1580.3221]
2025-09-13 02:07:42,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [563.0, 491.0, 116.0, 160.0, 108.0, 522.0, 1000.0, 116.0, 1000.0, 960.0]
2025-09-13 02:07:42,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 53/100 (estimated time remaining: 11 hours, 19 minutes, 42 seconds)
2025-09-13 02:18:11,554 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:18:11,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:21:18,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1008.47693 ± 691.649
2025-09-13 02:21:18,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [1663.9331, 157.09132, 1467.1367, 70.68467, 127.68642, 1372.3634, 1430.724, 1747.6168, 1696.8894, 350.6431]
2025-09-13 02:21:18,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 77.0, 1000.0, 46.0, 65.0, 1000.0, 1000.0, 1000.0, 1000.0, 263.0]
2025-09-13 02:21:18,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 54/100 (estimated time remaining: 10 hours, 50 minutes, 42 seconds)
2025-09-13 02:32:09,713 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:32:09,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:36:08,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1330.54504 ± 614.232
2025-09-13 02:36:08,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [1232.0769, 1734.5406, 1854.7327, 1768.4932, 1765.2247, 267.40088, 1000.2358, 1666.6785, 176.17203, 1839.8943]
2025-09-13 02:36:08,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 998.0, 168.0, 1000.0, 1000.0, 98.0, 1000.0]
2025-09-13 02:36:08,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (1330.55) for latency ExtremeSparseL4U32
2025-09-13 02:36:08,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 55/100 (estimated time remaining: 10 hours, 35 minutes, 9 seconds)
2025-09-13 02:46:12,710 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:46:12,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:49:56,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 985.31262 ± 480.499
2025-09-13 02:49:56,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [467.7652, 907.6697, 276.7269, 1492.0586, 1500.1642, 1693.1311, 666.7714, 1067.6354, 452.55884, 1328.6447]
2025-09-13 02:49:56,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [246.0, 1000.0, 176.0, 1000.0, 1000.0, 1000.0, 443.0, 1000.0, 1000.0, 817.0]
2025-09-13 02:49:56,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 56/100 (estimated time remaining: 10 hours, 26 minutes, 28 seconds)
2025-09-13 03:00:46,003 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:00:46,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:03:33,746 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 725.59192 ± 534.948
2025-09-13 03:03:33,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [1619.8961, 603.41266, 591.87506, 1430.0343, 139.6893, 290.93024, 1475.0173, 479.94702, 449.4557, 175.66154]
2025-09-13 03:03:33,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 361.0, 360.0, 776.0, 78.0, 165.0, 883.0, 1000.0, 1000.0, 85.0]
2025-09-13 03:03:33,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 57/100 (estimated time remaining: 10 hours, 15 minutes, 46 seconds)
2025-09-13 03:14:44,214 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:14:44,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:17:30,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 547.72772 ± 480.620
2025-09-13 03:17:30,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [87.40207, 1673.7736, 122.4037, 575.67865, 990.68774, 195.85957, 619.82336, 772.5986, 50.4524, 388.59717]
2025-09-13 03:17:30,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [53.0, 1000.0, 73.0, 316.0, 1000.0, 125.0, 1000.0, 1000.0, 68.0, 1000.0]
2025-09-13 03:17:30,336 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 58/100 (estimated time remaining: 10 hours, 11 seconds)
2025-09-13 03:27:58,789 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:27:58,792 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:31:12,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1015.26013 ± 683.143
2025-09-13 03:31:12,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [1998.5558, 1467.6526, 30.151674, 831.62994, 1742.1893, 423.154, 499.48758, 343.3676, 866.973, 1949.4402]
2025-09-13 03:31:12,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 31.0, 487.0, 1000.0, 1000.0, 302.0, 210.0, 572.0, 1000.0]
2025-09-13 03:31:12,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 59/100 (estimated time remaining: 9 hours, 47 minutes, 9 seconds)
2025-09-13 03:42:17,903 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:42:17,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:44:32,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 757.27258 ± 562.720
2025-09-13 03:44:32,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [467.46497, 1677.955, 219.13802, 666.40906, 140.93349, 912.82214, 500.1535, 979.1141, 199.52505, 1809.2102]
2025-09-13 03:44:32,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [247.0, 1000.0, 153.0, 404.0, 98.0, 557.0, 394.0, 609.0, 115.0, 1000.0]
2025-09-13 03:44:32,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 60/100 (estimated time remaining: 9 hours, 20 minutes, 57 seconds)
2025-09-13 03:55:01,644 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:55:01,647 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:57:55,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 912.93127 ± 713.185
2025-09-13 03:57:55,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [1667.6456, 1024.0804, 449.06274, 15.834963, 1697.8678, 283.6095, 170.06673, 1805.8313, 1751.0242, 264.2888]
2025-09-13 03:57:55,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 583.0, 1000.0, 18.0, 840.0, 205.0, 105.0, 1000.0, 1000.0, 133.0]
2025-09-13 03:57:55,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 61/100 (estimated time remaining: 9 hours, 3 minutes, 54 seconds)
2025-09-13 04:08:48,154 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:08:48,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:10:51,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 765.17767 ± 619.757
2025-09-13 04:10:51,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [256.8455, 763.88385, 1908.299, 85.73333, 74.896225, 869.8112, 670.79376, 1751.2285, 1029.9535, 240.33226]
2025-09-13 04:10:51,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [133.0, 466.0, 1000.0, 50.0, 79.0, 512.0, 345.0, 1000.0, 487.0, 140.0]
2025-09-13 04:10:51,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 62/100 (estimated time remaining: 8 hours, 44 minutes, 52 seconds)
2025-09-13 04:22:09,270 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:22:09,274 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:24:19,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 854.94775 ± 685.692
2025-09-13 04:24:19,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [42.950394, 610.34094, 1986.5515, 1992.7831, 469.92392, 719.112, 934.7692, 326.28018, 53.215588, 1413.5502]
2025-09-13 04:24:19,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [33.0, 376.0, 1000.0, 1000.0, 281.0, 375.0, 509.0, 140.0, 47.0, 703.0]
2025-09-13 04:24:19,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 63/100 (estimated time remaining: 8 hours, 27 minutes, 46 seconds)
2025-09-13 04:34:20,179 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:34:20,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:38:20,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1179.26538 ± 547.725
2025-09-13 04:38:20,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [1759.4608, 1706.5404, 283.5612, 1213.9933, 1017.6987, 679.7264, 616.85284, 2107.891, 1454.8809, 952.04865]
2025-09-13 04:38:20,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 792.0, 1000.0, 600.0, 1000.0, 1000.0, 289.0, 1000.0, 1000.0, 487.0]
2025-09-13 04:38:20,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 64/100 (estimated time remaining: 8 hours, 16 minutes, 47 seconds)
2025-09-13 04:50:00,475 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:50:00,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:52:00,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 693.29431 ± 493.036
2025-09-13 04:52:00,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [1036.4019, 984.84357, 138.50626, 297.2457, 167.6256, 1486.5342, 376.35684, 1401.5311, 207.97821, 835.92004]
2025-09-13 04:52:00,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 483.0, 73.0, 130.0, 111.0, 817.0, 221.0, 689.0, 142.0, 477.0]
2025-09-13 04:52:00,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 65/100 (estimated time remaining: 8 hours, 5 minutes, 44 seconds)
2025-09-13 05:02:31,167 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:02:31,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:04:14,677 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 723.17816 ± 694.905
2025-09-13 05:04:14,678 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [451.24545, 454.16248, 1905.1842, 122.60092, 184.70766, 665.6586, 2155.8232, 120.55466, 884.4356, 287.40872]
2025-09-13 05:04:14,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [258.0, 227.0, 946.0, 70.0, 111.0, 322.0, 1000.0, 78.0, 395.0, 145.0]
2025-09-13 05:04:14,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 66/100 (estimated time remaining: 7 hours, 44 minutes, 15 seconds)
2025-09-13 05:14:23,848 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:14:23,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:17:16,722 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1028.35632 ± 685.321
2025-09-13 05:17:16,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [536.5534, 382.70847, 1979.0773, 1547.0032, 1693.0579, 324.29532, 1391.6858, 55.40119, 1818.7935, 554.9877]
2025-09-13 05:17:16,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [339.0, 294.0, 1000.0, 929.0, 1000.0, 180.0, 850.0, 71.0, 970.0, 323.0]
2025-09-13 05:17:16,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 67/100 (estimated time remaining: 7 hours, 31 minutes, 40 seconds)
2025-09-13 05:28:16,401 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:28:16,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:31:55,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1379.42615 ± 473.661
2025-09-13 05:31:55,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [1932.5944, 641.2365, 2103.2568, 983.1986, 1297.104, 1675.4762, 916.4005, 1895.3293, 1337.6372, 1012.02893]
2025-09-13 05:31:55,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 366.0, 1000.0, 494.0, 1000.0, 1000.0, 455.0, 1000.0, 784.0, 462.0]
2025-09-13 05:31:55,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (1379.43) for latency ExtremeSparseL4U32
2025-09-13 05:31:55,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 68/100 (estimated time remaining: 7 hours, 26 minutes, 14 seconds)
2025-09-13 05:42:26,308 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:42:26,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:44:48,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 958.52423 ± 808.703
2025-09-13 05:44:48,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [43.263786, 2132.0593, 36.556705, 498.8255, 1961.8208, 1589.822, 554.7278, 665.28986, 159.15654, 1943.7205]
2025-09-13 05:44:48,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [51.0, 1000.0, 30.0, 282.0, 1000.0, 914.0, 271.0, 307.0, 73.0, 1000.0]
2025-09-13 05:44:48,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 69/100 (estimated time remaining: 7 hours, 5 minutes, 25 seconds)
2025-09-13 05:55:32,947 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:55:32,956 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:59:00,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1304.85107 ± 681.557
2025-09-13 05:59:00,587 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [1770.2279, 875.3117, 405.9054, 2124.0679, 32.761646, 2111.0083, 1669.9263, 1716.9891, 890.16455, 1452.1477]
2025-09-13 05:59:00,587 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 430.0, 189.0, 1000.0, 26.0, 1000.0, 1000.0, 1000.0, 423.0, 1000.0]
2025-09-13 05:59:00,611 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 70/100 (estimated time remaining: 6 hours, 55 minutes, 23 seconds)
2025-09-13 06:10:01,634 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:10:01,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:12:50,722 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 834.75928 ± 413.228
2025-09-13 06:12:50,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [789.07306, 821.75, 1594.6671, 657.8106, 145.54996, 1440.4404, 885.6056, 751.81433, 346.07312, 914.8086]
2025-09-13 06:12:50,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [432.0, 421.0, 830.0, 390.0, 64.0, 1000.0, 488.0, 591.0, 1000.0, 480.0]
2025-09-13 06:12:50,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 71/100 (estimated time remaining: 6 hours, 51 minutes, 36 seconds)
2025-09-13 06:24:07,418 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:24:07,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:25:47,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 637.89148 ± 552.738
2025-09-13 06:25:47,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [1052.4592, 92.395454, 687.6934, 646.8887, 22.433601, 224.2621, 1161.6268, 1822.9208, 42.419544, 625.8152]
2025-09-13 06:25:47,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [456.0, 70.0, 419.0, 395.0, 17.0, 149.0, 559.0, 902.0, 43.0, 452.0]
2025-09-13 06:25:47,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 72/100 (estimated time remaining: 6 hours, 37 minutes, 24 seconds)
2025-09-13 06:36:14,860 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:36:14,877 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:38:22,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 722.55457 ± 494.107
2025-09-13 06:38:22,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [1878.2395, 639.036, 393.20963, 1066.3235, 895.67145, 321.03336, 1017.4818, 425.50845, 558.4916, 30.550413]
2025-09-13 06:38:22,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 310.0, 192.0, 483.0, 449.0, 237.0, 451.0, 175.0, 1000.0, 32.0]
2025-09-13 06:38:22,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 73/100 (estimated time remaining: 6 hours, 12 minutes, 3 seconds)
2025-09-13 06:48:59,071 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:48:59,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:52:37,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1287.53320 ± 650.651
2025-09-13 06:52:37,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [584.0837, 480.09113, 2033.0969, 939.8991, 2111.865, 1271.8845, 436.50256, 1063.11, 2020.6107, 1934.1884]
2025-09-13 06:52:37,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [353.0, 241.0, 1000.0, 1000.0, 1000.0, 725.0, 205.0, 1000.0, 1000.0, 1000.0]
2025-09-13 06:52:37,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 74/100 (estimated time remaining: 6 hours, 6 minutes, 15 seconds)
2025-09-13 07:03:41,081 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:03:41,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:07:09,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1240.48962 ± 543.250
2025-09-13 07:07:09,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [363.91498, 1666.0615, 1688.755, 1042.1952, 1864.6617, 307.80576, 963.3099, 1124.8992, 1740.0391, 1643.2534]
2025-09-13 07:07:09,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [209.0, 1000.0, 1000.0, 618.0, 1000.0, 111.0, 475.0, 1000.0, 819.0, 1000.0]
2025-09-13 07:07:09,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 75/100 (estimated time remaining: 5 hours, 54 minutes, 22 seconds)
2025-09-13 07:17:19,230 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:17:19,239 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:20:13,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1012.31866 ± 755.425
2025-09-13 07:20:13,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [507.8804, 1999.7719, 286.00604, 261.5245, 1864.7644, 1080.7445, 353.39957, 99.46249, 1704.2528, 1965.3801]
2025-09-13 07:20:13,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [349.0, 1000.0, 193.0, 183.0, 1000.0, 1000.0, 237.0, 75.0, 1000.0, 1000.0]
2025-09-13 07:20:13,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 76/100 (estimated time remaining: 5 hours, 36 minutes, 55 seconds)
2025-09-13 07:31:35,975 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:31:35,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:33:10,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 632.82806 ± 541.155
2025-09-13 07:33:10,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [66.71873, 87.967224, 925.0391, 1154.1478, 300.81308, 710.62823, 1902.5708, 580.78253, 321.34814, 278.2653]
2025-09-13 07:33:10,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [45.0, 56.0, 448.0, 544.0, 167.0, 413.0, 1000.0, 224.0, 172.0, 177.0]
2025-09-13 07:33:10,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 77/100 (estimated time remaining: 5 hours, 23 minutes, 22 seconds)
2025-09-13 07:44:03,937 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:44:03,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:45:49,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 586.76251 ± 582.532
2025-09-13 07:45:49,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [2047.4324, 538.6828, 505.16342, 884.5874, 245.67758, 94.20448, 1067.4849, 285.30887, 156.65907, 42.423866]
2025-09-13 07:45:49,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 237.0, 1000.0, 421.0, 183.0, 48.0, 547.0, 120.0, 76.0, 34.0]
2025-09-13 07:45:49,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 78/100 (estimated time remaining: 5 hours, 10 minutes, 19 seconds)
2025-09-13 07:55:55,748 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:55:55,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:58:41,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 937.87268 ± 702.561
2025-09-13 07:58:41,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [801.5031, 228.3436, 848.04333, 1906.8805, 856.83997, 1683.0808, 574.0632, 34.48726, 2169.8918, 275.5932]
2025-09-13 07:58:41,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [414.0, 131.0, 1000.0, 1000.0, 1000.0, 792.0, 248.0, 25.0, 1000.0, 148.0]
2025-09-13 07:58:41,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 79/100 (estimated time remaining: 4 hours, 50 minutes, 40 seconds)
2025-09-13 08:09:41,409 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:09:41,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:13:05,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1392.92114 ± 696.146
2025-09-13 08:13:05,490 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [91.146484, 2183.4265, 165.88528, 1616.5002, 2026.1455, 1533.2816, 1307.0347, 1265.1129, 2058.5193, 1682.1578]
2025-09-13 08:13:05,490 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [75.0, 1000.0, 80.0, 678.0, 1000.0, 889.0, 566.0, 742.0, 1000.0, 1000.0]
2025-09-13 08:13:05,490 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (1392.92) for latency ExtremeSparseL4U32
2025-09-13 08:13:05,502 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 80/100 (estimated time remaining: 4 hours, 36 minutes, 54 seconds)
2025-09-13 08:23:58,462 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:23:58,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:26:47,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1230.40601 ± 778.506
2025-09-13 08:26:47,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [2198.2507, 1719.6218, 428.90802, 2095.6353, 146.88312, 590.7314, 1911.3066, 456.74313, 2012.2992, 743.6815]
2025-09-13 08:26:47,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 782.0, 199.0, 834.0, 68.0, 248.0, 1000.0, 188.0, 1000.0, 366.0]
2025-09-13 08:26:47,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 81/100 (estimated time remaining: 4 hours, 26 minutes, 13 seconds)
2025-09-13 08:37:35,997 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:37:35,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:40:01,282 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 760.40094 ± 719.455
2025-09-13 08:40:01,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [1277.2646, 38.00383, 300.97842, 138.56636, 433.53986, 221.05734, 415.9185, 1867.0028, 720.5952, 2191.082]
2025-09-13 08:40:01,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [657.0, 27.0, 189.0, 75.0, 222.0, 103.0, 1000.0, 773.0, 1000.0, 1000.0]
2025-09-13 08:40:01,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 82/100 (estimated time remaining: 4 hours, 14 minutes, 2 seconds)
2025-09-13 08:51:12,556 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:51:12,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:53:27,774 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 880.89667 ± 627.555
2025-09-13 08:53:27,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [178.88512, 2098.5364, 60.931293, 652.1999, 801.327, 582.05536, 1370.7452, 1149.2443, 1597.363, 317.67953]
2025-09-13 08:53:27,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [120.0, 907.0, 41.0, 255.0, 1000.0, 279.0, 699.0, 495.0, 714.0, 146.0]
2025-09-13 08:53:27,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 3 minutes, 28 seconds)
2025-09-13 09:04:08,622 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:04:08,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:06:01,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 640.60840 ± 516.927
2025-09-13 09:06:01,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [1793.9214, 1005.0034, 79.00536, 1140.7704, 663.6205, 194.84991, 220.91974, 742.2909, 251.48892, 314.21362]
2025-09-13 09:06:01,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 430.0, 45.0, 496.0, 288.0, 94.0, 110.0, 330.0, 115.0, 1000.0]
2025-09-13 09:06:01,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 84/100 (estimated time remaining: 3 hours, 48 minutes, 55 seconds)
2025-09-13 09:16:13,134 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:16:13,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:19:18,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1406.31433 ± 712.093
2025-09-13 09:19:18,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [1914.4684, 1306.6666, 2016.3806, 386.6151, 2203.7483, 504.15225, 2170.7742, 797.0076, 680.2461, 2083.0842]
2025-09-13 09:19:18,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [867.0, 572.0, 827.0, 136.0, 907.0, 328.0, 1000.0, 385.0, 339.0, 1000.0]
2025-09-13 09:19:18,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (1406.31) for latency ExtremeSparseL4U32
2025-09-13 09:19:18,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 31 minutes, 53 seconds)
2025-09-13 09:30:04,404 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:30:04,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:30:51,339 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 335.71283 ± 301.586
2025-09-13 09:30:51,339 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [517.46265, 333.20422, 813.5407, 85.27989, 138.06262, 45.580948, 301.11124, 925.16595, 72.68847, 125.03171]
2025-09-13 09:30:51,339 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [258.0, 139.0, 386.0, 41.0, 49.0, 36.0, 168.0, 425.0, 37.0, 61.0]
2025-09-13 09:30:51,360 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 12 minutes, 12 seconds)
2025-09-13 09:42:25,223 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:42:25,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:44:43,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 983.29462 ± 877.941
2025-09-13 09:44:43,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [229.45145, 174.88309, 561.1999, 1897.8923, 30.462997, 1165.7776, 691.71423, 2533.2275, 2242.0593, 306.27747]
2025-09-13 09:44:43,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [109.0, 95.0, 228.0, 858.0, 25.0, 1000.0, 285.0, 1000.0, 1000.0, 159.0]
2025-09-13 09:44:43,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 1 minute, 9 seconds)
2025-09-13 09:54:48,188 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:54:48,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:57:31,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1176.40112 ± 852.116
2025-09-13 09:57:31,808 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [633.7569, 114.87893, 1357.169, 2079.7542, 813.0714, 2136.4905, 2204.9763, 188.38394, 2091.4705, 144.05835]
2025-09-13 09:57:31,808 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [356.0, 63.0, 573.0, 1000.0, 365.0, 1000.0, 1000.0, 96.0, 1000.0, 70.0]
2025-09-13 09:57:31,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 88/100 (estimated time remaining: 2 hours, 46 minutes, 34 seconds)
2025-09-13 10:08:48,098 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:08:48,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:12:24,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1268.20532 ± 774.713
2025-09-13 10:12:24,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [2006.6599, 1794.4563, 534.86584, 1106.2517, 22.815416, 205.78548, 2258.5916, 1112.5342, 2221.072, 1419.0205]
2025-09-13 10:12:24,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 249.0, 1000.0, 18.0, 113.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 10:12:24,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 39 minutes, 19 seconds)
2025-09-13 10:22:53,765 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:22:53,769 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:25:58,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 890.45154 ± 760.972
2025-09-13 10:25:58,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [300.84796, 820.0765, 280.492, 2028.3715, 868.23535, 633.2901, 1101.395, 103.98644, 253.72641, 2514.0947]
2025-09-13 10:25:58,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 389.0, 1000.0, 1000.0, 401.0, 1000.0, 387.0, 56.0, 116.0, 1000.0]
2025-09-13 10:25:58,666 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 26 minutes, 40 seconds)
2025-09-13 10:36:43,909 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:36:43,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:38:45,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 683.97186 ± 625.395
2025-09-13 10:38:45,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [144.02255, 115.642944, 782.3878, 1809.2324, 1846.3893, 861.87665, 80.96876, 451.4233, 255.79434, 491.981]
2025-09-13 10:38:45,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [113.0, 69.0, 297.0, 1000.0, 1000.0, 1000.0, 55.0, 224.0, 129.0, 230.0]
2025-09-13 10:38:45,247 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 15 minutes, 47 seconds)
2025-09-13 10:49:37,107 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:49:37,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:51:17,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 734.17480 ± 614.698
2025-09-13 10:51:17,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [852.8677, 883.10645, 2293.216, 397.73456, 108.3829, 492.19247, 169.42125, 249.28622, 720.8446, 1174.696]
2025-09-13 10:51:17,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [398.0, 394.0, 1000.0, 188.0, 82.0, 220.0, 111.0, 168.0, 372.0, 473.0]
2025-09-13 10:51:17,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 92/100 (estimated time remaining: 1 hour, 59 minutes, 50 seconds)
2025-09-13 11:02:22,387 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:02:22,391 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:05:36,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1344.81201 ± 705.114
2025-09-13 11:05:36,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [1233.9828, 1542.7189, 2336.1611, 787.226, 1801.8878, 362.55402, 2171.4773, 2034.9617, 315.1639, 861.9862]
2025-09-13 11:05:36,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 358.0, 764.0, 176.0, 1000.0, 872.0, 141.0, 410.0]
2025-09-13 11:05:36,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 48 minutes, 55 seconds)
2025-09-13 11:16:12,213 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:16:12,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:18:32,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1027.57251 ± 678.583
2025-09-13 11:18:32,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [356.55154, 2205.0415, 362.3767, 1502.4368, 1060.33, 2020.8171, 526.21814, 1351.7303, 248.2547, 641.9675]
2025-09-13 11:18:32,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [183.0, 834.0, 134.0, 1000.0, 400.0, 1000.0, 262.0, 588.0, 119.0, 288.0]
2025-09-13 11:18:32,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 32 minutes, 34 seconds)
2025-09-13 11:29:25,813 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:29:25,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:31:08,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 775.85846 ± 660.643
2025-09-13 11:31:08,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [366.12442, 229.50969, 644.126, 2295.0354, 637.00354, 638.4558, 932.13275, 54.134586, 1653.7849, 308.27762]
2025-09-13 11:31:08,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [134.0, 110.0, 310.0, 1000.0, 330.0, 263.0, 444.0, 30.0, 795.0, 158.0]
2025-09-13 11:31:08,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 18 minutes, 12 seconds)
2025-09-13 11:41:41,304 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:41:41,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:44:42,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1376.39282 ± 862.703
2025-09-13 11:44:42,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [239.30627, 1492.2579, 364.73105, 2442.1833, 925.543, 1696.0005, 62.626602, 2243.5508, 2151.3176, 2146.4124]
2025-09-13 11:44:42,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [113.0, 667.0, 177.0, 1000.0, 366.0, 798.0, 53.0, 1000.0, 1000.0, 1000.0]
2025-09-13 11:44:42,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 5 minutes, 57 seconds)
2025-09-13 11:55:53,970 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:55:53,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:59:12,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1121.67334 ± 740.773
2025-09-13 11:59:12,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [713.084, 1697.8539, 159.40335, 901.7086, 1928.287, 384.55743, 2141.0994, 991.95447, 2088.1716, 210.61313]
2025-09-13 11:59:12,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 99.0, 1000.0, 1000.0, 224.0, 1000.0, 433.0, 1000.0, 115.0]
2025-09-13 11:59:12,384 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 97/100 (estimated time remaining: 54 minutes, 19 seconds)
2025-09-13 12:09:51,128 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:09:51,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:12:28,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 948.32635 ± 527.754
2025-09-13 12:12:28,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [297.94174, 606.7513, 1162.8619, 670.91907, 1419.1349, 1027.2891, 2185.0735, 970.8394, 771.5136, 370.93903]
2025-09-13 12:12:28,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [107.0, 269.0, 496.0, 353.0, 1000.0, 492.0, 1000.0, 467.0, 1000.0, 153.0]
2025-09-13 12:12:28,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 98/100 (estimated time remaining: 40 minutes, 6 seconds)
2025-09-13 12:22:29,683 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:22:29,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:24:14,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 600.93304 ± 873.074
2025-09-13 12:24:14,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [404.1658, 53.182186, 166.16353, 220.66644, 59.04351, 169.89035, 118.92296, 143.31676, 2361.881, 2312.098]
2025-09-13 12:24:14,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [188.0, 30.0, 81.0, 109.0, 37.0, 1000.0, 84.0, 113.0, 1000.0, 1000.0]
2025-09-13 12:24:14,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 99/100 (estimated time remaining: 26 minutes, 16 seconds)
2025-09-13 12:34:55,848 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:34:55,851 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:38:05,173 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1333.82739 ± 707.295
2025-09-13 12:38:05,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [1377.5083, 696.7397, 2361.361, 2368.8647, 800.4053, 193.63512, 2122.1313, 993.8527, 1453.1588, 970.61676]
2025-09-13 12:38:05,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [521.0, 289.0, 995.0, 1000.0, 1000.0, 81.0, 848.0, 549.0, 628.0, 476.0]
2025-09-13 12:38:05,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 100/100 (estimated time remaining: 13 minutes, 23 seconds)
2025-09-13 12:49:34,839 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:49:34,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:52:04,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 863.53436 ± 606.996
2025-09-13 12:52:04,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [285.29846, 516.34326, 626.40314, 986.3908, 316.70056, 938.56036, 2044.352, 1730.2958, 64.335396, 1126.664]
2025-09-13 12:52:04,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [117.0, 1000.0, 287.0, 1000.0, 106.0, 383.0, 891.0, 686.0, 45.0, 582.0]
2025-09-13 12:52:04,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1251 [DEBUG]: Training session finished
