2025-09-12 10:03:21,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc0-ant/ExtremeSparseL4U32-mbpac-highdim-memdelay
2025-09-12 10:03:21,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc0-ant/ExtremeSparseL4U32-mbpac-highdim-memdelay
2025-09-12 10:03:21,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x146f09ff1550>}
2025-09-12 10:03:21,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1111 [DEBUG]: using device: cuda
2025-09-12 10:03:21,981 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1133 [INFO]: Creating new trainer
2025-09-12 10:03:22,005 baseline-mbpac-noiseperc0-ant:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-09-12 10:03:22,005 baseline-mbpac-noiseperc0-ant:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-12 10:03:22,018 baseline-mbpac-noiseperc0-ant:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=27, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=8, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 512, batch_first=True)
)
2025-09-12 10:03:23,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1194 [DEBUG]: Starting training session...
2025-09-12 10:03:23,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 1/100
2025-09-12 10:15:58,850 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:15:58,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 10:17:34,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: -257.92700 ± 300.312
2025-09-12 10:17:34,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [-124.83229, -183.69096, -99.042465, -128.104, -79.52607, -85.748314, -899.13947, -19.189775, -802.71954, -157.2771]
2025-09-12 10:17:34,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [203.0, 205.0, 171.0, 128.0, 101.0, 87.0, 1000.0, 41.0, 1000.0, 183.0]
2025-09-12 10:17:34,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (-257.93) for latency ExtremeSparseL4U32
2025-09-12 10:17:34,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 2/100 (estimated time remaining: 23 hours, 24 minutes, 33 seconds)
2025-09-12 10:27:54,274 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:27:54,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 10:30:16,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 36.67141 ± 48.649
2025-09-12 10:30:16,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [113.8972, 86.78508, -1.8571582, 36.890354, 81.98835, 19.923313, 77.15694, -16.460787, -42.29518, 10.685983]
2025-09-12 10:30:16,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 136.0, 149.0, 1000.0, 15.0, 1000.0, 116.0, 270.0, 19.0]
2025-09-12 10:30:16,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (36.67) for latency ExtremeSparseL4U32
2025-09-12 10:30:16,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 3/100 (estimated time remaining: 21 hours, 57 minutes, 14 seconds)
2025-09-12 10:41:52,673 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:41:52,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 10:44:50,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 203.40163 ± 127.202
2025-09-12 10:44:50,245 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [59.862568, 94.97798, 304.66336, 255.71712, 433.2181, 319.64465, 158.58546, 78.95587, 285.76196, 42.629025]
2025-09-12 10:44:50,245 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [154.0, 191.0, 1000.0, 678.0, 1000.0, 1000.0, 506.0, 159.0, 1000.0, 149.0]
2025-09-12 10:44:50,245 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (203.40) for latency ExtremeSparseL4U32
2025-09-12 10:44:50,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 4/100 (estimated time remaining: 22 hours, 20 minutes, 16 seconds)
2025-09-12 10:54:50,557 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:54:50,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 10:57:41,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 284.76666 ± 201.248
2025-09-12 10:57:41,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [415.4687, 13.0696945, 242.97652, 540.0524, 219.3447, 180.21365, 426.78406, 79.95271, 645.09515, 84.70909]
2025-09-12 10:57:41,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 27.0, 530.0, 1000.0, 356.0, 494.0, 1000.0, 138.0, 1000.0, 98.0]
2025-09-12 10:57:41,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (284.77) for latency ExtremeSparseL4U32
2025-09-12 10:57:41,774 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 5/100 (estimated time remaining: 21 hours, 43 minutes, 26 seconds)
2025-09-12 11:08:53,231 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:08:53,239 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 11:12:21,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 382.44510 ± 214.808
2025-09-12 11:12:21,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [592.7901, 352.816, 161.0329, 611.5641, 549.6603, 23.35715, 499.06522, 431.07245, 555.3393, 47.753113]
2025-09-12 11:12:21,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 635.0, 255.0, 1000.0, 1000.0, 27.0, 1000.0, 1000.0, 1000.0, 76.0]
2025-09-12 11:12:21,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (382.45) for latency ExtremeSparseL4U32
2025-09-12 11:12:21,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 6/100 (estimated time remaining: 21 hours, 50 minutes, 21 seconds)
2025-09-12 11:23:33,846 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:23:33,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 11:28:46,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 540.87457 ± 103.118
2025-09-12 11:28:46,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [617.9758, 434.60757, 601.9157, 493.2376, 747.78357, 528.55524, 557.9744, 609.8272, 410.60757, 406.2607]
2025-09-12 11:28:46,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:28:46,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (540.87) for latency ExtremeSparseL4U32
2025-09-12 11:28:46,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 7/100 (estimated time remaining: 22 hours, 18 minutes, 27 seconds)
2025-09-12 11:41:00,291 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:41:00,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 11:46:08,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 735.45117 ± 75.386
2025-09-12 11:46:08,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [559.0629, 719.97455, 854.29785, 713.3066, 743.5507, 670.4431, 754.70337, 765.49054, 783.9787, 789.7029]
2025-09-12 11:46:08,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [708.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:46:08,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (735.45) for latency ExtremeSparseL4U32
2025-09-12 11:46:08,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 8/100 (estimated time remaining: 23 hours, 31 minutes, 8 seconds)
2025-09-12 11:57:03,048 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:57:03,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 12:02:04,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 805.03375 ± 133.908
2025-09-12 12:02:04,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [952.1546, 723.4388, 1040.2595, 758.44995, 644.1464, 1015.11786, 733.5843, 745.138, 719.77563, 718.27277]
2025-09-12 12:02:04,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:02:04,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (805.03) for latency ExtremeSparseL4U32
2025-09-12 12:02:04,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 9/100 (estimated time remaining: 23 hours, 41 minutes, 1 second)
2025-09-12 12:13:15,978 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:13:15,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 12:18:22,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 822.47327 ± 119.901
2025-09-12 12:18:22,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [766.4422, 712.96387, 865.02264, 903.3986, 732.4452, 990.92834, 795.20105, 699.20654, 704.4427, 1054.6815]
2025-09-12 12:18:22,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:18:22,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (822.47) for latency ExtremeSparseL4U32
2025-09-12 12:18:22,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 10/100 (estimated time remaining: 24 hours, 28 minutes, 17 seconds)
2025-09-12 12:29:27,294 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:29:27,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 12:33:50,795 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 842.05872 ± 316.840
2025-09-12 12:33:50,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [849.4957, 110.60809, 754.55975, 707.11304, 1255.4374, 959.3701, 1160.1675, 777.6239, 1187.4987, 658.71234]
2025-09-12 12:33:50,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 79.0, 572.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:33:50,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (842.06) for latency ExtremeSparseL4U32
2025-09-12 12:33:50,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 11/100 (estimated time remaining: 24 hours, 26 minutes, 54 seconds)
2025-09-12 12:44:45,899 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:44:45,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 12:49:51,392 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1020.44446 ± 278.309
2025-09-12 12:49:51,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [554.17255, 967.8331, 1376.1951, 778.4209, 1401.63, 872.76404, 715.986, 1302.5173, 1191.798, 1043.1268]
2025-09-12 12:49:51,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 987.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:49:51,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (1020.44) for latency ExtremeSparseL4U32
2025-09-12 12:49:51,425 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 12/100 (estimated time remaining: 24 hours, 3 minutes, 22 seconds)
2025-09-12 13:00:10,927 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:00:10,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 13:03:14,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 670.28558 ± 442.525
2025-09-12 13:03:14,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [332.7009, 1188.9668, 1098.331, 764.5378, 556.23254, 257.65048, 43.487022, 127.16813, 1263.8971, 1069.8844]
2025-09-12 13:03:14,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [241.0, 989.0, 1000.0, 1000.0, 432.0, 192.0, 38.0, 99.0, 1000.0, 1000.0]
2025-09-12 13:03:14,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 13/100 (estimated time remaining: 22 hours, 37 minutes, 6 seconds)
2025-09-12 13:14:38,947 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:14:38,956 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 13:18:55,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1083.14905 ± 381.943
2025-09-12 13:18:55,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1276.3895, 1327.1003, 461.1649, 1257.1226, 201.45346, 1318.771, 1192.976, 1264.6974, 1272.5979, 1259.2172]
2025-09-12 13:18:55,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 384.0, 1000.0, 138.0, 1000.0, 894.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:18:55,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (1083.15) for latency ExtremeSparseL4U32
2025-09-12 13:18:55,794 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 14/100 (estimated time remaining: 22 hours, 17 minutes, 24 seconds)
2025-09-12 13:30:12,772 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:30:12,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 13:35:12,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1379.92358 ± 166.165
2025-09-12 13:35:12,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1453.4543, 1435.5348, 1398.8479, 1232.2671, 1517.0494, 1577.1913, 1388.9869, 962.95026, 1491.0074, 1341.9453]
2025-09-12 13:35:12,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:35:12,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (1379.92) for latency ExtremeSparseL4U32
2025-09-12 13:35:12,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 15/100 (estimated time remaining: 22 hours, 1 minute, 40 seconds)
2025-09-12 13:46:20,827 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:46:20,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 13:51:22,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1330.55640 ± 421.406
2025-09-12 13:51:22,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1596.6332, 1573.3314, 386.6011, 1515.2705, 1471.2677, 1617.7285, 1558.9388, 1494.5702, 611.3233, 1479.8999]
2025-09-12 13:51:22,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 968.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:51:22,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 16/100 (estimated time remaining: 21 hours, 57 minutes, 59 seconds)
2025-09-12 14:02:04,148 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:02:04,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 14:06:31,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1203.69043 ± 420.222
2025-09-12 14:06:31,971 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [654.948, 1481.2421, 1017.3068, 1329.9614, 1558.8086, 1632.6517, 690.01526, 512.6045, 1616.5042, 1542.8624]
2025-09-12 14:06:31,971 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [416.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 370.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:06:31,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 17/100 (estimated time remaining: 21 hours, 28 minutes, 9 seconds)
2025-09-12 14:17:39,238 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:17:39,247 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 14:22:08,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1429.42529 ± 375.689
2025-09-12 14:22:08,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1600.1111, 1628.6401, 1605.4552, 1603.3121, 1667.0281, 619.4645, 743.4456, 1616.6039, 1633.0707, 1577.1222]
2025-09-12 14:22:08,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 384.0, 474.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:22:08,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (1429.43) for latency ExtremeSparseL4U32
2025-09-12 14:22:08,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 18/100 (estimated time remaining: 21 hours, 49 minutes, 38 seconds)
2025-09-12 14:34:03,531 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:34:03,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 14:37:23,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 954.00183 ± 525.880
2025-09-12 14:37:23,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1107.2631, 1200.6621, 500.98105, 221.55115, 1549.3883, 1596.1348, 501.95685, 946.2429, 1653.6912, 262.1471]
2025-09-12 14:37:23,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 749.0, 268.0, 134.0, 989.0, 1000.0, 303.0, 523.0, 1000.0, 170.0]
2025-09-12 14:37:23,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 19/100 (estimated time remaining: 21 hours, 26 minutes, 51 seconds)
2025-09-12 14:47:52,603 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:47:52,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 14:51:22,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 694.66779 ± 383.314
2025-09-12 14:51:22,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [591.8431, 478.79028, 203.88745, 185.4351, 523.2341, 543.0414, 983.1583, 1145.749, 1433.7566, 857.782]
2025-09-12 14:51:22,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 168.0, 129.0, 331.0, 1000.0, 1000.0, 704.0, 901.0, 585.0]
2025-09-12 14:51:22,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 20/100 (estimated time remaining: 20 hours, 33 minutes, 50 seconds)
2025-09-12 15:02:22,317 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:02:22,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 15:06:17,375 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1149.39343 ± 443.046
2025-09-12 15:06:17,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1116.7365, 1743.5338, 664.6149, 1366.4159, 1725.2252, 1280.3535, 1558.0326, 371.34085, 776.89966, 890.7826]
2025-09-12 15:06:17,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [679.0, 1000.0, 1000.0, 1000.0, 1000.0, 732.0, 986.0, 205.0, 429.0, 620.0]
2025-09-12 15:06:17,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 21/100 (estimated time remaining: 19 hours, 58 minutes, 37 seconds)
2025-09-12 15:17:35,000 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:17:35,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 15:22:43,665 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1441.78271 ± 470.358
2025-09-12 15:22:43,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1696.4045, 714.7248, 1665.0398, 1690.5483, 1829.4908, 1787.7041, 652.56555, 1776.298, 821.0371, 1784.0135]
2025-09-12 15:22:43,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:22:43,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (1441.78) for latency ExtremeSparseL4U32
2025-09-12 15:22:43,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 22/100 (estimated time remaining: 20 hours, 3 minutes, 52 seconds)
2025-09-12 15:33:35,504 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:33:35,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 15:38:36,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1657.40161 ± 285.176
2025-09-12 15:38:36,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1808.3796, 1723.54, 1774.4376, 1797.0453, 1358.1733, 1741.4817, 1721.0094, 927.3233, 1995.5872, 1727.0371]
2025-09-12 15:38:36,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:38:36,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (1657.40) for latency ExtremeSparseL4U32
2025-09-12 15:38:36,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 23/100 (estimated time remaining: 19 hours, 52 minutes, 49 seconds)
2025-09-12 15:49:55,246 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:49:55,254 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 15:54:08,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1233.74463 ± 487.561
2025-09-12 15:54:08,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [545.91815, 1642.3137, 815.8341, 1411.07, 1690.565, 1791.7062, 754.6286, 1701.6628, 519.8965, 1463.8513]
2025-09-12 15:54:08,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [301.0, 1000.0, 1000.0, 1000.0, 872.0, 1000.0, 1000.0, 1000.0, 293.0, 942.0]
2025-09-12 15:54:08,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 24/100 (estimated time remaining: 19 hours, 41 minutes, 57 seconds)
2025-09-12 16:04:26,833 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:04:26,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 16:09:13,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1344.56653 ± 528.893
2025-09-12 16:09:13,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1782.2277, 1175.2108, 492.79895, 517.6441, 1865.2354, 760.1108, 1764.3303, 1693.8441, 1761.3837, 1632.8785]
2025-09-12 16:09:13,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 294.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:09:13,113 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 25/100 (estimated time remaining: 19 hours, 43 minutes, 12 seconds)
2025-09-12 16:19:52,091 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:19:52,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 16:24:08,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1484.97559 ± 552.750
2025-09-12 16:24:08,808 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1627.4464, 1783.7961, 1173.4579, 375.20566, 2013.5776, 1805.3667, 557.0166, 1829.1207, 1841.5237, 1843.2439]
2025-09-12 16:24:08,808 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [865.0, 1000.0, 1000.0, 212.0, 1000.0, 1000.0, 311.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:24:08,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 26/100 (estimated time remaining: 19 hours, 27 minutes, 51 seconds)
2025-09-12 16:36:02,752 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:36:02,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 16:40:20,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1053.28723 ± 525.916
2025-09-12 16:40:20,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1733.0715, 1036.0662, 733.90094, 1503.8424, 728.1277, 1100.2566, 1040.0828, 29.08219, 1908.0148, 720.42694]
2025-09-12 16:40:20,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 374.0, 1000.0, 1000.0, 1000.0, 1000.0, 47.0, 1000.0, 1000.0]
2025-09-12 16:40:20,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 27/100 (estimated time remaining: 19 hours, 8 minutes, 36 seconds)
2025-09-12 16:51:14,365 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:51:14,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 16:56:16,059 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1752.15820 ± 93.930
2025-09-12 16:56:16,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1689.6333, 1794.9143, 1905.8472, 1714.2268, 1854.9924, 1671.8394, 1754.5574, 1594.2717, 1856.6398, 1684.6595]
2025-09-12 16:56:16,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:56:16,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (1752.16) for latency ExtremeSparseL4U32
2025-09-12 16:56:16,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 28/100 (estimated time remaining: 18 hours, 53 minutes, 55 seconds)
2025-09-12 17:07:05,488 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:07:05,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 17:11:23,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1427.83179 ± 518.657
2025-09-12 17:11:23,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1200.3248, 1270.7712, 754.5153, 1680.7119, 1993.7996, 1926.611, 1977.9122, 417.36078, 1831.0126, 1225.298]
2025-09-12 17:11:23,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [695.0, 1000.0, 431.0, 977.0, 1000.0, 1000.0, 1000.0, 294.0, 1000.0, 1000.0]
2025-09-12 17:11:23,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 29/100 (estimated time remaining: 18 hours, 32 minutes, 21 seconds)
2025-09-12 17:21:26,885 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:21:26,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 17:25:29,408 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1431.60522 ± 589.141
2025-09-12 17:25:29,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1466.7465, 1465.664, 1474.9065, 1932.7482, 1731.8685, 534.94415, 1818.3031, 115.24565, 1781.2529, 1994.3728]
2025-09-12 17:25:29,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [822.0, 794.0, 1000.0, 1000.0, 1000.0, 339.0, 1000.0, 60.0, 1000.0, 1000.0]
2025-09-12 17:25:29,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 30/100 (estimated time remaining: 18 hours, 3 minutes, 3 seconds)
2025-09-12 17:36:29,594 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:36:29,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 17:41:15,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1218.57275 ± 605.900
2025-09-12 17:41:15,097 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1785.123, 1806.2731, 490.36035, 589.3761, 1627.0378, 1986.8685, 658.91595, 1868.7983, 784.5503, 588.4244]
2025-09-12 17:41:15,097 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 899.0, 1000.0, 1000.0, 1000.0, 1000.0, 323.0]
2025-09-12 17:41:15,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 31/100 (estimated time remaining: 17 hours, 59 minutes, 27 seconds)
2025-09-12 17:53:11,083 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:53:11,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 17:57:00,185 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1137.59570 ± 550.977
2025-09-12 17:57:00,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [837.66736, 214.40115, 1603.2593, 1438.1793, 1006.7357, 1362.493, 1899.0337, 794.2476, 1823.9067, 396.0337]
2025-09-12 17:57:00,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [615.0, 128.0, 894.0, 814.0, 569.0, 678.0, 940.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:57:00,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 32/100 (estimated time remaining: 17 hours, 37 minutes, 59 seconds)
2025-09-12 18:07:51,165 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:07:51,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 18:12:38,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1650.49927 ± 349.995
2025-09-12 18:12:38,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1075.1742, 1750.834, 1904.7744, 1862.6849, 1975.1222, 1126.3416, 1886.6346, 1170.8367, 1932.3988, 1820.1918]
2025-09-12 18:12:38,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 964.0, 1000.0, 1000.0, 1000.0, 577.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:12:38,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 33/100 (estimated time remaining: 17 hours, 18 minutes, 41 seconds)
2025-09-12 18:23:31,301 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:23:31,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 18:28:15,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1839.79761 ± 295.976
2025-09-12 18:28:15,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1984.7256, 1977.1444, 2027.3256, 2007.6257, 1930.0663, 1954.3838, 1994.9124, 1631.4312, 1013.5466, 1876.8136]
2025-09-12 18:28:15,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 826.0, 530.0, 1000.0]
2025-09-12 18:28:15,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (1839.80) for latency ExtremeSparseL4U32
2025-09-12 18:28:15,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 34/100 (estimated time remaining: 17 hours, 9 minutes, 53 seconds)
2025-09-12 18:38:26,232 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:38:26,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 18:41:35,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1204.59766 ± 687.286
2025-09-12 18:41:35,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [390.92798, 767.31836, 2008.4135, 1842.1616, 1439.7386, 221.43825, 739.16876, 2117.2896, 1868.0878, 651.4314]
2025-09-12 18:41:35,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [215.0, 389.0, 1000.0, 1000.0, 795.0, 110.0, 365.0, 1000.0, 1000.0, 368.0]
2025-09-12 18:41:35,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 35/100 (estimated time remaining: 16 hours, 44 minutes, 30 seconds)
2025-09-12 18:52:46,342 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:52:46,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 18:55:51,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1083.58411 ± 714.147
2025-09-12 18:55:51,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1883.499, 278.96918, 1997.1569, 649.881, 242.97566, 274.46548, 1704.0955, 1620.8453, 1706.3716, 477.58063]
2025-09-12 18:55:51,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [905.0, 147.0, 1000.0, 296.0, 130.0, 152.0, 860.0, 760.0, 829.0, 1000.0]
2025-09-12 18:55:51,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 36/100 (estimated time remaining: 16 hours, 9 minutes, 53 seconds)
2025-09-12 19:06:26,936 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:06:26,945 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 19:11:02,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1530.89355 ± 608.081
2025-09-12 19:11:02,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2009.4548, 2055.8623, 1721.2108, 251.59653, 911.3101, 1915.222, 1764.3784, 2045.0234, 1855.3195, 779.5574]
2025-09-12 19:11:02,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 168.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:11:02,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 37/100 (estimated time remaining: 15 hours, 47 minutes, 35 seconds)
2025-09-12 19:21:57,435 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:21:57,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 19:26:28,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1321.28687 ± 674.061
2025-09-12 19:26:28,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [397.7509, 515.7477, 1787.3801, 2038.2268, 1980.2817, 2025.8279, 814.78375, 2015.7654, 1078.8828, 558.223]
2025-09-12 19:26:28,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [207.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 587.0, 1000.0]
2025-09-12 19:26:28,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 38/100 (estimated time remaining: 15 hours, 30 minutes, 17 seconds)
2025-09-12 19:38:20,951 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:38:20,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 19:43:22,248 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1870.28088 ± 113.384
2025-09-12 19:43:22,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1670.6289, 2004.7135, 1809.1858, 1827.0592, 1981.1207, 1969.098, 1996.9374, 1790.5531, 1918.3751, 1735.1376]
2025-09-12 19:43:22,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:43:22,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (1870.28) for latency ExtremeSparseL4U32
2025-09-12 19:43:22,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 39/100 (estimated time remaining: 15 hours, 31 minutes, 27 seconds)
2025-09-12 19:53:47,640 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:53:47,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 19:58:18,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1782.66077 ± 480.692
2025-09-12 19:58:18,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1974.2462, 1988.5323, 1984.9604, 366.5837, 2009.518, 2027.8118, 1923.7883, 1779.8237, 2013.4497, 1757.895]
2025-09-12 19:58:18,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 192.0, 1000.0, 1000.0, 1000.0, 915.0, 1000.0, 1000.0]
2025-09-12 19:58:18,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 40/100 (estimated time remaining: 15 hours, 36 minutes, 2 seconds)
2025-09-12 20:09:28,277 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:09:28,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 20:13:25,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1569.35974 ± 575.552
2025-09-12 20:13:25,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1779.612, 2113.5237, 1930.1547, 2036.5609, 1772.6848, 1326.124, 213.87975, 922.513, 2083.765, 1514.7803]
2025-09-12 20:13:25,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 946.0, 1000.0, 866.0, 644.0, 88.0, 399.0, 1000.0, 1000.0]
2025-09-12 20:13:25,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 41/100 (estimated time remaining: 15 hours, 30 minutes, 47 seconds)
2025-09-12 20:23:56,469 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:23:56,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 20:26:00,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 807.46643 ± 584.791
2025-09-12 20:26:00,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [208.81853, 1574.4424, 257.58646, 187.47768, 1311.1571, 1502.6808, 1025.0712, 343.23172, 1458.7229, 205.47627]
2025-09-12 20:26:00,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [106.0, 771.0, 136.0, 125.0, 599.0, 824.0, 463.0, 182.0, 708.0, 103.0]
2025-09-12 20:26:00,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 42/100 (estimated time remaining: 14 hours, 44 minutes, 39 seconds)
2025-09-12 20:37:32,404 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:37:32,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 20:41:40,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1423.61694 ± 607.189
2025-09-12 20:41:40,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [471.0194, 496.3713, 2213.221, 960.36835, 1944.104, 1797.3628, 2146.0935, 1184.3772, 1706.806, 1316.4465]
2025-09-12 20:41:40,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [192.0, 1000.0, 1000.0, 457.0, 1000.0, 1000.0, 1000.0, 557.0, 1000.0, 1000.0]
2025-09-12 20:41:40,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 43/100 (estimated time remaining: 14 hours, 32 minutes, 24 seconds)
2025-09-12 20:51:46,090 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:51:46,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 20:56:06,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1555.63562 ± 694.977
2025-09-12 20:56:06,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2123.1228, 647.7531, 2068.937, 2195.8691, 2198.5076, 449.88544, 1976.9518, 1298.737, 2004.8182, 591.77484]
2025-09-12 20:56:06,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 230.0, 1000.0, 1000.0, 1000.0, 294.0]
2025-09-12 20:56:06,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 44/100 (estimated time remaining: 13 hours, 49 minutes, 10 seconds)
2025-09-12 21:07:25,544 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:07:25,552 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 21:12:28,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1989.04785 ± 144.909
2025-09-12 21:12:28,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1976.9633, 1696.1154, 2007.6508, 2180.5867, 2088.9512, 2007.7368, 2118.6235, 2116.9575, 1803.8356, 1893.056]
2025-09-12 21:12:28,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [955.0, 804.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:12:28,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (1989.05) for latency ExtremeSparseL4U32
2025-09-12 21:12:28,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 45/100 (estimated time remaining: 13 hours, 50 minutes, 33 seconds)
2025-09-12 21:23:32,616 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:23:32,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 21:26:02,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1019.99719 ± 765.931
2025-09-12 21:26:02,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2121.0676, 172.50941, 2000.6986, 2108.2458, 402.3138, 245.26846, 815.61017, 155.30722, 1022.41376, 1156.5369]
2025-09-12 21:26:02,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 85.0, 1000.0, 1000.0, 188.0, 124.0, 376.0, 64.0, 460.0, 580.0]
2025-09-12 21:26:02,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 46/100 (estimated time remaining: 13 hours, 18 minutes, 42 seconds)
2025-09-12 21:37:29,467 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:37:29,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 21:41:25,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1441.33228 ± 765.500
2025-09-12 21:41:25,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [715.83527, 2013.5818, 603.6523, 2087.739, 2153.9573, 2073.436, 1992.2432, 448.50745, 2038.0005, 286.37015]
2025-09-12 21:41:25,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [334.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 216.0, 1000.0, 155.0]
2025-09-12 21:41:25,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 47/100 (estimated time remaining: 13 hours, 34 minutes, 37 seconds)
2025-09-12 21:52:25,390 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:52:25,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 21:57:02,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1721.74280 ± 577.215
2025-09-12 21:57:02,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1981.0576, 1037.2028, 1811.9135, 1530.6697, 2231.495, 1927.1895, 2056.4116, 2259.9446, 2052.0828, 329.4604]
2025-09-12 21:57:02,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 147.0]
2025-09-12 21:57:02,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 48/100 (estimated time remaining: 13 hours, 18 minutes, 45 seconds)
2025-09-12 22:07:50,985 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:07:50,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:11:58,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1478.75073 ± 597.434
2025-09-12 22:11:58,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [899.03253, 2239.8892, 2171.3696, 998.4295, 1888.0256, 1944.2402, 1916.8884, 791.7577, 1402.2424, 535.63324]
2025-09-12 22:11:58,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [418.0, 1000.0, 1000.0, 476.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 262.0]
2025-09-12 22:11:58,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 49/100 (estimated time remaining: 13 hours, 9 minutes, 5 seconds)
2025-09-12 22:23:24,775 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:23:24,794 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:28:34,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1914.69080 ± 189.230
2025-09-12 22:28:34,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1986.1666, 1850.8401, 1482.0925, 2071.1604, 2117.2847, 1823.4197, 1746.469, 2150.2263, 1953.273, 1965.9767]
2025-09-12 22:28:34,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:28:34,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 50/100 (estimated time remaining: 12 hours, 56 minutes, 18 seconds)
2025-09-12 22:39:33,982 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:39:33,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:43:07,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1469.20837 ± 794.514
2025-09-12 22:43:07,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [716.5296, 1972.4113, 846.8208, 2179.9165, 2196.3428, 2197.9626, 2123.2336, 1924.4174, 177.75404, 356.69482]
2025-09-12 22:43:07,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [367.0, 913.0, 442.0, 1000.0, 1000.0, 1000.0, 1000.0, 935.0, 94.0, 181.0]
2025-09-12 22:43:07,873 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 51/100 (estimated time remaining: 12 hours, 50 minutes, 57 seconds)
2025-09-12 22:54:20,056 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:54:20,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:58:32,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1788.52271 ± 586.052
2025-09-12 22:58:32,970 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2241.1836, 1959.0764, 2149.2112, 1177.196, 1502.9729, 2367.9126, 1910.7181, 2237.8586, 1974.67, 364.42868]
2025-09-12 22:58:32,970 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 592.0, 709.0, 1000.0, 1000.0, 1000.0, 846.0, 166.0]
2025-09-12 22:58:33,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 52/100 (estimated time remaining: 12 hours, 35 minutes, 45 seconds)
2025-09-12 23:10:25,640 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:10:25,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:14:39,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1825.28784 ± 616.159
2025-09-12 23:14:39,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [206.68622, 1539.9431, 2057.1013, 2327.0698, 1404.1036, 2021.933, 2158.6099, 2240.2935, 1946.9713, 2350.1672]
2025-09-12 23:14:39,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [102.0, 706.0, 1000.0, 1000.0, 648.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:14:39,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 53/100 (estimated time remaining: 12 hours, 25 minutes, 13 seconds)
2025-09-12 23:25:39,195 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:25:39,204 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:28:55,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1057.40625 ± 632.630
2025-09-12 23:28:55,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [469.01282, 2153.243, 1929.949, 1549.0236, 1132.7891, 418.9657, 229.63753, 501.37338, 1239.3096, 950.75916]
2025-09-12 23:28:55,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 548.0, 173.0, 111.0, 207.0, 532.0, 1000.0]
2025-09-12 23:28:55,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 54/100 (estimated time remaining: 12 hours, 3 minutes, 19 seconds)
2025-09-12 23:39:41,575 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:39:41,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:43:18,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1258.83533 ± 653.126
2025-09-12 23:43:18,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1027.5897, 1712.1495, 693.94354, 1920.1804, 979.06934, 1975.901, 1653.1808, 2004.5005, 37.768692, 584.07056]
2025-09-12 23:43:18,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [475.0, 1000.0, 1000.0, 1000.0, 465.0, 1000.0, 1000.0, 1000.0, 35.0, 313.0]
2025-09-12 23:43:18,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 55/100 (estimated time remaining: 11 hours, 27 minutes, 31 seconds)
2025-09-12 23:53:43,555 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:53:43,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:58:15,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1710.13354 ± 584.483
2025-09-12 23:58:15,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2463.6743, 1051.99, 1977.5907, 1611.3544, 799.24896, 2166.3523, 2170.9048, 1928.0438, 2143.2827, 788.8943]
2025-09-12 23:58:15,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 540.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 401.0]
2025-09-12 23:58:15,618 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 56/100 (estimated time remaining: 11 hours, 16 minutes, 9 seconds)
2025-09-13 00:09:15,950 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:09:15,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:12:59,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1453.40051 ± 718.966
2025-09-13 00:12:59,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2316.2725, 2241.1477, 831.6071, 2177.0325, 991.85, 1997.6198, 393.36963, 1678.6659, 1548.9491, 357.49112]
2025-09-13 00:12:59,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 407.0, 1000.0, 1000.0, 1000.0, 179.0, 1000.0, 625.0, 138.0]
2025-09-13 00:12:59,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 57/100 (estimated time remaining: 10 hours, 55 minutes, 8 seconds)
2025-09-13 00:24:20,678 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:24:20,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:28:15,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1595.53320 ± 767.023
2025-09-13 00:28:15,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2400.5188, 2087.6863, 799.60315, 387.9244, 2224.1887, 335.5496, 2285.8875, 1803.6963, 2203.7688, 1426.5087]
2025-09-13 00:28:15,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 417.0, 186.0, 1000.0, 202.0, 1000.0, 928.0, 1000.0, 1000.0]
2025-09-13 00:28:15,735 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 58/100 (estimated time remaining: 10 hours, 32 minutes, 56 seconds)
2025-09-13 00:39:02,380 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:39:02,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:43:52,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2131.86523 ± 319.159
2025-09-13 00:43:52,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2283.4736, 2425.3928, 2207.6223, 2206.912, 2069.0947, 2214.9382, 2220.3533, 2319.3508, 2158.127, 1213.3889]
2025-09-13 00:43:52,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 504.0]
2025-09-13 00:43:52,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (2131.87) for latency ExtremeSparseL4U32
2025-09-13 00:43:52,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 59/100 (estimated time remaining: 10 hours, 29 minutes, 35 seconds)
2025-09-13 00:54:39,306 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:54:39,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:58:50,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1384.18262 ± 815.377
2025-09-13 00:58:50,917 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [79.700485, 1500.3729, 2148.0122, 2310.4065, 2135.421, 818.0087, 479.85464, 2085.8977, 1910.0388, 374.11298]
2025-09-13 00:58:50,917 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [51.0, 702.0, 1000.0, 1000.0, 1000.0, 508.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:58:50,928 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 60/100 (estimated time remaining: 10 hours, 19 minutes, 24 seconds)
2025-09-13 01:09:46,112 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:09:46,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:14:38,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1959.29553 ± 261.407
2025-09-13 01:14:38,746 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2023.4346, 2287.3047, 2046.2073, 1478.3827, 2174.941, 1869.1267, 2095.4167, 1483.5548, 2137.9353, 1996.6531]
2025-09-13 01:14:38,746 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 665.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:14:38,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 61/100 (estimated time remaining: 10 hours, 11 minutes, 5 seconds)
2025-09-13 01:25:32,038 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:25:32,046 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:29:11,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1307.47156 ± 513.461
2025-09-13 01:29:11,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1567.4379, 1636.4655, 1012.4554, 1222.251, 1226.4529, 2389.9282, 742.8638, 1136.7395, 1673.1246, 466.99683]
2025-09-13 01:29:11,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [705.0, 691.0, 554.0, 472.0, 478.0, 1000.0, 1000.0, 456.0, 1000.0, 1000.0]
2025-09-13 01:29:11,598 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 62/100 (estimated time remaining: 9 hours, 54 minutes, 19 seconds)
2025-09-13 01:40:24,414 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:40:24,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:44:04,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1379.37830 ± 896.720
2025-09-13 01:44:04,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2374.452, 1069.7611, 602.3429, 636.28143, 1222.2391, 458.73358, 2363.5654, 126.38161, 2537.7976, 2402.2285]
2025-09-13 01:44:04,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 481.0, 1000.0, 292.0, 488.0, 1000.0, 1000.0, 62.0, 1000.0, 1000.0]
2025-09-13 01:44:04,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 63/100 (estimated time remaining: 9 hours, 36 minutes, 11 seconds)
2025-09-13 01:55:09,148 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:55:09,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:59:07,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1702.50842 ± 742.330
2025-09-13 01:59:07,517 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2359.4856, 2306.6372, 814.41345, 311.59573, 2380.7478, 1974.1208, 2299.515, 1074.8073, 2344.127, 1159.6333]
2025-09-13 01:59:07,517 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 362.0, 132.0, 922.0, 1000.0, 1000.0, 1000.0, 1000.0, 545.0]
2025-09-13 01:59:07,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 64/100 (estimated time remaining: 9 hours, 16 minutes, 48 seconds)
2025-09-13 02:10:06,081 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:10:06,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:14:56,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1960.96558 ± 659.656
2025-09-13 02:14:56,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2041.5074, 930.1811, 2279.7502, 2290.4558, 2269.5303, 2360.6292, 2378.7583, 2500.4224, 435.94742, 2122.475]
2025-09-13 02:14:56,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 450.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:14:56,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 65/100 (estimated time remaining: 9 hours, 7 minutes, 49 seconds)
2025-09-13 02:26:28,388 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:26:28,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:31:01,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1654.44800 ± 810.935
2025-09-13 02:31:01,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1949.8605, 2151.391, 93.295586, 2129.7307, 2179.7148, 712.8052, 2281.0898, 2220.4663, 522.37286, 2303.7546]
2025-09-13 02:31:01,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 51.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:31:01,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 66/100 (estimated time remaining: 8 hours, 54 minutes, 36 seconds)
2025-09-13 02:41:54,331 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:41:54,341 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:46:39,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2154.39697 ± 432.405
2025-09-13 02:46:39,071 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2525.2983, 1849.6185, 2080.872, 2421.946, 2219.1475, 984.6077, 2422.9604, 2322.077, 2413.3003, 2304.14]
2025-09-13 02:46:39,071 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 839.0, 1000.0, 1000.0, 1000.0, 467.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:46:39,071 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (2154.40) for latency ExtremeSparseL4U32
2025-09-13 02:46:39,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 67/100 (estimated time remaining: 8 hours, 46 minutes, 42 seconds)
2025-09-13 02:56:50,202 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:56:50,211 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:01:48,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1844.12537 ± 493.282
2025-09-13 03:01:48,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2436.6602, 1972.463, 792.1152, 2135.4163, 1697.7737, 1727.4543, 2311.1438, 2076.6794, 2139.8499, 1151.6993]
2025-09-13 03:01:48,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 802.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:01:48,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 68/100 (estimated time remaining: 8 hours, 33 minutes)
2025-09-13 03:12:56,787 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:12:56,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:17:11,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2067.66846 ± 675.572
2025-09-13 03:17:11,490 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2579.0178, 1138.0677, 2427.1594, 1095.681, 2243.95, 961.5497, 2578.088, 2825.7412, 2546.151, 2281.282]
2025-09-13 03:17:11,490 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 489.0, 957.0, 509.0, 1000.0, 365.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:17:11,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 69/100 (estimated time remaining: 8 hours, 19 minutes, 37 seconds)
2025-09-13 03:28:20,745 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:28:20,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:33:09,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2013.36975 ± 605.964
2025-09-13 03:33:09,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2232.109, 2320.9727, 2341.1091, 2342.6228, 858.39954, 801.5274, 2317.143, 2182.5586, 2106.1428, 2631.1116]
2025-09-13 03:33:09,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 389.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:33:09,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 70/100 (estimated time remaining: 8 hours, 4 minutes, 56 seconds)
2025-09-13 03:44:31,605 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:44:31,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:49:33,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2044.42346 ± 785.685
2025-09-13 03:49:33,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2792.3308, 2612.1375, 2536.6738, 413.40973, 2361.13, 2349.5317, 2153.8774, 696.2431, 2631.2256, 1897.6764]
2025-09-13 03:49:33,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 925.0, 886.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:49:33,283 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 71/100 (estimated time remaining: 7 hours, 51 minutes, 13 seconds)
2025-09-13 04:00:03,827 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:00:03,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:03:39,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1630.42969 ± 597.651
2025-09-13 04:03:39,905 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1087.5544, 1257.3662, 2445.4238, 2631.966, 1348.2783, 1457.2037, 1242.6088, 2508.1157, 1123.2559, 1202.5237]
2025-09-13 04:03:39,905 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [416.0, 540.0, 1000.0, 1000.0, 578.0, 1000.0, 504.0, 1000.0, 425.0, 544.0]
2025-09-13 04:03:39,913 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 72/100 (estimated time remaining: 7 hours, 26 minutes, 40 seconds)
2025-09-13 04:15:31,295 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:15:31,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:19:13,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1470.98230 ± 851.411
2025-09-13 04:19:13,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2642.8545, 1253.5541, 1639.5806, 879.5967, 1456.5605, 526.2378, 2656.5771, 656.19366, 384.34375, 2614.3242]
2025-09-13 04:19:13,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 615.0, 652.0, 1000.0, 551.0, 210.0, 1000.0, 1000.0, 258.0, 1000.0]
2025-09-13 04:19:13,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 73/100 (estimated time remaining: 7 hours, 13 minutes, 35 seconds)
2025-09-13 04:30:07,249 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:30:07,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:33:53,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1740.05664 ± 821.053
2025-09-13 04:33:53,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2226.1426, 2318.2278, 1184.4705, 2377.7825, 468.0893, 771.383, 655.9215, 2314.834, 2773.8406, 2309.8748]
2025-09-13 04:33:53,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 565.0, 1000.0, 244.0, 308.0, 267.0, 1000.0, 1000.0, 1000.0]
2025-09-13 04:33:53,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 74/100 (estimated time remaining: 6 hours, 54 minutes, 12 seconds)
2025-09-13 04:44:37,203 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:44:37,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:49:36,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2340.88867 ± 222.781
2025-09-13 04:49:36,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2634.6816, 2347.1045, 2238.1934, 2420.3325, 2684.6016, 2393.2183, 2275.7344, 1829.1998, 2320.7463, 2265.0745]
2025-09-13 04:49:36,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 04:49:36,964 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (2340.89) for latency ExtremeSparseL4U32
2025-09-13 04:49:36,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 75/100 (estimated time remaining: 6 hours, 37 minutes, 37 seconds)
2025-09-13 05:00:08,614 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:00:08,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:04:04,715 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2007.86987 ± 826.184
2025-09-13 05:04:04,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2787.2231, 2036.7605, 987.6987, 1859.3772, 99.150444, 2035.1877, 2611.216, 2963.8364, 2346.203, 2352.0461]
2025-09-13 05:04:04,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 800.0, 321.0, 689.0, 41.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 05:04:04,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 76/100 (estimated time remaining: 6 hours, 12 minutes, 37 seconds)
2025-09-13 05:15:07,029 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:15:07,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:18:29,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1304.16870 ± 953.942
2025-09-13 05:18:29,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1713.0552, 94.68686, 479.40793, 162.5532, 1591.1627, 2030.9242, 2565.2197, 1570.1752, 154.44034, 2680.062]
2025-09-13 05:18:29,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 48.0, 1000.0, 82.0, 1000.0, 905.0, 1000.0, 585.0, 87.0, 1000.0]
2025-09-13 05:18:29,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 77/100 (estimated time remaining: 5 hours, 59 minutes, 9 seconds)
2025-09-13 05:30:22,812 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:30:22,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:35:00,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1986.22852 ± 717.813
2025-09-13 05:35:00,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2105.6765, 2508.2268, 2655.674, 1911.7045, 445.91858, 2590.53, 2396.1926, 806.86304, 2272.619, 2168.881]
2025-09-13 05:35:00,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 351.0, 1000.0, 899.0]
2025-09-13 05:35:00,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 78/100 (estimated time remaining: 5 hours, 48 minutes, 35 seconds)
2025-09-13 05:45:48,264 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:45:48,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:50:03,926 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1899.36304 ± 716.170
2025-09-13 05:50:03,926 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2476.2722, 679.0628, 1074.7211, 2746.2722, 937.8264, 1662.6312, 2564.8423, 2214.065, 2451.469, 2186.4685]
2025-09-13 05:50:03,926 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 284.0, 1000.0, 1000.0, 376.0, 680.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 05:50:03,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 79/100 (estimated time remaining: 5 hours, 35 minutes, 8 seconds)
2025-09-13 06:00:33,509 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:00:33,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:05:29,304 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1846.54651 ± 832.226
2025-09-13 06:05:29,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1651.3259, 1432.0734, 2293.8164, 2902.7705, 2870.6992, 562.08875, 871.30646, 949.04095, 2168.4849, 2763.858]
2025-09-13 06:05:29,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [564.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 06:05:29,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 80/100 (estimated time remaining: 5 hours, 18 minutes, 39 seconds)
2025-09-13 06:16:16,099 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:16:16,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:19:59,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1643.76489 ± 711.340
2025-09-13 06:19:59,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1567.1167, 1616.5337, 2577.7832, 1261.4092, 2607.999, 1421.6505, 408.37402, 2660.2349, 1269.9888, 1046.5594]
2025-09-13 06:19:59,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [619.0, 1000.0, 1000.0, 519.0, 1000.0, 577.0, 197.0, 1000.0, 1000.0, 372.0]
2025-09-13 06:19:59,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 81/100 (estimated time remaining: 5 hours, 3 minutes, 40 seconds)
2025-09-13 06:31:50,870 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:31:50,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:36:34,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2199.21826 ± 632.951
2025-09-13 06:36:34,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2919.7031, 2762.8442, 2601.458, 552.05035, 2082.724, 1869.7919, 2306.4026, 2270.031, 2058.851, 2568.3276]
2025-09-13 06:36:34,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 230.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 06:36:34,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 82/100 (estimated time remaining: 4 hours, 56 minutes, 44 seconds)
2025-09-13 06:47:46,434 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:47:46,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:51:58,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1926.13538 ± 931.885
2025-09-13 06:51:58,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2728.0981, 2547.3772, 561.37494, 2797.2708, 297.87665, 1705.1652, 2792.9976, 2383.9763, 920.3653, 2526.8516]
2025-09-13 06:51:58,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 134.0, 644.0, 1000.0, 1000.0, 381.0, 1000.0]
2025-09-13 06:51:58,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 37 minutes, 3 seconds)
2025-09-13 07:02:36,962 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:02:36,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:06:12,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1812.58948 ± 1008.313
2025-09-13 07:06:12,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [300.02298, 3072.7551, 1897.5614, 2774.4272, 127.19411, 2777.3394, 1874.5747, 1172.4124, 1341.4332, 2788.1736]
2025-09-13 07:06:12,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [132.0, 1000.0, 1000.0, 1000.0, 59.0, 1000.0, 1000.0, 504.0, 511.0, 1000.0]
2025-09-13 07:06:12,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 18 minutes, 53 seconds)
2025-09-13 07:17:23,357 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:17:23,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:20:38,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1765.96484 ± 982.894
2025-09-13 07:20:38,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2596.7446, 2506.5352, 1732.9147, 414.8069, 2754.1702, 673.04895, 316.88306, 2715.1418, 2792.172, 1157.2308]
2025-09-13 07:20:38,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 575.0, 126.0, 1000.0, 232.0, 107.0, 1000.0, 1000.0, 406.0]
2025-09-13 07:20:38,899 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 85/100 (estimated time remaining: 4 hours, 30 seconds)
2025-09-13 07:31:11,517 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:31:11,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:34:25,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1765.78882 ± 1029.740
2025-09-13 07:34:25,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [475.26016, 1886.7593, 2551.7686, 1866.7325, 105.34231, 298.736, 2798.5483, 2910.6018, 2735.2537, 2028.8854]
2025-09-13 07:34:25,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [172.0, 1000.0, 915.0, 639.0, 50.0, 106.0, 1000.0, 1000.0, 1000.0, 631.0]
2025-09-13 07:34:25,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 43 minutes, 18 seconds)
2025-09-13 07:45:34,422 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:45:34,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:49:48,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2105.95020 ± 826.073
2025-09-13 07:49:48,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2593.7778, 2501.7966, 2466.255, 2548.4187, 2629.2825, 345.77194, 1727.1285, 777.5667, 2606.8813, 2862.6233]
2025-09-13 07:49:48,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 995.0, 1000.0, 140.0, 1000.0, 333.0, 1000.0, 1000.0]
2025-09-13 07:49:48,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 25 minutes, 1 second)
2025-09-13 08:01:04,535 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:01:04,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:05:24,239 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1399.28699 ± 887.275
2025-09-13 08:05:24,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [467.03354, 699.1305, 1333.2297, 2375.4126, 695.68823, 2823.625, 1084.8508, 2880.4731, 642.4731, 990.95374]
2025-09-13 08:05:24,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 218.0, 1000.0, 1000.0, 1000.0, 1000.0, 387.0]
2025-09-13 08:05:24,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 10 minutes, 55 seconds)
2025-09-13 08:16:28,607 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:16:28,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:20:15,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1356.12891 ± 848.241
2025-09-13 08:20:15,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1086.2412, 1685.3467, 1540.0688, 3124.3838, 2189.535, 1449.1432, 46.819233, 893.25287, 1293.8635, 252.63475]
2025-09-13 08:20:15,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [344.0, 672.0, 1000.0, 1000.0, 1000.0, 493.0, 42.0, 1000.0, 1000.0, 1000.0]
2025-09-13 08:20:15,660 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 57 minutes, 43 seconds)
2025-09-13 08:31:27,742 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:31:27,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:36:19,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2658.50146 ± 285.819
2025-09-13 08:36:19,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1915.3539, 2827.9775, 2718.4468, 2688.1277, 2924.1128, 2629.523, 2670.9236, 2724.0942, 2477.4656, 3008.9917]
2025-09-13 08:36:19,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [719.0, 919.0, 1000.0, 1000.0, 1000.0, 897.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 08:36:19,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (2658.50) for latency ExtremeSparseL4U32
2025-09-13 08:36:19,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 46 minutes, 29 seconds)
2025-09-13 08:46:42,929 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:46:42,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:50:25,024 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1947.03882 ± 966.966
2025-09-13 08:50:25,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [3024.6565, 241.76675, 1214.6489, 960.72784, 2679.3254, 2873.7783, 1305.9186, 1449.2102, 3018.6282, 2701.7285]
2025-09-13 08:50:25,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 93.0, 432.0, 367.0, 1000.0, 971.0, 1000.0, 434.0, 1000.0, 913.0]
2025-09-13 08:50:25,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 31 minutes, 58 seconds)
2025-09-13 09:01:12,494 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:01:12,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:04:48,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2075.09180 ± 1037.050
2025-09-13 09:04:48,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [3100.871, 2901.3447, 1928.897, 2700.9106, 3069.4321, 729.80914, 892.5674, 137.48889, 2701.7375, 2587.8586]
2025-09-13 09:04:48,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 602.0, 1000.0, 1000.0, 253.0, 349.0, 54.0, 1000.0, 1000.0]
2025-09-13 09:04:48,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 15 minutes)
2025-09-13 09:16:32,677 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:16:32,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:19:27,035 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1293.78174 ± 893.232
2025-09-13 09:19:27,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1638.8798, 2820.161, 640.05365, 67.56494, 422.40323, 1521.4298, 462.93292, 835.0683, 2235.7192, 2293.6047]
2025-09-13 09:19:27,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [702.0, 1000.0, 220.0, 30.0, 160.0, 548.0, 1000.0, 319.0, 1000.0, 897.0]
2025-09-13 09:19:27,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 58 minutes, 28 seconds)
2025-09-13 09:30:15,257 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:30:15,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:34:07,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2194.35815 ± 827.229
2025-09-13 09:34:07,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2962.5835, 1190.0355, 2863.79, 2880.8096, 1491.5948, 2688.6582, 1505.551, 2820.6196, 2856.2048, 683.73413]
2025-09-13 09:34:07,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 465.0, 1000.0, 1000.0, 522.0, 1000.0, 484.0, 1000.0, 1000.0, 271.0]
2025-09-13 09:34:07,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 43 minutes, 24 seconds)
2025-09-13 09:45:06,964 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:45:06,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:49:28,068 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2118.85815 ± 1029.903
2025-09-13 09:49:28,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1308.9446, 2658.388, 1546.3289, 3082.9026, 3456.7153, 264.4549, 2638.325, 2652.1777, 693.8688, 2886.4744]
2025-09-13 09:49:28,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [427.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 221.0, 956.0]
2025-09-13 09:49:28,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 27 minutes, 45 seconds)
2025-09-13 10:00:41,759 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:00:41,769 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:04:57,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2504.13721 ± 962.909
2025-09-13 10:04:57,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2623.643, 2901.2913, 3064.4958, 2824.9746, 3151.5623, 348.51117, 3021.7732, 3255.4229, 2952.8494, 896.84564]
2025-09-13 10:04:57,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [874.0, 1000.0, 1000.0, 1000.0, 1000.0, 166.0, 1000.0, 1000.0, 1000.0, 311.0]
2025-09-13 10:04:57,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 14 minutes, 32 seconds)
2025-09-13 10:15:06,116 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:15:06,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:19:08,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1947.95825 ± 1349.948
2025-09-13 10:19:08,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [255.40302, 3102.0085, 2833.0295, 59.74847, 606.9558, 3530.3765, 2920.8733, 353.28302, 3049.3325, 2768.5725]
2025-09-13 10:19:08,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [104.0, 978.0, 953.0, 32.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 10:19:08,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 97/100 (estimated time remaining: 59 minutes, 28 seconds)
2025-09-13 10:30:16,691 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:30:16,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:33:55,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1820.34302 ± 1251.406
2025-09-13 10:33:55,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [3106.84, 92.454445, 3250.3188, 676.80286, 498.20013, 2385.0283, 3274.9465, 2995.1145, 1523.1669, 400.5583]
2025-09-13 10:33:55,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 50.0, 1000.0, 1000.0, 329.0, 787.0, 1000.0, 1000.0, 1000.0, 136.0]
2025-09-13 10:33:55,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 98/100 (estimated time remaining: 44 minutes, 41 seconds)
2025-09-13 10:44:41,873 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:44:41,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:47:47,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1359.93445 ± 892.047
2025-09-13 10:47:47,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [260.5099, 1380.1599, 1227.6587, 1032.7034, 271.1116, 1453.8365, 524.09607, 2302.4312, 1903.5674, 3243.2695]
2025-09-13 10:47:47,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [94.0, 543.0, 497.0, 1000.0, 1000.0, 541.0, 228.0, 711.0, 577.0, 1000.0]
2025-09-13 10:47:47,256 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 99/100 (estimated time remaining: 29 minutes, 27 seconds)
2025-09-13 10:59:49,651 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:59:49,661 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:02:43,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1480.96997 ± 885.971
2025-09-13 11:02:43,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2768.732, 1458.1847, 290.24274, 1094.7115, 1333.0444, 3061.1892, 847.7985, 444.18066, 1267.0947, 2244.5208]
2025-09-13 11:02:43,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 539.0, 100.0, 292.0, 420.0, 1000.0, 251.0, 153.0, 1000.0, 1000.0]
2025-09-13 11:02:43,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 100/100 (estimated time remaining: 14 minutes, 39 seconds)
2025-09-13 11:13:06,429 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:13:06,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:17:45,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2371.91650 ± 859.499
2025-09-13 11:17:45,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1655.2009, 2722.3232, 3153.45, 3328.522, 803.3634, 3050.8638, 3290.625, 1381.1809, 2610.6614, 1722.9718]
2025-09-13 11:17:45,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 510.0, 1000.0, 578.0]
2025-09-13 11:17:45,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1251 [DEBUG]: Training session finished
