2025-09-11 20:25:42,354 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc0-ant/MM1Queue_a033_s075-mbpac-highdim-memdelay
2025-09-11 20:25:42,354 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc0-ant/MM1Queue_a033_s075-mbpac-highdim-memdelay
2025-09-11 20:25:42,354 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x1475106fcb90>}
2025-09-11 20:25:42,354 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1111 [DEBUG]: using device: cuda
2025-09-11 20:25:42,360 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1133 [INFO]: Creating new trainer
2025-09-11 20:25:42,377 baseline-mbpac-noiseperc0-ant:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
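As a side note, the printed `pi` network shapes imply a fixed parameter count that can be checked with a short stdlib-only sketch. This is an annotation on the log, not part of the training code; the layer sizes are copied from the `NNGaussianPolicy` repr above and bias terms are assumed present as printed (`bias=True`).

```python
# Hypothetical sanity check: total trainable parameters of the printed
# NNGaussianPolicy, from the Linear shapes shown in the log above.
def linear_params(n_in, n_out, bias=True):
    """Weights plus optional bias of one Linear layer."""
    return n_in * n_out + (n_out if bias else 0)

common_head = linear_params(512, 256) + linear_params(256, 256)  # shared trunk
mu_head = linear_params(256, 8)                                  # mean head
log_std_head = linear_params(256, 8)                             # log-std head

total = common_head + mu_head + log_std_head
print(total)  # 201232
```

The 512-dim input matches the GRU hidden size of the model printed further down, consistent with the policy acting on the recurrent belief state rather than the raw observation.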
2025-09-11 20:25:42,378 baseline-mbpac-noiseperc0-ant:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
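The `in_features=35` of the q network's first Linear is consistent with `NNLayerConcat2` concatenating two flattened inputs: a 27-dim state and an 8-dim action, dimensions that appear in the other network printouts of this log. A minimal sketch of that arithmetic (an annotation, assuming those are indeed the two concatenated inputs):

```python
# Hypothetical check: the q network's first Linear expects 35 features,
# matching a flattened 27-dim state concatenated with an 8-dim action.
state_dim, action_dim = 27, 8
q_in_features = state_dim + action_dim           # 35, as printed

def linear_params(n_in, n_out):
    return n_in * n_out + n_out                  # weights + bias

q_total_params = (linear_params(35, 256)
                  + linear_params(256, 256)
                  + linear_params(256, 1))       # scalar Q-value per sample
print(q_in_features, q_total_params)  # 35 75265
```

The trailing `NNLayerSqueeze(dim: -1)` then drops the size-1 output axis so the network returns one scalar Q-value per batch element.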
2025-09-11 20:25:42,387 baseline-mbpac-noiseperc0-ant:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=27, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=8, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 512, batch_first=True)
)
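The dimensions in the `NNPredictiveRecurrent` repr chain together consistently: the action embedding's 256-dim output matches the GRU input size, the GRU's 512-dim hidden state matches the emitter trunk's input, and the emitter heads map back to the 27-dim state. A stdlib-only sketch of that wiring check follows; note that treating `net_embed_state`'s 512-dim output as the GRU's initial hidden state is an assumption, not something the repr states.

```python
# Hypothetical wiring check for the printed NNPredictiveRecurrent:
# the embedding output sizes must line up with the GRU and emitter inputs.
dims = {
    "state": 27, "action": 8,
    "embed_action_out": 256,   # net_embed_action final Linear
    "embed_state_out": 512,    # net_embed_state final Linear (assumed h0)
    "gru_in": 256, "gru_hidden": 512,
    "emitter_in": 512,         # first Linear of the emitter trunk
    "emitter_out": 27,         # mu / log_std heads, back to state dim
}
assert dims["embed_action_out"] == dims["gru_in"]     # action embedding feeds the GRU
assert dims["embed_state_out"] == dims["gru_hidden"]  # state embedding matches hidden size
assert dims["emitter_in"] == dims["gru_hidden"]       # GRU hidden feeds the emitter
assert dims["emitter_out"] == dims["state"]           # Gaussian heads predict next state
print("dimension chain consistent")
```

Under that reading, the model is a latent rollout machine: embed the current state once to initialize the GRU, then feed embedded actions step by step and emit a Gaussian over the next 27-dim state at each step.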
2025-09-11 20:25:43,341 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1194 [DEBUG]: Starting training session...
2025-09-11 20:25:43,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 1/100
2025-09-11 20:36:16,556 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:36:16,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 20:37:50,824 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: -75.15273 ± 104.032
2025-09-11 20:37:50,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [-99.19343, -37.100147, -5.3186, 2.4729097, 11.690634, -266.06845, 7.793974, -272.3816, -2.4952447, -90.92723]
2025-09-11 20:37:50,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [443.0, 171.0, 281.0, 38.0, 18.0, 1000.0, 27.0, 1000.0, 15.0, 308.0]
2025-09-11 20:37:50,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (-75.15) for latency MM1Queue_a033_s075
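The "Total Reward: -75.15273 ± 104.032" line appears to be the mean and population standard deviation (ddof=0) of the ten per-episode rewards. This stdlib sketch reproduces both figures from the logged values; it is an annotation, not the evaluator's actual code.

```python
import statistics

# Per-episode rewards copied from the iteration-1 evaluation above.
rewards = [-99.19343, -37.100147, -5.3186, 2.4729097, 11.690634,
           -266.06845, 7.793974, -272.3816, -2.4952447, -90.92723]

mean = statistics.fmean(rewards)
std = statistics.pstdev(rewards)   # population std (ddof=0) matches the log
# Close to the logged -75.15273 ± 104.032; the tiny mean difference
# is float32 rounding in the trainer.
print(mean, std)
```

Had the evaluator used the sample standard deviation (`statistics.stdev`, ddof=1), the reported spread would be about 109.7 instead, so the ± figure here understates episode-to-episode variance slightly.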
2025-09-11 20:37:50,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 2/100 (estimated time remaining: 20 hours, 21 seconds)
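The "estimated time remaining" figures are consistent with a simple linear extrapolation: total elapsed time divided by completed iterations, times the iterations left. The formula is inferred from the timestamps, not taken from the code; this sketch reproduces the "20 hours, 21 seconds" printed on the iteration-2 line from the two timestamps above it.

```python
from datetime import datetime

# Timestamps copied from the log: session start and the iteration-2 header.
start = datetime.fromisoformat("2025-09-11 20:25:43.342")
after_iter1 = datetime.fromisoformat("2025-09-11 20:37:50.828")

iters_done, iters_total = 1, 100
per_iter = (after_iter1 - start) / iters_done      # average time per iteration
remaining = per_iter * (iters_total - iters_done)  # linear extrapolation
print(remaining)  # 20:00:21.114000 — i.e. "20 hours, 21 seconds" as logged
```

The same formula with `iters_done = 2` and the 20:54:43.983 timestamp reproduces the "23 hours, 41 minutes, 31 seconds" on the iteration-3 line, which supports the inference.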
2025-09-11 20:50:15,990 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:50:15,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 20:54:43,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 601.16907 ± 142.236
2025-09-11 20:54:43,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [701.177, 735.11584, 714.97833, 591.6923, 558.18494, 707.3767, 601.2264, 437.1949, 698.1092, 266.63507]
2025-09-11 20:54:43,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 994.0, 1000.0, 835.0, 568.0, 1000.0, 1000.0]
2025-09-11 20:54:43,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (601.17) for latency MM1Queue_a033_s075
2025-09-11 20:54:43,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 3/100 (estimated time remaining: 23 hours, 41 minutes, 31 seconds)
2025-09-11 21:07:09,265 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:07:09,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 21:11:53,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 659.49207 ± 136.390
2025-09-11 21:11:53,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [752.0125, 559.21735, 780.04736, 627.1521, 726.26624, 722.04596, 791.4189, 327.6266, 743.64105, 565.49255]
2025-09-11 21:11:53,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 21:11:53,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (659.49) for latency MM1Queue_a033_s075
2025-09-11 21:11:53,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 4/100 (estimated time remaining: 24 hours, 53 minutes, 1 second)
2025-09-11 21:23:56,235 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:23:56,239 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 21:28:01,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 447.84683 ± 190.307
2025-09-11 21:28:01,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [602.54504, 520.6183, 414.839, 316.43942, 675.0659, 111.76587, 563.84357, 361.27567, 709.9294, 202.1461]
2025-09-11 21:28:01,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 682.0, 1000.0, 569.0, 1000.0, 350.0]
2025-09-11 21:28:01,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 5/100 (estimated time remaining: 24 hours, 55 minutes, 25 seconds)
2025-09-11 21:39:04,742 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:39:04,746 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 21:43:55,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 729.22766 ± 108.550
2025-09-11 21:43:55,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [763.39496, 685.2239, 675.8234, 468.17975, 801.7973, 729.1227, 835.35223, 826.368, 844.57117, 662.4427]
2025-09-11 21:43:55,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 21:43:55,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (729.23) for latency MM1Queue_a033_s075
2025-09-11 21:43:55,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 6/100 (estimated time remaining: 24 hours, 45 minutes, 56 seconds)
2025-09-11 21:56:59,887 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:56:59,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 22:01:51,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 854.17560 ± 44.199
2025-09-11 22:01:51,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [837.8805, 889.4858, 931.5453, 867.9373, 885.15656, 767.58356, 834.1211, 824.54083, 884.13605, 819.369]
2025-09-11 22:01:51,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 22:01:51,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (854.18) for latency MM1Queue_a033_s075
2025-09-11 22:01:51,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 7/100 (estimated time remaining: 26 hours, 19 minutes, 23 seconds)
2025-09-11 22:13:54,794 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:13:54,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 22:18:05,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 826.46417 ± 236.015
2025-09-11 22:18:05,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1018.6758, 935.2868, 956.31824, 920.87103, 884.0797, 173.11519, 841.218, 916.94867, 661.96924, 956.15826]
2025-09-11 22:18:05,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 299.0, 862.0, 1000.0, 689.0, 1000.0]
2025-09-11 22:18:05,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 8/100 (estimated time remaining: 25 hours, 50 minutes, 25 seconds)
2025-09-11 22:29:00,503 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:29:00,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 22:33:38,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 914.31189 ± 220.909
2025-09-11 22:33:38,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1017.0847, 1075.4177, 938.5304, 1091.9028, 818.3477, 301.6263, 854.82086, 1027.1271, 992.95764, 1025.3038]
2025-09-11 22:33:38,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 22:33:38,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (914.31) for latency MM1Queue_a033_s075
2025-09-11 22:33:38,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 9/100 (estimated time remaining: 25 hours, 4 minutes, 9 seconds)
2025-09-11 22:45:52,124 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:45:52,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 22:49:54,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 860.50964 ± 327.966
2025-09-11 22:49:54,754 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [370.57227, 626.2133, 756.1346, 1164.9967, 981.3374, 1073.2297, 1219.9965, 1123.0564, 1048.5049, 241.05464]
2025-09-11 22:49:54,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 581.0, 1000.0, 964.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 199.0]
2025-09-11 22:49:54,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 10/100 (estimated time remaining: 24 hours, 50 minutes, 14 seconds)
2025-09-11 23:01:27,624 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:01:27,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 23:04:32,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 849.35919 ± 425.329
2025-09-11 23:04:32,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [103.63281, 1338.1827, 811.87665, 723.1135, 1328.2141, 1373.667, 708.8493, 1199.7458, 587.27313, 319.0362]
2025-09-11 23:04:32,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [80.0, 1000.0, 655.0, 582.0, 1000.0, 1000.0, 542.0, 1000.0, 484.0, 256.0]
2025-09-11 23:04:32,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 11/100 (estimated time remaining: 24 hours, 11 minutes, 3 seconds)
2025-09-11 23:16:53,099 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:16:53,113 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 23:21:09,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1178.87988 ± 356.851
2025-09-11 23:21:09,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1357.2325, 1467.9044, 1407.1886, 1063.7059, 1306.2026, 567.423, 1282.3687, 440.09235, 1427.5774, 1469.1046]
2025-09-11 23:21:09,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 795.0, 1000.0, 434.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 23:21:09,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (1178.88) for latency MM1Queue_a033_s075
2025-09-11 23:21:09,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 12/100 (estimated time remaining: 23 hours, 31 minutes, 32 seconds)
2025-09-11 23:32:21,691 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:32:21,694 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 23:36:34,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1269.57581 ± 446.593
2025-09-11 23:36:34,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1476.9248, 1605.6605, 1472.8965, 1555.0251, 1492.7201, 881.34393, 877.2481, 186.74194, 1617.5038, 1529.6937]
2025-09-11 23:36:34,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [955.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 139.0, 1000.0, 1000.0]
2025-09-11 23:36:34,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (1269.58) for latency MM1Queue_a033_s075
2025-09-11 23:36:34,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 13/100 (estimated time remaining: 23 hours, 1 minute, 27 seconds)
2025-09-11 23:48:27,718 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:48:27,722 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 23:52:40,341 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1115.02771 ± 471.304
2025-09-11 23:52:40,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [845.1958, 664.37573, 1538.0144, 1577.8475, 923.8783, 59.82851, 1465.1066, 1501.3208, 1098.7974, 1475.9116]
2025-09-11 23:52:40,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [582.0, 1000.0, 1000.0, 1000.0, 667.0, 889.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 23:52:40,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 14/100 (estimated time remaining: 22 hours, 55 minutes, 3 seconds)
2025-09-12 00:05:17,211 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:05:17,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:09:52,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1589.46643 ± 168.293
2025-09-12 00:09:52,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1755.191, 1643.8811, 1666.4825, 1526.9352, 1663.536, 1720.3275, 1278.4198, 1279.9453, 1604.4379, 1755.5076]
2025-09-12 00:09:52,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 762.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:09:52,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (1589.47) for latency MM1Queue_a033_s075
2025-09-12 00:09:52,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 15/100 (estimated time remaining: 22 hours, 55 minutes, 23 seconds)
2025-09-12 00:21:28,838 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:21:28,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:25:36,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1330.78577 ± 561.825
2025-09-12 00:25:36,279 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1695.1001, 156.12189, 1639.4006, 318.37363, 1562.4991, 1699.4279, 1718.7922, 1623.8198, 1259.2849, 1635.0366]
2025-09-12 00:25:36,279 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 102.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 723.0, 1000.0]
2025-09-12 00:25:36,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 16/100 (estimated time remaining: 22 hours, 58 minutes, 1 second)
2025-09-12 00:37:28,876 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:37:28,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:41:31,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1356.37695 ± 569.222
2025-09-12 00:41:31,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1699.7823, 1687.514, 1684.2982, 1751.2319, 458.38422, 1815.3131, 818.3577, 1634.5784, 1755.9293, 258.37943]
2025-09-12 00:41:31,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 501.0, 1000.0, 1000.0, 154.0]
2025-09-12 00:41:31,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 17/100 (estimated time remaining: 22 hours, 30 minutes, 4 seconds)
2025-09-12 00:53:41,116 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:53:41,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:58:12,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1441.75659 ± 517.984
2025-09-12 00:58:12,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1748.0856, 1679.9404, 1251.7749, 1650.2382, 1841.5334, 584.12604, 1784.4916, 1781.2723, 1760.3903, 335.71326]
2025-09-12 00:58:12,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 712.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:58:12,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 18/100 (estimated time remaining: 22 hours, 34 minutes, 56 seconds)
2025-09-12 01:09:36,929 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:09:36,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:14:15,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1842.91541 ± 51.417
2025-09-12 01:14:15,455 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1819.653, 1901.9463, 1757.4507, 1842.927, 1911.1882, 1759.1735, 1864.0183, 1821.449, 1861.8765, 1889.4731]
2025-09-12 01:14:15,455 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:14:15,455 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (1842.92) for latency MM1Queue_a033_s075
2025-09-12 01:14:15,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 19/100 (estimated time remaining: 22 hours, 17 minutes, 59 seconds)
2025-09-12 01:26:08,046 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:26:08,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:29:45,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1235.75623 ± 696.997
2025-09-12 01:29:45,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [51.09234, 1788.8851, 1762.5978, 1826.0402, 1748.6444, 995.6117, 339.18878, 1792.6028, 319.19742, 1733.7023]
2025-09-12 01:29:45,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [52.0, 1000.0, 1000.0, 1000.0, 1000.0, 570.0, 191.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:29:45,029 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 20/100 (estimated time remaining: 21 hours, 33 minutes, 56 seconds)
2025-09-12 01:41:37,818 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:41:37,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:45:36,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1640.92798 ± 563.712
2025-09-12 01:45:36,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1959.8932, 372.9595, 1723.3508, 699.04126, 1971.8687, 1990.904, 2049.2578, 1902.5221, 1835.0376, 1904.4425]
2025-09-12 01:45:36,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 197.0, 1000.0, 400.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:45:36,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 21/100 (estimated time remaining: 21 hours, 19 minutes, 58 seconds)
2025-09-12 01:57:33,731 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:57:33,735 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:01:27,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1607.74585 ± 585.044
2025-09-12 02:01:27,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1441.0719, 1894.8606, 1836.1289, 2004.3245, 59.443047, 1935.9379, 1934.9836, 1932.6711, 1948.993, 1089.044]
2025-09-12 02:01:27,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [719.0, 1000.0, 1000.0, 1000.0, 36.0, 1000.0, 1000.0, 1000.0, 1000.0, 641.0]
2025-09-12 02:01:27,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 22/100 (estimated time remaining: 21 hours, 3 minutes, 4 seconds)
2025-09-12 02:12:40,246 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:12:40,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:17:07,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1793.12988 ± 332.728
2025-09-12 02:17:07,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1935.6288, 2083.0251, 1205.614, 1863.8278, 1956.1168, 1914.8196, 1074.7644, 1904.4327, 1996.8005, 1996.2687]
2025-09-12 02:17:07,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 653.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 02:17:07,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 23/100 (estimated time remaining: 20 hours, 31 minutes, 13 seconds)
2025-09-12 02:29:26,980 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:29:26,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:33:09,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1563.73657 ± 578.080
2025-09-12 02:33:09,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1947.8285, 2018.7759, 2005.5842, 2039.8674, 430.3485, 1812.0348, 1651.9227, 1037.3624, 698.7182, 1994.923]
2025-09-12 02:33:09,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 214.0, 1000.0, 832.0, 512.0, 345.0, 1000.0]
2025-09-12 02:33:09,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 24/100 (estimated time remaining: 20 hours, 15 minutes, 10 seconds)
2025-09-12 02:45:01,694 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:45:01,699 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:48:04,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1322.89185 ± 728.392
2025-09-12 02:48:04,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [604.28564, 1985.9346, 197.2056, 2072.99, 652.1956, 2002.2722, 2092.7964, 621.53784, 1979.9785, 1019.72156]
2025-09-12 02:48:04,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [314.0, 1000.0, 131.0, 1000.0, 319.0, 1000.0, 1000.0, 334.0, 1000.0, 508.0]
2025-09-12 02:48:04,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 25/100 (estimated time remaining: 19 hours, 50 minutes, 38 seconds)
2025-09-12 03:00:09,662 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:00:09,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:04:06,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1312.32324 ± 720.664
2025-09-12 03:04:06,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2025.5642, 1060.8246, 616.6319, 1969.768, 2027.627, 1956.5953, 363.40753, 277.03946, 2039.8497, 785.92566]
2025-09-12 03:04:06,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 208.0, 169.0, 1000.0, 1000.0]
2025-09-12 03:04:06,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 26/100 (estimated time remaining: 19 hours, 37 minutes, 41 seconds)
2025-09-12 03:15:48,029 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:15:48,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:20:28,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1713.73706 ± 568.877
2025-09-12 03:20:28,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2095.8467, 2077.086, 2052.593, 2039.7352, 1995.4331, 2082.805, 559.7448, 1499.2998, 2039.7054, 695.1223]
2025-09-12 03:20:28,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [991.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:20:28,479 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 27/100 (estimated time remaining: 19 hours, 29 minutes, 24 seconds)
2025-09-12 03:32:21,064 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:32:21,068 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:36:53,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1856.72327 ± 349.573
2025-09-12 03:36:53,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2197.4062, 2024.2808, 2086.8967, 2080.9817, 1442.9747, 2036.1891, 2159.1052, 1847.5142, 1064.9667, 1626.9183]
2025-09-12 03:36:53,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 810.0, 1000.0, 809.0]
2025-09-12 03:36:53,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (1856.72) for latency MM1Queue_a033_s075
2025-09-12 03:36:53,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 28/100 (estimated time remaining: 19 hours, 24 minutes, 26 seconds)
2025-09-12 03:48:45,701 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:48:45,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:52:52,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1642.35718 ± 694.460
2025-09-12 03:52:52,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2057.4739, 2022.8722, 554.1803, 2087.7827, 1942.8988, 2161.9087, 2135.9197, 2071.1013, 175.40378, 1214.0314]
2025-09-12 03:52:52,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 846.0, 1000.0, 1000.0, 1000.0, 95.0, 1000.0]
2025-09-12 03:52:52,580 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 29/100 (estimated time remaining: 19 hours, 7 minutes, 50 seconds)
2025-09-12 04:04:44,231 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:04:44,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:08:43,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1698.01733 ± 658.400
2025-09-12 04:08:43,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1729.2124, 2097.957, 2117.5063, 1287.4181, 1150.6675, 43.07998, 2121.7996, 2215.089, 2050.8535, 2166.589]
2025-09-12 04:08:43,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 565.0, 19.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:08:43,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 30/100 (estimated time remaining: 19 hours, 5 minutes, 8 seconds)
2025-09-12 04:20:44,161 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:20:44,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:25:13,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1705.84106 ± 580.427
2025-09-12 04:25:13,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1918.2938, 2144.7766, 2152.3313, 2090.772, 2110.5269, 1016.3163, 662.07465, 2096.9556, 819.4586, 2046.9037]
2025-09-12 04:25:13,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 501.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:25:13,035 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 31/100 (estimated time remaining: 18 hours, 55 minutes, 25 seconds)
2025-09-12 04:37:01,025 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:37:01,032 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:41:08,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1792.39453 ± 618.323
2025-09-12 04:41:08,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1959.333, 2046.6035, 2141.227, 2036.1355, 2109.593, 2126.6763, 25.0965, 1907.73, 2102.441, 1469.1086]
2025-09-12 04:41:08,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 20.0, 1000.0, 1000.0, 709.0]
2025-09-12 04:41:08,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 32/100 (estimated time remaining: 18 hours, 33 minutes, 7 seconds)
2025-09-12 04:53:07,696 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:53:07,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:57:38,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1922.87817 ± 342.289
2025-09-12 04:57:38,183 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1425.215, 1862.1375, 2057.5413, 2102.8792, 2034.0634, 2112.939, 2060.6472, 2300.1199, 1138.5934, 2134.6443]
2025-09-12 04:57:38,183 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [666.0, 859.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:57:38,183 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (1922.88) for latency MM1Queue_a033_s075
2025-09-12 04:57:38,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 33/100 (estimated time remaining: 18 hours, 18 minutes, 13 seconds)
2025-09-12 05:09:44,463 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:09:44,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:14:00,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1866.08240 ± 562.297
2025-09-12 05:14:00,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1651.837, 2028.7355, 2189.3755, 2145.6067, 2205.6738, 1860.1154, 1926.9359, 2169.7864, 262.86765, 2219.891]
2025-09-12 05:14:00,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [807.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 138.0, 1000.0]
2025-09-12 05:14:00,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 34/100 (estimated time remaining: 18 hours, 7 minutes, 5 seconds)
2025-09-12 05:25:48,002 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:25:48,007 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:30:31,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2055.29419 ± 292.739
2025-09-12 05:30:31,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2171.4016, 2282.227, 2264.3354, 2130.5742, 2215.4421, 1853.3096, 1248.8083, 2117.6067, 2202.7488, 2066.487]
2025-09-12 05:30:31,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 981.0, 1000.0]
2025-09-12 05:30:31,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (2055.29) for latency MM1Queue_a033_s075
2025-09-12 05:30:31,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 35/100 (estimated time remaining: 17 hours, 59 minutes, 51 seconds)
2025-09-12 05:41:52,330 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:41:52,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:45:17,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1493.91357 ± 731.409
2025-09-12 05:45:17,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1598.2349, 2232.5347, 1740.3868, 285.44305, 2184.5962, 535.6481, 2165.7778, 1067.1016, 2349.9104, 779.50305]
2025-09-12 05:45:17,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 155.0, 1000.0, 279.0, 1000.0, 499.0, 1000.0, 413.0]
2025-09-12 05:45:17,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 36/100 (estimated time remaining: 17 hours, 20 minutes, 58 seconds)
2025-09-12 05:57:10,659 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:57:10,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:01:46,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2067.41064 ± 318.081
2025-09-12 06:01:46,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2229.6619, 2050.6946, 2311.775, 1952.3606, 2236.3547, 1160.2871, 2159.2153, 2150.351, 2255.5034, 2167.9014]
2025-09-12 06:01:46,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:01:46,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (2067.41) for latency MM1Queue_a033_s075
2025-09-12 06:01:46,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 37/100 (estimated time remaining: 17 hours, 12 minutes, 6 seconds)
2025-09-12 06:13:10,739 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:13:10,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:16:46,703 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1670.99585 ± 715.203
2025-09-12 06:16:46,704 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2259.0317, 2278.6685, 2164.0862, 2195.2844, 2101.2168, 510.45795, 1837.5397, 1940.6484, 1163.442, 259.58304]
2025-09-12 06:16:46,704 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 224.0, 882.0, 879.0, 556.0, 119.0]
2025-09-12 06:16:46,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 38/100 (estimated time remaining: 16 hours, 37 minutes, 11 seconds)
2025-09-12 06:28:37,188 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:28:37,209 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:32:29,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1583.66382 ± 702.517
2025-09-12 06:32:29,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [606.80786, 1250.7417, 749.47095, 1245.7168, 2343.4172, 2171.6072, 706.0143, 2249.8584, 2350.6487, 2162.356]
2025-09-12 06:32:29,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [250.0, 1000.0, 343.0, 554.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:32:29,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 39/100 (estimated time remaining: 16 hours, 13 minutes, 13 seconds)
2025-09-12 06:44:10,461 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:44:10,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:48:38,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2056.39038 ± 436.706
2025-09-12 06:48:38,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2177.4644, 2156.1936, 2168.5034, 2203.4111, 2276.4565, 2262.8408, 2181.552, 751.3696, 2168.8936, 2217.2214]
2025-09-12 06:48:38,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 362.0, 1000.0, 1000.0]
2025-09-12 06:48:38,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 40/100 (estimated time remaining: 15 hours, 52 minutes, 51 seconds)
2025-09-12 07:00:47,224 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:00:47,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:05:11,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1967.86108 ± 517.055
2025-09-12 07:05:11,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [675.7778, 2115.2175, 2385.4124, 1604.9254, 1808.2792, 2457.9246, 2298.6519, 2345.659, 1713.1052, 2273.661]
2025-09-12 07:05:11,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 703.0, 1000.0, 1000.0, 1000.0, 1000.0, 730.0, 1000.0]
2025-09-12 07:05:11,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 41/100 (estimated time remaining: 15 hours, 58 minutes, 43 seconds)
2025-09-12 07:17:00,638 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:17:00,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:21:12,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1756.81116 ± 703.837
2025-09-12 07:21:12,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2249.8423, 2222.0107, 1847.7661, 1019.59125, 397.9967, 2496.6514, 2195.9912, 2226.446, 765.1327, 2146.683]
2025-09-12 07:21:12,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 978.0, 821.0, 1000.0, 159.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:21:12,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 42/100 (estimated time remaining: 15 hours, 37 minutes, 24 seconds)
2025-09-12 07:32:50,069 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:32:50,071 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:37:20,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2122.45117 ± 429.427
2025-09-12 07:37:20,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1521.1403, 2363.9214, 2256.9004, 2386.9514, 2212.9607, 2393.5046, 1070.3403, 2326.7825, 2400.112, 2291.8997]
2025-09-12 07:37:20,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 430.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:37:20,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (2122.45) for latency MM1Queue_a033_s075
2025-09-12 07:37:20,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 43/100 (estimated time remaining: 15 hours, 34 minutes, 31 seconds)
2025-09-12 07:49:17,994 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:49:17,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:53:57,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2254.21021 ± 172.984
2025-09-12 07:53:57,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1772.1385, 2279.6697, 2195.2346, 2290.7473, 2427.5088, 2356.2122, 2254.527, 2355.9285, 2357.1484, 2252.9883]
2025-09-12 07:53:57,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [797.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:53:57,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (2254.21) for latency MM1Queue_a033_s075
2025-09-12 07:53:57,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 44/100 (estimated time remaining: 15 hours, 28 minutes, 49 seconds)
2025-09-12 08:06:10,810 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:06:10,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:10:28,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1909.15979 ± 800.877
2025-09-12 08:10:28,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2162.4302, 489.06635, 173.75195, 2303.7979, 2039.5806, 2346.073, 2348.6064, 2331.3513, 2399.1072, 2497.8323]
2025-09-12 08:10:28,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 97.0, 1000.0, 888.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:10:28,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 45/100 (estimated time remaining: 15 hours, 16 minutes, 35 seconds)
2025-09-12 08:22:47,562 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:22:47,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:26:12,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1670.80566 ± 798.084
2025-09-12 08:26:12,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [864.6166, 2257.442, 2278.4568, 2361.0703, 2397.2764, 2263.9648, 897.2517, 2313.4185, 351.6794, 722.87964]
2025-09-12 08:26:12,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [367.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 377.0, 1000.0, 161.0, 343.0]
2025-09-12 08:26:12,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 46/100 (estimated time remaining: 14 hours, 51 minutes, 14 seconds)
2025-09-12 08:37:47,774 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:37:47,779 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:42:34,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2251.29077 ± 278.921
2025-09-12 08:42:34,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2475.544, 2408.2688, 2114.7524, 2319.8125, 2383.1335, 1975.0885, 2465.6758, 1551.7065, 2391.9207, 2427.0076]
2025-09-12 08:42:34,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:42:34,341 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 47/100 (estimated time remaining: 14 hours, 38 minutes, 42 seconds)
2025-09-12 08:54:26,742 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:54:26,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:57:55,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1559.85706 ± 791.240
2025-09-12 08:57:55,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2277.1216, 1542.1342, 1171.4683, 358.26907, 2082.2388, 1243.7065, 2372.447, 2316.206, 2155.418, 79.5608]
2025-09-12 08:57:55,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 669.0, 487.0, 176.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 44.0]
2025-09-12 08:57:55,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 48/100 (estimated time remaining: 14 hours, 14 minutes, 14 seconds)
2025-09-12 09:10:09,935 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:10:09,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:14:55,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1976.98315 ± 629.873
2025-09-12 09:14:55,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2561.6763, 1978.158, 1605.8718, 2346.8982, 2546.0198, 540.03204, 2351.1265, 2339.1677, 2292.9094, 1207.9733]
2025-09-12 09:14:55,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:14:55,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 49/100 (estimated time remaining: 14 hours, 2 minutes, 3 seconds)
2025-09-12 09:26:18,123 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:26:18,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:30:06,392 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1693.63965 ± 659.807
2025-09-12 09:30:06,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2274.9824, 2437.9355, 777.59906, 1430.8973, 2214.9114, 1088.3154, 1301.3735, 2368.1008, 2310.7378, 731.54236]
2025-09-12 09:30:06,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 327.0, 1000.0, 1000.0, 1000.0, 538.0, 1000.0, 1000.0, 313.0]
2025-09-12 09:30:06,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 50/100 (estimated time remaining: 13 hours, 32 minutes, 15 seconds)
2025-09-12 09:42:32,998 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:42:33,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:46:16,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1706.21326 ± 604.081
2025-09-12 09:46:16,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2399.3496, 1537.6161, 675.07245, 2297.169, 763.30115, 1647.7537, 2294.3464, 2234.6575, 1914.8347, 1298.032]
2025-09-12 09:46:16,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 722.0, 295.0, 1000.0, 405.0, 718.0, 1000.0, 1000.0, 742.0, 1000.0]
2025-09-12 09:46:16,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 51/100 (estimated time remaining: 13 hours, 20 minutes, 39 seconds)
2025-09-12 09:57:18,640 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:57:18,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:01:53,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2291.46143 ± 282.932
2025-09-12 10:01:53,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2466.964, 1503.0707, 2304.6592, 2475.9026, 2542.0073, 2266.4517, 2215.347, 2285.8962, 2364.3162, 2490.0007]
2025-09-12 10:01:53,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 687.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:01:53,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (2291.46) for latency MM1Queue_a033_s075
2025-09-12 10:01:53,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 52/100 (estimated time remaining: 12 hours, 57 minutes, 23 seconds)
2025-09-12 10:13:45,404 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:13:45,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:18:30,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2143.03760 ± 427.804
2025-09-12 10:18:30,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1083.4133, 2402.6155, 1572.535, 2469.6887, 2252.506, 2365.854, 2262.4497, 2439.6836, 2301.6958, 2279.9363]
2025-09-12 10:18:30,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:18:30,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 53/100 (estimated time remaining: 12 hours, 53 minutes, 31 seconds)
2025-09-12 10:30:51,964 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:30:51,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:34:46,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1974.55933 ± 712.907
2025-09-12 10:34:46,466 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2332.1287, 745.2642, 2331.5398, 2279.2644, 2134.027, 2496.3977, 2413.2112, 2291.228, 2330.8936, 391.6377]
2025-09-12 10:34:46,466 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 272.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 169.0]
2025-09-12 10:34:46,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 54/100 (estimated time remaining: 12 hours, 30 minutes, 32 seconds)
2025-09-12 10:46:14,418 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:46:14,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:50:02,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1950.81470 ± 728.689
2025-09-12 10:50:02,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2577.653, 822.7219, 2585.9756, 2379.7988, 2332.8367, 1380.7836, 2214.645, 478.9154, 2405.0247, 2329.7903]
2025-09-12 10:50:02,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 356.0, 1000.0, 1000.0, 1000.0, 621.0, 1000.0, 224.0, 1000.0, 1000.0]
2025-09-12 10:50:02,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 55/100 (estimated time remaining: 12 hours, 15 minutes, 26 seconds)
2025-09-12 11:02:22,041 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:02:22,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:06:50,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2204.92480 ± 566.350
2025-09-12 11:06:50,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2390.6455, 2505.9392, 2555.7085, 2508.3445, 2458.7473, 2379.1753, 2631.531, 2414.543, 862.4584, 1342.1544]
2025-09-12 11:06:50,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 633.0]
2025-09-12 11:06:50,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 56/100 (estimated time remaining: 12 hours, 5 minutes, 6 seconds)
2025-09-12 11:18:21,986 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:18:21,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:22:40,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1653.95081 ± 838.451
2025-09-12 11:22:40,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1064.4218, 831.5213, 2362.0378, 2273.113, 1819.1317, 334.4272, 2519.3994, 2294.4946, 2559.2192, 481.7417]
2025-09-12 11:22:40,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 138.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:22:40,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 57/100 (estimated time remaining: 11 hours, 50 minutes, 49 seconds)
2025-09-12 11:34:28,600 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:34:28,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:39:12,035 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2434.74878 ± 104.242
2025-09-12 11:39:12,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2273.7214, 2527.006, 2555.2368, 2464.17, 2542.5383, 2498.623, 2289.2947, 2433.3025, 2472.2832, 2291.3127]
2025-09-12 11:39:12,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 972.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:39:12,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (2434.75) for latency MM1Queue_a033_s075
2025-09-12 11:39:12,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 58/100 (estimated time remaining: 11 hours, 33 minutes, 59 seconds)
2025-09-12 11:51:08,331 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:51:08,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:55:22,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2164.28418 ± 629.704
2025-09-12 11:55:22,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2324.6946, 2276.2998, 2349.7766, 2250.86, 2555.673, 2431.4731, 2556.5793, 2523.2551, 2043.8645, 330.36478]
2025-09-12 11:55:22,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 810.0, 149.0]
2025-09-12 11:55:22,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 59/100 (estimated time remaining: 11 hours, 17 minutes, 3 seconds)
2025-09-12 12:07:15,792 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:07:15,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:11:10,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1794.40662 ± 781.499
2025-09-12 12:11:10,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2429.7253, 823.07196, 2433.7107, 2387.3804, 113.31847, 2260.7075, 1779.0355, 1139.4213, 2427.1843, 2150.5103]
2025-09-12 12:11:10,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 78.0, 1000.0, 774.0, 462.0, 1000.0, 1000.0]
2025-09-12 12:11:10,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 60/100 (estimated time remaining: 11 hours, 5 minutes, 14 seconds)
2025-09-12 12:22:21,189 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:22:21,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:25:53,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1756.92126 ± 903.754
2025-09-12 12:25:53,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2451.719, 1869.7811, 2331.582, 637.04706, 2295.3025, 2450.6758, 444.52682, 2510.1875, 2427.0474, 151.34464]
2025-09-12 12:25:53,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 258.0, 1000.0, 1000.0, 210.0, 1000.0, 1000.0, 75.0]
2025-09-12 12:25:53,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 61/100 (estimated time remaining: 10 hours, 32 minutes, 27 seconds)
2025-09-12 12:38:43,815 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:38:43,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:42:59,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2217.03247 ± 665.821
2025-09-12 12:42:59,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2483.4514, 2591.813, 2414.9072, 231.05472, 2315.0195, 2381.4814, 2480.0676, 2474.2175, 2407.9812, 2390.3325]
2025-09-12 12:42:59,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 100.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:42:59,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 62/100 (estimated time remaining: 10 hours, 26 minutes, 29 seconds)
2025-09-12 12:54:07,331 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:54:07,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:58:17,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2180.65527 ± 552.785
2025-09-12 12:58:17,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1378.5935, 2359.3838, 848.36456, 2463.6208, 2497.3918, 2611.9814, 2398.3152, 2307.1252, 2410.9243, 2530.8516]
2025-09-12 12:58:17,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [543.0, 1000.0, 360.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:58:17,097 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 63/100 (estimated time remaining: 10 hours, 1 minute, 2 seconds)
2025-09-12 13:10:01,880 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:10:01,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:13:49,227 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1820.41370 ± 761.413
2025-09-12 13:13:49,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2424.58, 2409.4119, 2431.1194, 710.73944, 2398.141, 1309.992, 711.0397, 2504.455, 900.0664, 2404.591]
2025-09-12 13:13:49,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 953.0, 303.0, 1000.0, 538.0, 274.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:13:49,239 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 64/100 (estimated time remaining: 9 hours, 40 minutes, 28 seconds)
2025-09-12 13:26:06,372 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:26:06,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:30:14,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1889.18066 ± 771.793
2025-09-12 13:30:14,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2440.8416, 2537.3525, 1689.5538, 2328.8833, 2423.5676, 468.36368, 394.856, 2410.359, 2324.8699, 1873.1593]
2025-09-12 13:30:14,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 737.0, 1000.0, 1000.0, 1000.0, 211.0, 1000.0, 1000.0, 748.0]
2025-09-12 13:30:14,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 65/100 (estimated time remaining: 9 hours, 29 minutes, 20 seconds)
2025-09-12 13:42:32,393 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:42:32,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:46:08,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1629.39197 ± 740.081
2025-09-12 13:46:08,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1690.6204, 817.2205, 1913.9092, 1870.8813, 1636.5845, 962.1125, 112.2286, 2481.1982, 2399.2913, 2409.8728]
2025-09-12 13:46:08,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 353.0, 767.0, 744.0, 612.0, 1000.0, 61.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:46:08,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 66/100 (estimated time remaining: 9 hours, 21 minutes, 40 seconds)
2025-09-12 13:57:17,160 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:57:17,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:01:17,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2064.20947 ± 695.359
2025-09-12 14:01:17,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2511.216, 2224.1, 2556.9988, 2390.1997, 509.63504, 901.9931, 2534.7195, 2167.4973, 2467.454, 2378.2834]
2025-09-12 14:01:17,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 227.0, 380.0, 1000.0, 853.0, 1000.0, 1000.0]
2025-09-12 14:01:17,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 67/100 (estimated time remaining: 8 hours, 52 minutes, 29 seconds)
2025-09-12 14:13:32,167 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:13:32,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:17:48,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1785.28809 ± 656.578
2025-09-12 14:17:48,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1076.3221, 1662.4792, 1110.5433, 1583.7572, 2571.9124, 967.94257, 1266.3358, 2661.62, 2573.735, 2378.2332]
2025-09-12 14:17:48,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 660.0, 388.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:17:48,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 68/100 (estimated time remaining: 8 hours, 44 minutes, 49 seconds)
2025-09-12 14:29:57,562 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:29:57,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:34:35,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1979.66833 ± 688.174
2025-09-12 14:34:35,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2505.8342, 1916.8624, 2588.1135, 1265.1222, 2304.3804, 2513.569, 2350.2585, 1404.5275, 2523.8313, 424.18304]
2025-09-12 14:34:35,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 746.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:34:35,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 69/100 (estimated time remaining: 8 hours, 36 minutes, 57 seconds)
2025-09-12 14:46:08,924 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:46:08,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:50:31,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2289.11011 ± 386.880
2025-09-12 14:50:31,677 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1463.1387, 2573.179, 2364.712, 2502.2104, 2338.45, 2513.9512, 1635.5254, 2314.9272, 2686.8186, 2498.189]
2025-09-12 14:50:31,677 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [620.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 693.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:50:31,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 70/100 (estimated time remaining: 8 hours, 17 minutes, 43 seconds)
2025-09-12 15:03:04,159 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:03:04,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:07:36,223 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2417.90747 ± 302.221
2025-09-12 15:07:36,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2400.7568, 2557.1692, 2420.3936, 2587.0127, 2436.6213, 2610.754, 2472.1208, 2580.642, 1538.4746, 2575.1309]
2025-09-12 15:07:36,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 583.0, 1000.0]
2025-09-12 15:07:36,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 71/100 (estimated time remaining: 8 hours, 8 minutes, 47 seconds)
2025-09-12 15:19:18,487 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:19:18,491 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:23:34,247 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2251.14966 ± 498.779
2025-09-12 15:23:34,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2030.1158, 2528.7827, 840.0343, 2622.324, 2285.6157, 2522.0376, 2529.95, 2465.1196, 2437.6084, 2249.9055]
2025-09-12 15:23:34,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [782.0, 1000.0, 355.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 992.0, 1000.0]
2025-09-12 15:23:34,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 72/100 (estimated time remaining: 7 hours, 57 minutes, 10 seconds)
2025-09-12 15:35:26,118 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:35:26,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:39:48,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2287.85669 ± 556.538
2025-09-12 15:39:48,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [995.54095, 2616.48, 1399.7333, 2547.2666, 2603.6653, 2527.7502, 2707.2734, 2533.8557, 2490.52, 2456.481]
2025-09-12 15:39:48,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [422.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:39:48,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 73/100 (estimated time remaining: 7 hours, 39 minutes, 14 seconds)
2025-09-12 15:51:22,012 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:51:22,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:55:35,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2256.82373 ± 705.757
2025-09-12 15:55:35,552 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2528.2437, 2518.2498, 2485.718, 2571.912, 2349.7446, 2428.252, 2495.663, 2592.3064, 149.06155, 2449.085]
2025-09-12 15:55:35,552 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 77.0, 1000.0]
2025-09-12 15:55:35,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 74/100 (estimated time remaining: 7 hours, 17 minutes, 22 seconds)
2025-09-12 16:07:16,861 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:07:16,873 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:11:38,491 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2232.53320 ± 560.451
2025-09-12 16:11:38,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2629.994, 2456.5188, 2428.7896, 1605.886, 2611.8018, 2541.4768, 2202.6853, 2566.16, 2492.5996, 789.4195]
2025-09-12 16:11:38,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 302.0]
2025-09-12 16:11:38,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 75/100 (estimated time remaining: 7 hours, 1 minute, 47 seconds)
2025-09-12 16:22:55,412 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:22:55,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:27:22,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2187.56592 ± 592.932
2025-09-12 16:27:22,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1376.515, 2496.1167, 2516.3215, 909.44806, 2549.4395, 2777.024, 2526.4482, 2492.6028, 2518.6318, 1713.111]
2025-09-12 16:27:22,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [537.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:27:22,091 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 76/100 (estimated time remaining: 6 hours, 38 minutes, 49 seconds)
2025-09-12 16:40:01,766 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:40:01,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:44:26,926 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2308.64380 ± 459.351
2025-09-12 16:44:26,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2445.481, 2652.8784, 2208.196, 2524.2268, 2491.1584, 993.3854, 2406.4692, 2530.0637, 2229.1023, 2605.4775]
2025-09-12 16:44:26,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 395.0, 1000.0, 1000.0, 961.0, 1000.0]
2025-09-12 16:44:26,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 77/100 (estimated time remaining: 6 hours, 28 minutes, 12 seconds)
2025-09-12 16:56:04,740 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:56:04,759 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:00:09,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1898.71558 ± 823.854
2025-09-12 17:00:09,608 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2546.7078, 2312.9607, 2591.061, 1091.3376, 2485.8484, 400.0179, 2588.6887, 694.9236, 1671.7936, 2603.8164]
2025-09-12 17:00:09,608 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 883.0, 1000.0, 1000.0, 1000.0, 153.0, 1000.0, 1000.0, 691.0, 1000.0]
2025-09-12 17:00:09,617 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 78/100 (estimated time remaining: 6 hours, 9 minutes, 36 seconds)
2025-09-12 17:11:59,630 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:11:59,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:15:46,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1907.55835 ± 711.279
2025-09-12 17:15:46,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2428.382, 2457.777, 640.392, 2527.1562, 1121.6523, 1167.0774, 2518.1914, 2592.1807, 2295.9126, 1326.8629]
2025-09-12 17:15:46,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 223.0, 1000.0, 433.0, 1000.0, 1000.0, 1000.0, 1000.0, 436.0]
2025-09-12 17:15:46,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 79/100 (estimated time remaining: 5 hours, 52 minutes, 47 seconds)
2025-09-12 17:26:42,571 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:26:42,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:31:25,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2468.74219 ± 186.515
2025-09-12 17:31:25,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2376.4878, 2461.1743, 2629.599, 2402.5898, 2597.0657, 2512.1924, 2505.6907, 2646.2917, 1973.8813, 2582.449]
2025-09-12 17:31:25,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:31:25,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (2468.74) for latency MM1Queue_a033_s075
2025-09-12 17:31:25,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 80/100 (estimated time remaining: 5 hours, 35 minutes, 6 seconds)
2025-09-12 17:43:16,172 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:43:16,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:47:34,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2157.83667 ± 681.066
2025-09-12 17:47:34,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2596.0217, 2682.94, 1714.2388, 2531.9968, 2521.7117, 738.0493, 2498.188, 1086.4261, 2496.7034, 2712.0916]
2025-09-12 17:47:34,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 682.0, 1000.0, 1000.0, 1000.0, 1000.0, 418.0, 1000.0, 1000.0]
2025-09-12 17:47:34,256 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 81/100 (estimated time remaining: 5 hours, 20 minutes, 48 seconds)
2025-09-12 18:00:09,984 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:00:09,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:04:20,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2288.00732 ± 613.005
2025-09-12 18:04:20,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2710.7964, 1096.5283, 2507.6516, 2572.8298, 2588.475, 2589.7502, 1041.6885, 2513.2913, 2554.2102, 2704.8523]
2025-09-12 18:04:20,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 424.0, 1000.0, 1000.0, 1000.0, 1000.0, 402.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:04:20,068 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 82/100 (estimated time remaining: 5 hours, 3 minutes, 33 seconds)
2025-09-12 18:15:19,464 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:15:19,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:20:00,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2275.36597 ± 694.892
2025-09-12 18:20:00,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2254.238, 2660.6199, 1345.8151, 563.9275, 2650.5527, 2679.0637, 2547.8193, 2653.46, 2640.7605, 2757.4045]
2025-09-12 18:20:00,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:20:00,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 47 minutes, 26 seconds)
2025-09-12 18:32:23,626 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:32:23,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:36:13,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2079.33496 ± 983.698
2025-09-12 18:36:13,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [115.944374, 2456.1833, 2737.706, 129.31354, 2669.8162, 2613.167, 2682.7231, 2425.1042, 2445.6758, 2517.7166]
2025-09-12 18:36:13,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [90.0, 1000.0, 1000.0, 64.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:36:13,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 33 minutes, 30 seconds)
2025-09-12 18:48:02,425 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:48:02,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:52:28,971 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2311.26392 ± 390.435
2025-09-12 18:52:28,971 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2395.73, 2380.335, 2472.8845, 2652.032, 2456.7786, 1324.3735, 1859.2666, 2375.7712, 2633.2527, 2562.214]
2025-09-12 18:52:28,971 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 499.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:52:28,981 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 85/100 (estimated time remaining: 4 hours, 19 minutes, 21 seconds)
2025-09-12 19:04:38,907 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:04:38,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:09:18,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2397.72607 ± 639.426
2025-09-12 19:09:18,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2544.285, 2613.124, 2152.5828, 2693.8384, 2680.2566, 2679.6248, 2555.0754, 2768.4119, 545.3847, 2744.675]
2025-09-12 19:09:18,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 795.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:09:18,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 86/100 (estimated time remaining: 4 hours, 5 minutes, 11 seconds)
2025-09-12 19:20:12,436 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:20:12,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:24:57,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2506.85620 ± 108.514
2025-09-12 19:24:57,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2415.2976, 2500.4495, 2245.1125, 2487.394, 2664.7961, 2560.825, 2580.0452, 2591.5408, 2509.3489, 2513.7524]
2025-09-12 19:24:57,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:24:57,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (2506.86) for latency MM1Queue_a033_s075
2025-09-12 19:24:57,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 45 minutes, 45 seconds)
2025-09-12 19:37:37,392 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:37:37,398 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:41:53,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2256.28735 ± 528.321
2025-09-12 19:41:53,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2455.0444, 2527.28, 2484.159, 2408.489, 2467.4585, 2245.131, 1868.9857, 809.3158, 2708.0127, 2588.998]
2025-09-12 19:41:53,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 706.0, 318.0, 1000.0, 1000.0]
2025-09-12 19:41:53,223 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 32 minutes, 53 seconds)
2025-09-12 19:53:19,981 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:53:19,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:57:30,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1987.45740 ± 884.074
2025-09-12 19:57:30,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [1035.5895, 108.034, 2139.8591, 2641.679, 2604.1433, 2705.6252, 2637.1794, 2525.2075, 965.5994, 2511.6584]
2025-09-12 19:57:30,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 54.0, 824.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:57:30,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 15 minutes, 6 seconds)
2025-09-12 20:09:53,094 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:09:53,097 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:14:25,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2471.67188 ± 378.608
2025-09-12 20:14:25,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2708.0076, 2697.854, 1964.5581, 2537.8928, 1572.7961, 2631.5532, 2534.6714, 2921.1345, 2527.4495, 2620.8018]
2025-09-12 20:14:25,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 657.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:14:25,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 90/100 (estimated time remaining: 3 hours, 16 seconds)
2025-09-12 20:25:06,945 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:25:06,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:29:07,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2219.26904 ± 746.768
2025-09-12 20:29:07,097 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2788.4873, 2639.0864, 2621.8333, 508.33606, 2631.7424, 2657.387, 2457.8254, 1961.2695, 2791.1548, 1135.568]
2025-09-12 20:29:07,097 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 208.0, 1000.0, 1000.0, 1000.0, 739.0, 1000.0, 486.0]
2025-09-12 20:29:07,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 39 minutes, 37 seconds)
2025-09-12 20:41:09,109 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:41:09,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:45:30,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2389.96826 ± 597.551
2025-09-12 20:45:30,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2513.0095, 2721.0156, 2659.0369, 2685.9539, 2599.3076, 2629.8132, 614.9303, 2498.1492, 2523.969, 2454.4973]
2025-09-12 20:45:30,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 236.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:45:30,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 24 minutes, 58 seconds)
2025-09-12 20:57:10,885 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:57:10,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:01:29,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2063.15845 ± 791.237
2025-09-12 21:01:29,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [205.92311, 2637.887, 2398.7656, 2823.973, 1859.6594, 2477.9026, 1833.2094, 2517.9807, 2739.7122, 1136.57]
2025-09-12 21:01:29,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 783.0, 1000.0, 1000.0, 477.0]
2025-09-12 21:01:29,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 7 minutes, 21 seconds)
2025-09-12 21:14:04,148 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:14:04,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:18:22,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2349.10303 ± 630.235
2025-09-12 21:18:22,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2577.478, 2538.1143, 2787.5037, 2539.9558, 2492.9482, 2693.2388, 2064.5613, 2693.0615, 540.5031, 2563.6643]
2025-09-12 21:18:22,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 207.0, 1000.0]
2025-09-12 21:18:22,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 53 minutes, 13 seconds)
2025-09-12 21:29:40,871 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:29:40,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:33:44,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2058.75830 ± 839.596
2025-09-12 21:33:44,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2741.8135, 2533.3484, 2713.0203, 847.6911, 2569.62, 2758.1423, 2320.7441, 982.96246, 2557.441, 562.7994]
2025-09-12 21:33:44,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 408.0, 1000.0, 218.0]
2025-09-12 21:33:44,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 35 minutes, 11 seconds)
2025-09-12 21:45:25,011 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:45:25,015 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:50:07,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2593.22876 ± 88.187
2025-09-12 21:50:07,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2537.3064, 2479.3425, 2767.0955, 2587.3484, 2466.3196, 2587.0742, 2633.2427, 2575.7385, 2590.06, 2708.7585]
2025-09-12 21:50:07,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:50:07,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1226 [INFO]: New best (2593.23) for latency MM1Queue_a033_s075
2025-09-12 21:50:07,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 21 minutes)
2025-09-12 22:02:20,424 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:02:20,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:06:38,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2384.22485 ± 647.936
2025-09-12 22:06:38,066 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2011.4026, 543.71545, 2536.2346, 2683.4119, 2780.984, 2655.6724, 2658.7273, 2518.6904, 2726.639, 2726.7695]
2025-09-12 22:06:38,066 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [788.0, 233.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:06:38,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 4 minutes, 54 seconds)
2025-09-12 22:17:59,948 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:17:59,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:22:05,185 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2134.15381 ± 734.954
2025-09-12 22:22:05,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2588.8582, 2396.5198, 436.11267, 2559.4355, 2398.5771, 2566.9844, 2499.0615, 2547.1555, 2408.6057, 940.22797]
2025-09-12 22:22:05,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 190.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 403.0]
2025-09-12 22:22:05,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 98/100 (estimated time remaining: 48 minutes, 21 seconds)
2025-09-12 22:33:55,901 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:33:55,905 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:38:36,028 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2140.60938 ± 676.164
2025-09-12 22:38:36,029 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2474.4194, 1605.9442, 2365.1199, 2512.5085, 2566.8115, 2449.9702, 2617.8958, 283.7962, 2231.6323, 2297.9954]
2025-09-12 22:38:36,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:38:36,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 99/100 (estimated time remaining: 32 minutes, 5 seconds)
2025-09-12 22:50:52,662 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:50:52,665 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:55:04,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2185.51880 ± 509.251
2025-09-12 22:55:04,728 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2415.4114, 2379.1628, 2597.1484, 2276.789, 2470.2559, 2453.918, 2397.3552, 2136.6707, 741.3919, 1987.0836]
2025-09-12 22:55:04,728 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 956.0, 1000.0, 1000.0, 1000.0, 1000.0, 296.0, 850.0]
2025-09-12 22:55:04,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1199 [INFO]: Iteration 100/100 (estimated time remaining: 16 minutes, 15 seconds)
2025-09-12 23:07:00,077 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:07:00,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:11:31,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2328.63965 ± 550.152
2025-09-12 23:11:31,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1222 [DEBUG]: All rewards: [2509.7612, 2768.457, 2398.6562, 2736.8716, 857.1096, 2503.6892, 2456.4246, 1849.5947, 2467.7063, 2738.1257]
2025-09-12 23:11:31,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 750.0, 1000.0, 1000.0]
2025-09-12 23:11:31,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-ant):1251 [DEBUG]: Training session finished
