2025-09-11 21:09:35,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc5-ant/MM1Queue_a033_s075-mbpac-highdim-memdelay
2025-09-11 21:09:35,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc5-ant/MM1Queue_a033_s075-mbpac-highdim-memdelay
2025-09-11 21:09:35,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x152547529b90>}
2025-09-11 21:09:35,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1111 [DEBUG]: using device: cuda
2025-09-11 21:09:35,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1133 [INFO]: Creating new trainer
2025-09-11 21:09:35,139 baseline-mbpac-noiseperc5-ant:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-09-11 21:09:35,139 baseline-mbpac-noiseperc5-ant:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-11 21:09:35,149 baseline-mbpac-noiseperc5-ant:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=27, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=8, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 512, batch_first=True)
)
2025-09-11 21:09:36,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1194 [DEBUG]: Starting training session...
2025-09-11 21:09:36,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 1/100
2025-09-11 21:20:51,126 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:20:51,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 21:21:43,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: -67.90174 ± 99.197
2025-09-11 21:21:43,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [-349.77567, -111.27737, -0.62633336, -23.581839, -28.188536, -77.67254, -4.2169924, -32.334927, -28.417934, -22.925266]
2025-09-11 21:21:43,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 228.0, 39.0, 21.0, 37.0, 186.0, 24.0, 40.0, 125.0, 63.0]
2025-09-11 21:21:43,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (-67.90) for latency MM1Queue_a033_s075
2025-09-11 21:21:43,072 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 2/100 (estimated time remaining: 19 hours, 59 minutes, 28 seconds)
2025-09-11 21:34:21,689 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:34:21,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 21:36:39,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 38.55029 ± 41.370
2025-09-11 21:36:39,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [-12.198132, 0.60042876, 80.323784, 43.63131, 100.02347, 90.55043, 16.149101, 69.14298, 9.183459, -11.903939]
2025-09-11 21:36:39,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [97.0, 53.0, 1000.0, 137.0, 1000.0, 1000.0, 65.0, 1000.0, 64.0, 193.0]
2025-09-11 21:36:39,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (38.55) for latency MM1Queue_a033_s075
2025-09-11 21:36:39,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 3/100 (estimated time remaining: 22 hours, 6 minutes, 9 seconds)
2025-09-11 21:49:37,560 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:49:37,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 21:50:46,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 19.28079 ± 55.750
2025-09-11 21:50:46,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [11.57159, 35.933464, -61.06579, 33.875504, -9.233869, 25.115698, -8.706652, 24.947672, 163.00351, -22.633202]
2025-09-11 21:50:46,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [292.0, 358.0, 151.0, 392.0, 104.0, 168.0, 54.0, 94.0, 385.0, 319.0]
2025-09-11 21:50:46,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 4/100 (estimated time remaining: 22 hours, 11 minutes, 2 seconds)
2025-09-11 22:03:59,500 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:03:59,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 22:06:59,660 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 227.87167 ± 183.105
2025-09-11 22:06:59,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [465.07474, 17.899818, 89.37657, 451.4758, 351.68155, 381.0605, 25.386156, 388.06738, 48.90154, 59.792824]
2025-09-11 22:06:59,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 70.0, 295.0, 1000.0, 1000.0, 1000.0, 150.0, 1000.0, 70.0, 251.0]
2025-09-11 22:06:59,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (227.87) for latency MM1Queue_a033_s075
2025-09-11 22:06:59,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 5/100 (estimated time remaining: 22 hours, 57 minutes, 25 seconds)
2025-09-11 22:19:39,735 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:19:39,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 22:22:32,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 239.87788 ± 157.409
2025-09-11 22:22:32,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [109.124855, 368.2469, 165.57701, 32.804688, 24.156157, 398.23895, 238.95412, 368.1223, 181.11893, 512.4348]
2025-09-11 22:22:32,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [262.0, 1000.0, 220.0, 86.0, 271.0, 876.0, 369.0, 1000.0, 596.0, 1000.0]
2025-09-11 22:22:32,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (239.88) for latency MM1Queue_a033_s075
2025-09-11 22:22:32,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 6/100 (estimated time remaining: 23 hours, 5 minutes, 57 seconds)
2025-09-11 22:35:00,408 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:35:00,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 22:38:46,666 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 446.45917 ± 235.637
2025-09-11 22:38:46,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [142.53737, 672.8645, 666.2719, 495.44376, 504.37543, 617.8933, 25.380157, 544.3604, 658.99664, 136.46825]
2025-09-11 22:38:46,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [258.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 34.0, 1000.0, 1000.0, 200.0]
2025-09-11 22:38:46,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (446.46) for latency MM1Queue_a033_s075
2025-09-11 22:38:46,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 7/100 (estimated time remaining: 24 hours, 8 minutes, 43 seconds)
2025-09-11 22:50:52,036 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:50:52,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 22:54:09,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 361.72168 ± 184.158
2025-09-11 22:54:09,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [29.629654, 314.19308, 742.79016, 375.6207, 401.3353, 254.12868, 302.03793, 424.55237, 558.6708, 214.25772]
2025-09-11 22:54:09,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [36.0, 356.0, 1000.0, 717.0, 1000.0, 508.0, 470.0, 1000.0, 1000.0, 438.0]
2025-09-11 22:54:09,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 8/100 (estimated time remaining: 24 hours, 1 minute, 16 seconds)
2025-09-11 23:07:20,833 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:07:20,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 23:11:45,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 565.62628 ± 187.102
2025-09-11 23:11:45,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [303.25235, 623.39307, 829.08203, 528.56067, 543.8213, 716.6882, 702.73987, 738.9065, 454.3455, 215.47319]
2025-09-11 23:11:45,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [551.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 781.0, 335.0]
2025-09-11 23:11:45,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (565.63) for latency MM1Queue_a033_s075
2025-09-11 23:11:45,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 9/100 (estimated time remaining: 24 hours, 50 minutes, 10 seconds)
2025-09-11 23:24:40,211 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:24:40,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 23:29:17,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 699.75891 ± 172.127
2025-09-11 23:29:17,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [598.27045, 655.2199, 337.9428, 824.6999, 973.2009, 631.52277, 732.04364, 744.08453, 592.3004, 908.30426]
2025-09-11 23:29:17,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [722.0, 1000.0, 296.0, 875.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 23:29:17,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (699.76) for latency MM1Queue_a033_s075
2025-09-11 23:29:17,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 10/100 (estimated time remaining: 24 hours, 57 minutes, 40 seconds)
2025-09-11 23:41:07,767 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:41:07,774 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 23:43:25,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 399.34546 ± 313.529
2025-09-11 23:43:25,267 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [75.645096, 236.59114, 937.7655, 301.76105, 483.57523, 598.66394, 78.63513, 347.52658, 24.552937, 908.7381]
2025-09-11 23:43:25,267 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [60.0, 182.0, 1000.0, 296.0, 520.0, 1000.0, 68.0, 355.0, 23.0, 1000.0]
2025-09-11 23:43:25,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 11/100 (estimated time remaining: 24 hours, 15 minutes, 44 seconds)
2025-09-11 23:56:59,039 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:56:59,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:00:14,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 597.68225 ± 271.966
2025-09-12 00:00:14,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [702.27234, 731.4202, 638.2245, 260.81882, 258.6064, 280.43643, 646.84436, 1211.1481, 622.63556, 624.4155]
2025-09-12 00:00:14,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 228.0, 246.0, 224.0, 556.0, 1000.0, 576.0, 504.0]
2025-09-12 00:00:14,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 12/100 (estimated time remaining: 24 hours, 9 minutes, 56 seconds)
2025-09-12 00:12:13,064 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:12:13,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:16:22,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 946.01923 ± 315.626
2025-09-12 00:16:22,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [874.63184, 1469.3538, 915.46967, 776.6996, 979.5045, 423.16284, 883.03723, 640.44727, 1506.0938, 991.79205]
2025-09-12 00:16:22,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [667.0, 980.0, 1000.0, 1000.0, 646.0, 321.0, 609.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:16:22,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (946.02) for latency MM1Queue_a033_s075
2025-09-12 00:16:22,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 13/100 (estimated time remaining: 24 hours, 6 minutes, 58 seconds)
2025-09-12 00:29:32,075 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:29:32,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:33:12,004 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 604.33630 ± 282.167
2025-09-12 00:33:12,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [171.28642, 617.94006, 530.532, 298.90045, 766.1261, 792.881, 593.72266, 505.8047, 507.0678, 1259.1017]
2025-09-12 00:33:12,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [103.0, 1000.0, 391.0, 201.0, 1000.0, 507.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:33:12,015 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 14/100 (estimated time remaining: 23 hours, 37 minutes, 7 seconds)
2025-09-12 00:45:53,719 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:45:53,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:48:44,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 619.69421 ± 321.507
2025-09-12 00:48:44,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [222.01675, 973.3639, 921.9441, 156.81036, 794.3178, 863.9879, 266.9671, 1005.6123, 655.1445, 336.77786]
2025-09-12 00:48:44,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [162.0, 1000.0, 1000.0, 107.0, 1000.0, 565.0, 164.0, 1000.0, 349.0, 160.0]
2025-09-12 00:48:44,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 15/100 (estimated time remaining: 22 hours, 46 minutes, 34 seconds)
2025-09-12 01:01:11,601 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:01:11,610 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:04:42,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 974.37500 ± 565.535
2025-09-12 01:04:42,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [857.9751, 961.24774, 239.38321, 1593.2262, 150.96602, 1351.2303, 220.83641, 1146.7449, 1482.4807, 1739.6595]
2025-09-12 01:04:42,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 146.0, 1000.0, 100.0, 793.0, 199.0, 674.0, 1000.0, 1000.0]
2025-09-12 01:04:42,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (974.38) for latency MM1Queue_a033_s075
2025-09-12 01:04:42,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 16/100 (estimated time remaining: 23 hours, 1 minute, 47 seconds)
2025-09-12 01:18:07,856 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:18:07,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:21:00,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 720.35339 ± 384.051
2025-09-12 01:21:00,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [579.5042, 323.92856, 664.65546, 419.14087, 99.58103, 1217.4158, 843.49927, 796.6544, 801.10944, 1458.0453]
2025-09-12 01:21:00,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 174.0, 412.0, 207.0, 87.0, 1000.0, 457.0, 509.0, 1000.0, 816.0]
2025-09-12 01:21:00,185 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 17/100 (estimated time remaining: 22 hours, 36 minutes, 53 seconds)
2025-09-12 01:33:22,138 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:33:22,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:38:06,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1526.96265 ± 445.509
2025-09-12 01:38:06,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [1554.7736, 1830.0172, 1786.4271, 1732.6329, 1619.2964, 1707.285, 1806.1361, 1255.6919, 1699.4905, 277.87653]
2025-09-12 01:38:06,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 927.0, 1000.0, 1000.0, 1000.0, 1000.0, 180.0]
2025-09-12 01:38:06,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (1526.96) for latency MM1Queue_a033_s075
2025-09-12 01:38:06,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 18/100 (estimated time remaining: 22 hours, 36 minutes, 45 seconds)
2025-09-12 01:50:40,248 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:50:40,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:54:34,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1156.97571 ± 660.070
2025-09-12 01:54:34,794 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [2091.3745, 1924.1604, 1839.8403, 1167.3987, 27.807566, 1234.6393, 626.4618, 1541.0662, 523.3413, 593.66705]
2025-09-12 01:54:34,794 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 651.0, 26.0, 608.0, 348.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:54:34,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 19/100 (estimated time remaining: 22 hours, 14 minutes, 37 seconds)
2025-09-12 02:07:42,681 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:07:42,692 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:10:43,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1149.08484 ± 712.403
2025-09-12 02:10:43,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [926.0084, 1893.1827, 242.37329, 2001.518, 1188.5852, 2132.1333, 1539.5099, 305.06558, 104.98223, 1157.4904]
2025-09-12 02:10:43,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [504.0, 1000.0, 136.0, 1000.0, 697.0, 1000.0, 794.0, 182.0, 62.0, 605.0]
2025-09-12 02:10:43,624 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 20/100 (estimated time remaining: 22 hours, 8 minutes, 14 seconds)
2025-09-12 02:23:41,046 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:23:41,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:27:41,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1433.67578 ± 565.629
2025-09-12 02:27:41,824 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [884.3244, 1876.6301, 1091.9562, 517.8908, 900.1258, 1949.8464, 1060.4863, 1938.254, 2158.218, 1959.0261]
2025-09-12 02:27:41,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [458.0, 1000.0, 1000.0, 331.0, 471.0, 1000.0, 581.0, 1000.0, 1000.0, 1000.0]
2025-09-12 02:27:41,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 21/100 (estimated time remaining: 22 hours, 7 minutes, 54 seconds)
2025-09-12 02:39:16,079 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:39:16,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:42:03,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 780.79187 ± 462.680
2025-09-12 02:42:03,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [506.90018, 990.4186, 69.242165, 195.04521, 399.42792, 565.35455, 1388.3793, 1196.9304, 1274.6587, 1221.5618]
2025-09-12 02:42:03,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [233.0, 1000.0, 51.0, 148.0, 182.0, 1000.0, 732.0, 594.0, 1000.0, 565.0]
2025-09-12 02:42:03,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 22/100 (estimated time remaining: 21 hours, 20 minutes, 43 seconds)
2025-09-12 02:55:17,684 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:55:17,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:57:44,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 981.83868 ± 638.404
2025-09-12 02:57:44,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [160.43423, 1115.1495, 1604.6412, 838.7739, 636.1533, 1387.0419, 927.67847, 2323.6094, 66.47569, 758.42944]
2025-09-12 02:57:44,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [95.0, 1000.0, 691.0, 366.0, 299.0, 636.0, 396.0, 1000.0, 40.0, 361.0]
2025-09-12 02:57:44,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 23/100 (estimated time remaining: 20 hours, 42 minutes, 30 seconds)
2025-09-12 03:09:55,386 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:09:55,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:14:11,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1413.05530 ± 695.207
2025-09-12 03:14:11,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [2356.4446, 2253.899, 2099.2346, 551.09424, 847.42505, 1186.112, 1597.276, 400.03577, 876.08356, 1962.9475]
2025-09-12 03:14:11,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 222.0, 1000.0, 558.0, 681.0, 1000.0, 1000.0, 876.0]
2025-09-12 03:14:11,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 24/100 (estimated time remaining: 20 hours, 26 minutes, 5 seconds)
2025-09-12 03:27:18,418 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:27:18,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:28:55,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 421.50946 ± 363.811
2025-09-12 03:28:55,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [1255.5652, 83.773056, 215.90703, 135.17558, 613.8864, 642.25525, 100.06099, 121.32867, 735.2297, 311.9128]
2025-09-12 03:28:55,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [508.0, 34.0, 90.0, 64.0, 256.0, 1000.0, 52.0, 81.0, 1000.0, 165.0]
2025-09-12 03:28:55,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 25/100 (estimated time remaining: 19 hours, 48 minutes, 41 seconds)
2025-09-12 03:41:28,171 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:41:28,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:45:11,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1314.80457 ± 709.351
2025-09-12 03:45:11,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [984.4555, 491.00168, 2342.2979, 1018.8228, 2243.448, 983.0208, 1104.0426, 801.30255, 676.368, 2503.286]
2025-09-12 03:45:11,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [423.0, 247.0, 1000.0, 463.0, 1000.0, 1000.0, 1000.0, 370.0, 1000.0, 1000.0]
2025-09-12 03:45:11,928 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 26/100 (estimated time remaining: 19 hours, 22 minutes, 31 seconds)
2025-09-12 03:57:15,534 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:57:15,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:59:51,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1022.37012 ± 898.477
2025-09-12 03:59:51,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [56.60065, 247.50899, 1679.7592, 422.72018, 17.669125, 205.50194, 1740.1917, 2515.4634, 2164.0322, 1174.2537]
2025-09-12 03:59:51,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [35.0, 131.0, 1000.0, 163.0, 19.0, 104.0, 682.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:59:51,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 27/100 (estimated time remaining: 19 hours, 11 minutes, 29 seconds)
2025-09-12 04:12:10,348 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:12:10,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:16:05,276 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1534.75293 ± 782.114
2025-09-12 04:16:05,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [2295.4165, 1249.5016, 1282.0134, 1220.6237, 2373.9321, 382.79376, 164.06053, 1654.9257, 2340.7908, 2383.471]
2025-09-12 04:16:05,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 570.0, 485.0, 1000.0, 1000.0, 86.0, 687.0, 1000.0, 958.0]
2025-09-12 04:16:05,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (1534.75) for latency MM1Queue_a033_s075
2025-09-12 04:16:05,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 28/100 (estimated time remaining: 19 hours, 3 minutes, 44 seconds)
2025-09-12 04:28:47,914 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:28:47,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:30:41,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 971.35107 ± 774.455
2025-09-12 04:30:41,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [708.71576, 111.00976, 1516.9021, 747.6352, 1094.6395, 1605.8491, 220.05753, 51.107365, 2706.711, 950.88403]
2025-09-12 04:30:41,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [301.0, 52.0, 559.0, 308.0, 439.0, 608.0, 123.0, 33.0, 1000.0, 339.0]
2025-09-12 04:30:41,029 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 29/100 (estimated time remaining: 18 hours, 21 minutes, 25 seconds)
2025-09-12 04:44:16,220 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:44:16,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:47:48,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1176.06702 ± 546.425
2025-09-12 04:47:48,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [720.83105, 691.3707, 1736.0504, 1845.8422, 1529.082, 391.68192, 855.0311, 1587.6278, 1821.1843, 581.96826]
2025-09-12 04:47:48,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [286.0, 327.0, 1000.0, 1000.0, 587.0, 156.0, 1000.0, 1000.0, 633.0, 1000.0]
2025-09-12 04:47:48,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 30/100 (estimated time remaining: 18 hours, 39 minutes, 58 seconds)
2025-09-12 04:59:59,459 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:59:59,466 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:03:55,369 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1471.00500 ± 807.187
2025-09-12 05:03:55,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [2434.4707, 2374.9229, 1533.0145, 1796.4669, 616.43396, 1046.3867, 140.75389, 641.5897, 1519.2301, 2606.7803]
2025-09-12 05:03:55,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [935.0, 849.0, 1000.0, 661.0, 1000.0, 1000.0, 49.0, 234.0, 1000.0, 1000.0]
2025-09-12 05:03:55,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 31/100 (estimated time remaining: 18 hours, 22 minutes, 8 seconds)
2025-09-12 05:15:49,816 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:15:49,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:18:53,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1517.93896 ± 1042.982
2025-09-12 05:18:53,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [2035.1761, 97.392426, 2067.8467, 107.44671, 1015.7727, 2415.6887, 43.216263, 2719.49, 2710.732, 1966.6279]
2025-09-12 05:18:53,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [815.0, 70.0, 825.0, 56.0, 430.0, 1000.0, 33.0, 1000.0, 1000.0, 793.0]
2025-09-12 05:18:53,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 32/100 (estimated time remaining: 18 hours, 10 minutes, 38 seconds)
2025-09-12 05:32:03,143 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:32:03,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:34:00,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 821.35120 ± 714.817
2025-09-12 05:34:00,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [545.73694, 111.134544, 1412.1394, 944.2879, 379.15002, 648.0895, 959.8414, 2630.6868, 496.9013, 85.54348]
2025-09-12 05:34:00,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [192.0, 52.0, 504.0, 362.0, 149.0, 229.0, 334.0, 1000.0, 1000.0, 43.0]
2025-09-12 05:34:00,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 33/100 (estimated time remaining: 17 hours, 39 minutes, 37 seconds)
2025-09-12 05:45:48,433 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:45:48,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:48:24,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 901.32776 ± 805.982
2025-09-12 05:48:24,517 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [328.7311, 757.6451, 418.92035, 46.458622, 2752.3523, 301.19748, 668.43445, 625.1917, 2008.1962, 1106.1499]
2025-09-12 05:48:24,517 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [122.0, 1000.0, 1000.0, 25.0, 942.0, 98.0, 285.0, 207.0, 1000.0, 426.0]
2025-09-12 05:48:24,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 34/100 (estimated time remaining: 17 hours, 21 minutes, 30 seconds)
2025-09-12 06:02:03,410 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:02:03,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:05:44,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1593.91821 ± 918.585
2025-09-12 06:05:44,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [989.6123, 3042.2004, 380.96664, 2973.083, 684.787, 1706.5112, 1737.7102, 416.20547, 1907.5267, 2100.579]
2025-09-12 06:05:44,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [328.0, 1000.0, 158.0, 1000.0, 1000.0, 1000.0, 595.0, 162.0, 1000.0, 1000.0]
2025-09-12 06:05:44,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (1593.92) for latency MM1Queue_a033_s075
2025-09-12 06:05:44,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 35/100 (estimated time remaining: 17 hours, 8 minutes, 45 seconds)
2025-09-12 06:18:25,026 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:18:25,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:21:20,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1623.44849 ± 1063.098
2025-09-12 06:21:20,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [849.11554, 331.9645, 1429.2916, 701.7976, 2648.294, 448.87784, 979.0266, 2836.2903, 3088.6846, 2921.1428]
2025-09-12 06:21:20,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [313.0, 165.0, 526.0, 243.0, 1000.0, 186.0, 373.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:21:20,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (1623.45) for latency MM1Queue_a033_s075
2025-09-12 06:21:20,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 36/100 (estimated time remaining: 16 hours, 46 minutes, 29 seconds)
2025-09-12 06:33:01,795 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:33:01,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:35:47,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 685.79388 ± 771.654
2025-09-12 06:35:47,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [284.99762, 79.36431, 2447.2136, 382.7961, 1886.72, 752.1399, 121.29389, 180.61493, 380.70587, 342.0926]
2025-09-12 06:35:47,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [113.0, 48.0, 1000.0, 1000.0, 1000.0, 1000.0, 49.0, 87.0, 153.0, 1000.0]
2025-09-12 06:35:47,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 37/100 (estimated time remaining: 16 hours, 24 minutes, 10 seconds)
2025-09-12 06:48:16,774 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:48:16,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:51:22,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1332.30164 ± 1053.981
2025-09-12 06:51:22,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [659.7711, 347.23114, 2837.1562, 1535.7434, 323.51343, 187.99355, 1087.2668, 2886.8975, 2792.189, 665.2545]
2025-09-12 06:51:22,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [241.0, 115.0, 1000.0, 1000.0, 1000.0, 78.0, 404.0, 1000.0, 1000.0, 227.0]
2025-09-12 06:51:22,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 38/100 (estimated time remaining: 16 hours, 14 minutes, 52 seconds)
2025-09-12 07:04:16,686 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:04:16,709 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:06:07,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1138.20081 ± 922.319
2025-09-12 07:06:07,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [172.48796, 2883.6226, 698.9981, 1853.4944, 2123.5046, 245.30972, 1657.2056, 374.5511, 56.234524, 1316.5992]
2025-09-12 07:06:07,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [70.0, 1000.0, 269.0, 657.0, 637.0, 85.0, 527.0, 133.0, 30.0, 403.0]
2025-09-12 07:06:07,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 39/100 (estimated time remaining: 16 hours, 3 minutes, 45 seconds)
2025-09-12 07:18:23,332 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:18:23,336 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:21:09,467 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1696.25000 ± 987.377
2025-09-12 07:21:09,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [362.01114, 602.263, 2866.0269, 1009.37317, 1879.1649, 2170.2642, 280.0007, 2938.3027, 2594.52, 2260.5752]
2025-09-12 07:21:09,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [127.0, 211.0, 1000.0, 335.0, 641.0, 671.0, 128.0, 1000.0, 812.0, 795.0]
2025-09-12 07:21:09,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (1696.25) for latency MM1Queue_a033_s075
2025-09-12 07:21:09,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 40/100 (estimated time remaining: 15 hours, 20 minutes, 6 seconds)
2025-09-12 07:33:32,165 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:33:32,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:35:11,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 786.32281 ± 914.803
2025-09-12 07:35:11,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [192.62265, 1617.5415, 947.6523, 277.07126, 579.07684, 273.60995, 261.05066, 3208.9111, 352.88608, 152.80588]
2025-09-12 07:35:11,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [83.0, 495.0, 318.0, 92.0, 204.0, 117.0, 82.0, 1000.0, 1000.0, 60.0]
2025-09-12 07:35:11,095 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 41/100 (estimated time remaining: 14 hours, 46 minutes, 4 seconds)
2025-09-12 07:47:28,446 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:47:28,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:50:19,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1674.68616 ± 1088.248
2025-09-12 07:50:19,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [792.4466, 403.58856, 3006.7678, 154.02768, 1176.563, 745.01575, 2832.7844, 2457.4683, 3147.3528, 2030.8462]
2025-09-12 07:50:19,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [306.0, 117.0, 1000.0, 82.0, 392.0, 268.0, 1000.0, 1000.0, 1000.0, 676.0]
2025-09-12 07:50:19,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 42/100 (estimated time remaining: 14 hours, 39 minutes, 36 seconds)
2025-09-12 08:01:39,568 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:01:39,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:03:03,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 488.20734 ± 771.578
2025-09-12 08:03:03,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [439.06628, 130.99554, 550.5136, 335.5002, 2754.193, 152.60715, 112.012054, 96.859726, 28.110579, 282.21466]
2025-09-12 08:03:03,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [185.0, 52.0, 232.0, 166.0, 1000.0, 59.0, 44.0, 48.0, 40.0, 1000.0]
2025-09-12 08:03:03,113 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 43/100 (estimated time remaining: 13 hours, 51 minutes, 27 seconds)
2025-09-12 08:15:30,412 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:15:30,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:17:28,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1208.62671 ± 1093.543
2025-09-12 08:17:28,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [2958.4985, 28.960258, 113.80278, 2426.45, 2620.5513, 81.93703, 169.72337, 691.14594, 1346.131, 1649.067]
2025-09-12 08:17:28,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [887.0, 23.0, 43.0, 879.0, 860.0, 43.0, 63.0, 245.0, 448.0, 485.0]
2025-09-12 08:17:28,865 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 44/100 (estimated time remaining: 13 hours, 33 minutes, 23 seconds)
2025-09-12 08:29:43,478 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:29:43,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:31:58,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1052.23425 ± 768.994
2025-09-12 08:31:58,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [391.56503, 488.7256, 108.298164, 1615.0812, 2112.533, 674.1279, 2301.6836, 158.81029, 1622.4438, 1049.0739]
2025-09-12 08:31:58,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [143.0, 187.0, 79.0, 542.0, 1000.0, 261.0, 697.0, 77.0, 590.0, 1000.0]
2025-09-12 08:31:58,204 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 45/100 (estimated time remaining: 13 hours, 13 minutes, 5 seconds)
2025-09-12 08:44:53,924 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:44:53,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:48:21,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1131.60852 ± 1012.042
2025-09-12 08:48:21,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [2353.0486, 2223.4004, 38.476913, 988.1406, 511.74722, 553.844, 166.43834, 3124.8267, 285.04608, 1071.1165]
2025-09-12 08:48:21,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [779.0, 786.0, 35.0, 313.0, 1000.0, 1000.0, 78.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:48:21,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 46/100 (estimated time remaining: 13 hours, 24 minutes, 54 seconds)
2025-09-12 09:00:49,070 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:00:49,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:01:29,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 370.58978 ± 331.578
2025-09-12 09:01:29,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [206.86472, 887.47955, 201.65215, 131.0395, 242.49406, 391.42557, 70.52121, 75.92348, 1094.9794, 403.5185]
2025-09-12 09:01:29,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [83.0, 326.0, 67.0, 56.0, 108.0, 174.0, 36.0, 43.0, 351.0, 163.0]
2025-09-12 09:01:29,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 47/100 (estimated time remaining: 12 hours, 48 minutes, 30 seconds)
2025-09-12 09:13:04,465 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:13:04,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:16:24,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1704.66174 ± 1143.001
2025-09-12 09:16:24,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [1200.8337, 482.09723, 2728.5408, 3282.4841, 807.3319, 836.5649, 3221.7114, 739.5395, 663.5331, 3083.9805]
2025-09-12 09:16:24,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [393.0, 163.0, 827.0, 1000.0, 275.0, 237.0, 1000.0, 1000.0, 1000.0, 865.0]
2025-09-12 09:16:24,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (1704.66) for latency MM1Queue_a033_s075
2025-09-12 09:16:24,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 48/100 (estimated time remaining: 12 hours, 57 minutes, 32 seconds)
2025-09-12 09:28:40,740 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:28:40,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:31:05,690 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1293.14258 ± 1167.531
2025-09-12 09:31:05,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [3382.1504, 361.4602, 144.63199, 267.20636, 1211.9276, 3392.9392, 810.5527, 777.8994, 2037.5054, 545.152]
2025-09-12 09:31:05,692 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 53.0, 107.0, 365.0, 1000.0, 295.0, 236.0, 645.0, 206.0]
2025-09-12 09:31:05,704 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 49/100 (estimated time remaining: 12 hours, 45 minutes, 35 seconds)
2025-09-12 09:43:05,668 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:43:05,670 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:45:38,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1165.97253 ± 712.615
2025-09-12 09:45:38,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [65.7227, 1749.1846, 2018.4576, 114.09713, 1914.9307, 1250.5356, 731.4274, 1166.2798, 658.53766, 1990.5518]
2025-09-12 09:45:38,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [36.0, 1000.0, 593.0, 66.0, 1000.0, 431.0, 227.0, 1000.0, 249.0, 604.0]
2025-09-12 09:45:38,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 50/100 (estimated time remaining: 12 hours, 31 minutes, 23 seconds)
2025-09-12 09:58:34,719 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:58:34,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:01:13,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1065.60291 ± 644.906
2025-09-12 10:01:13,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [638.17303, 970.5435, 691.05865, 726.439, 1829.2205, 1994.4901, 328.20966, 356.95325, 956.18616, 2164.755]
2025-09-12 10:01:13,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [204.0, 312.0, 1000.0, 219.0, 1000.0, 604.0, 1000.0, 139.0, 345.0, 666.0]
2025-09-12 10:01:13,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 51/100 (estimated time remaining: 12 hours, 8 minutes, 42 seconds)
2025-09-12 10:13:04,957 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:13:04,960 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:15:22,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1470.76270 ± 1130.147
2025-09-12 10:15:22,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [1167.791, 171.0004, 488.40533, 3021.4512, 886.87836, 3236.7908, 94.20339, 2298.9248, 2537.3096, 804.87164]
2025-09-12 10:15:22,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [397.0, 73.0, 141.0, 1000.0, 277.0, 1000.0, 40.0, 698.0, 748.0, 254.0]
2025-09-12 10:15:22,810 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 52/100 (estimated time remaining: 12 hours, 4 minutes, 9 seconds)
2025-09-12 10:28:02,486 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:28:02,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:30:54,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1926.67712 ± 1067.168
2025-09-12 10:30:54,245 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [1495.6729, 1174.6523, 3224.739, 1056.6644, 1005.2178, 3357.6162, 403.01587, 2426.6323, 1581.0593, 3541.5017]
2025-09-12 10:30:54,245 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [476.0, 349.0, 1000.0, 303.0, 351.0, 1000.0, 134.0, 771.0, 520.0, 1000.0]
2025-09-12 10:30:54,245 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (1926.68) for latency MM1Queue_a033_s075
2025-09-12 10:30:54,256 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 53/100 (estimated time remaining: 11 hours, 55 minutes, 11 seconds)
2025-09-12 10:42:24,022 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:42:24,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:44:12,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 903.55627 ± 849.651
2025-09-12 10:44:12,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [3253.782, 852.44366, 156.59575, 871.96985, 1035.212, 177.37918, 1082.1052, 732.5734, 221.86598, 651.63525]
2025-09-12 10:44:12,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 243.0, 1000.0, 264.0, 291.0, 82.0, 275.0, 263.0, 83.0, 208.0]
2025-09-12 10:44:12,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 54/100 (estimated time remaining: 11 hours, 27 minutes, 15 seconds)
2025-09-12 10:56:44,285 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:56:44,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:59:48,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1265.66504 ± 1058.420
2025-09-12 10:59:48,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [1092.0765, 3277.0676, 109.141815, 932.0282, 181.72482, 1035.2704, 2926.2568, 70.55995, 1253.2573, 1779.2677]
2025-09-12 10:59:48,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [379.0, 1000.0, 1000.0, 284.0, 94.0, 1000.0, 1000.0, 40.0, 439.0, 1000.0]
2025-09-12 10:59:48,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 55/100 (estimated time remaining: 11 hours, 22 minutes, 27 seconds)
2025-09-12 11:12:41,599 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:12:41,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:15:55,502 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2049.74463 ± 1101.705
2025-09-12 11:15:55,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [3176.5208, 3336.92, 561.7624, 2890.521, 3120.687, 544.1327, 1314.5883, 1032.7434, 1480.8964, 3038.6763]
2025-09-12 11:15:55,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 177.0, 1000.0, 1000.0, 187.0, 435.0, 369.0, 500.0, 1000.0]
2025-09-12 11:15:55,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (2049.74) for latency MM1Queue_a033_s075
2025-09-12 11:15:55,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 56/100 (estimated time remaining: 11 hours, 12 minutes, 16 seconds)
2025-09-12 11:28:04,542 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:28:04,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:29:38,339 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 999.72528 ± 955.925
2025-09-12 11:29:38,341 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [398.48697, 194.21568, 2669.133, 2430.3752, 175.30537, 857.3643, 2114.3022, 718.7532, 52.709713, 386.6071]
2025-09-12 11:29:38,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [129.0, 85.0, 852.0, 771.0, 65.0, 316.0, 565.0, 226.0, 27.0, 120.0]
2025-09-12 11:29:38,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 57/100 (estimated time remaining: 10 hours, 53 minutes, 28 seconds)
2025-09-12 11:42:07,999 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:42:08,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:45:51,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1341.18384 ± 940.697
2025-09-12 11:45:51,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [986.77515, 634.30963, 323.9279, 164.55418, 2730.9807, 720.5929, 766.13525, 2309.0017, 2265.6904, 2509.8708]
2025-09-12 11:45:51,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 928.0, 246.0, 243.0, 694.0, 647.0, 729.0]
2025-09-12 11:45:51,455 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 58/100 (estimated time remaining: 10 hours, 44 minutes, 35 seconds)
2025-09-12 11:58:24,682 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:58:24,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:00:48,098 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 978.53204 ± 1102.605
2025-09-12 12:00:48,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [143.36687, 467.55728, 3410.9302, 1252.4491, 2676.8357, 947.1093, 123.282425, 266.24835, 343.79944, 153.74077]
2025-09-12 12:00:48,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [53.0, 146.0, 1000.0, 376.0, 1000.0, 1000.0, 58.0, 103.0, 1000.0, 56.0]
2025-09-12 12:00:48,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 59/100 (estimated time remaining: 10 hours, 43 minutes, 23 seconds)
2025-09-12 12:12:17,526 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:12:17,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:15:13,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1334.15210 ± 824.947
2025-09-12 12:15:13,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [559.7838, 1764.913, 2650.76, 707.58875, 1663.2954, 2826.3494, 1199.5095, 453.57056, 532.70526, 983.0468]
2025-09-12 12:15:13,753 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [194.0, 598.0, 837.0, 227.0, 1000.0, 818.0, 1000.0, 159.0, 202.0, 1000.0]
2025-09-12 12:15:13,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 60/100 (estimated time remaining: 10 hours, 18 minutes, 23 seconds)
2025-09-12 12:27:36,837 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:27:36,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:29:46,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1134.24243 ± 1136.464
2025-09-12 12:29:46,024 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [3188.309, 428.08646, 1546.0795, 62.558285, 106.204544, 303.30716, 1076.9397, 706.45, 625.31494, 3299.1755]
2025-09-12 12:29:46,024 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 124.0, 486.0, 36.0, 62.0, 105.0, 381.0, 1000.0, 242.0, 1000.0]
2025-09-12 12:29:46,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 61/100 (estimated time remaining: 9 hours, 50 minutes, 44 seconds)
2025-09-12 12:42:32,395 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:42:32,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:43:57,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 969.13574 ± 1188.032
2025-09-12 12:43:57,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [327.8265, 57.965157, 2496.689, 869.51636, 94.99197, 206.27554, 3601.7605, 1831.543, 158.68744, 46.10156]
2025-09-12 12:43:57,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [138.0, 51.0, 660.0, 293.0, 53.0, 81.0, 1000.0, 577.0, 57.0, 24.0]
2025-09-12 12:43:57,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 62/100 (estimated time remaining: 9 hours, 39 minutes, 42 seconds)
2025-09-12 12:56:38,795 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:56:38,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:59:22,341 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1145.00903 ± 1154.312
2025-09-12 12:59:22,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [964.0912, 2705.401, 408.39188, 214.32782, 578.11444, 677.2626, 193.69095, 2082.3105, 3578.363, 48.136673]
2025-09-12 12:59:22,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 869.0, 127.0, 1000.0, 175.0, 286.0, 66.0, 1000.0, 1000.0, 30.0]
2025-09-12 12:59:22,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 63/100 (estimated time remaining: 9 hours, 18 minutes, 42 seconds)
2025-09-12 13:11:03,913 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:11:03,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:14:47,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2200.80420 ± 932.211
2025-09-12 13:14:47,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [2970.7642, 2900.301, 1191.7708, 3306.0999, 686.294, 3362.6396, 1811.9283, 1266.429, 1666.7303, 2845.0857]
2025-09-12 13:14:47,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [919.0, 1000.0, 1000.0, 1000.0, 234.0, 1000.0, 529.0, 371.0, 496.0, 1000.0]
2025-09-12 13:14:47,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (2200.80) for latency MM1Queue_a033_s075
2025-09-12 13:14:47,023 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 64/100 (estimated time remaining: 9 hours, 7 minutes, 27 seconds)
2025-09-12 13:27:19,495 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:27:19,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:29:40,354 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1397.86353 ± 959.355
2025-09-12 13:29:40,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [2413.4463, 530.5449, 1799.3895, 1816.9147, 3479.1672, 544.32513, 1153.6058, 69.51983, 1041.5236, 1130.1974]
2025-09-12 13:29:40,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [692.0, 174.0, 541.0, 492.0, 1000.0, 181.0, 1000.0, 33.0, 325.0, 354.0]
2025-09-12 13:29:40,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 65/100 (estimated time remaining: 8 hours, 55 minutes, 59 seconds)
2025-09-12 13:42:06,535 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:42:06,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:44:39,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1433.70776 ± 1175.188
2025-09-12 13:44:39,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [1497.0278, 2516.217, 517.7903, 137.65166, 631.74274, 905.20856, 3323.8196, 3398.4045, 113.85766, 1295.3574]
2025-09-12 13:44:39,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [406.0, 700.0, 161.0, 54.0, 211.0, 368.0, 930.0, 1000.0, 1000.0, 405.0]
2025-09-12 13:44:39,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 66/100 (estimated time remaining: 8 hours, 44 minutes, 13 seconds)
2025-09-12 13:56:53,349 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:56:53,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:00:47,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2578.10986 ± 741.144
2025-09-12 14:00:47,479 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [3222.4368, 3256.0164, 3341.306, 3318.1362, 2659.6355, 1647.7917, 1671.2539, 3186.6836, 1824.7651, 1653.0695]
2025-09-12 14:00:47,479 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 785.0, 479.0, 511.0, 1000.0, 550.0, 473.0]
2025-09-12 14:00:47,479 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (2578.11) for latency MM1Queue_a033_s075
2025-09-12 14:00:47,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 67/100 (estimated time remaining: 8 hours, 42 minutes, 27 seconds)
2025-09-12 14:12:46,829 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:12:46,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:14:52,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1245.25696 ± 1068.865
2025-09-12 14:14:52,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [1503.3485, 784.56976, 2850.746, 1669.1838, 66.7398, 781.98956, 3417.1514, 80.15219, 638.758, 659.9304]
2025-09-12 14:14:52,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 225.0, 846.0, 493.0, 36.0, 226.0, 1000.0, 42.0, 210.0, 218.0]
2025-09-12 14:14:52,759 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 68/100 (estimated time remaining: 8 hours, 18 minutes, 20 seconds)
2025-09-12 14:27:14,242 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:27:14,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:31:24,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2140.79175 ± 1281.660
2025-09-12 14:31:24,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [1283.8247, 633.476, 364.5818, 3343.4731, 2762.394, 3315.766, 182.23798, 3169.6199, 3095.8025, 3256.7412]
2025-09-12 14:31:24,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [361.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 101.0, 1000.0, 1000.0, 871.0]
2025-09-12 14:31:24,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 69/100 (estimated time remaining: 8 hours, 10 minutes, 21 seconds)
2025-09-12 14:43:31,234 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:43:31,239 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:46:42,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1879.76135 ± 1252.188
2025-09-12 14:46:42,239 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [3245.63, 220.81392, 1299.46, 1958.6814, 839.1631, 3307.8125, 3564.7158, 418.64984, 829.80896, 3112.8782]
2025-09-12 14:46:42,239 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 106.0, 390.0, 579.0, 249.0, 1000.0, 1000.0, 1000.0, 251.0, 880.0]
2025-09-12 14:46:42,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 70/100 (estimated time remaining: 7 hours, 57 minutes, 35 seconds)
2025-09-12 14:58:52,699 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:58:52,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:00:36,382 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1175.93140 ± 537.839
2025-09-12 15:00:36,384 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [1202.8423, 879.4534, 1472.1848, 570.11865, 1753.1068, 1772.7954, 16.214067, 1350.635, 1693.7621, 1048.2013]
2025-09-12 15:00:36,384 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [376.0, 241.0, 404.0, 176.0, 555.0, 503.0, 17.0, 416.0, 513.0, 302.0]
2025-09-12 15:00:36,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 71/100 (estimated time remaining: 7 hours, 35 minutes, 41 seconds)
2025-09-12 15:13:10,035 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:13:10,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:16:23,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2170.86865 ± 1170.582
2025-09-12 15:16:23,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [232.70218, 1668.6089, 3461.091, 1536.5157, 3165.3928, 3104.3425, 3390.644, 775.8811, 1106.9413, 3266.5654]
2025-09-12 15:16:23,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [86.0, 491.0, 1000.0, 437.0, 1000.0, 1000.0, 1000.0, 257.0, 339.0, 1000.0]
2025-09-12 15:16:23,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 72/100 (estimated time remaining: 7 hours, 18 minutes, 27 seconds)
2025-09-12 15:28:50,834 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:28:50,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:31:47,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1661.39819 ± 1058.112
2025-09-12 15:31:47,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [1676.6564, 2889.907, 1919.5378, 1850.297, 3395.0693, 414.4127, 2754.9338, 274.70914, 973.7357, 464.72452]
2025-09-12 15:31:47,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [447.0, 831.0, 1000.0, 514.0, 957.0, 174.0, 857.0, 84.0, 1000.0, 147.0]
2025-09-12 15:31:47,462 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 73/100 (estimated time remaining: 7 hours, 10 minutes, 42 seconds)
2025-09-12 15:44:42,378 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:44:42,382 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:48:13,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2048.95679 ± 1147.025
2025-09-12 15:48:13,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [141.6647, 2444.9846, 3559.5112, 728.4301, 3337.2537, 1000.4039, 1303.2953, 3426.462, 2380.3257, 2167.237]
2025-09-12 15:48:13,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [57.0, 709.0, 1000.0, 257.0, 1000.0, 1000.0, 376.0, 1000.0, 1000.0, 609.0]
2025-09-12 15:48:13,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 74/100 (estimated time remaining: 6 hours, 54 minutes, 47 seconds)
2025-09-12 16:00:38,490 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:00:38,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:02:22,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1021.68280 ± 755.428
2025-09-12 16:02:22,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [157.52872, 1915.8961, 1692.0425, 707.53436, 1934.5759, 40.638435, 128.47658, 718.4858, 940.725, 1980.9253]
2025-09-12 16:02:22,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [82.0, 548.0, 565.0, 259.0, 1000.0, 27.0, 47.0, 238.0, 254.0, 591.0]
2025-09-12 16:02:22,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 75/100 (estimated time remaining: 6 hours, 33 minutes, 31 seconds)
2025-09-12 16:14:06,859 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:14:06,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:17:50,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2108.05566 ± 1119.754
2025-09-12 16:17:50,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [2388.9624, 1112.5914, 1340.4835, 129.60927, 2450.165, 3366.185, 2991.79, 778.97864, 3433.7793, 3088.011]
2025-09-12 16:17:50,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [699.0, 1000.0, 1000.0, 46.0, 741.0, 1000.0, 875.0, 220.0, 1000.0, 971.0]
2025-09-12 16:17:50,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 76/100 (estimated time remaining: 6 hours, 26 minutes, 11 seconds)
2025-09-12 16:29:48,198 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:29:48,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:33:37,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2646.74951 ± 1185.273
2025-09-12 16:33:37,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [3814.3716, 1624.1716, 3405.57, 2857.205, 3589.2922, 767.3034, 380.5239, 3267.4502, 3390.791, 3370.8167]
2025-09-12 16:33:37,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 499.0, 1000.0, 815.0, 1000.0, 250.0, 113.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:33:37,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (2646.75) for latency MM1Queue_a033_s075
2025-09-12 16:33:37,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 77/100 (estimated time remaining: 6 hours, 10 minutes, 46 seconds)
2025-09-12 16:46:41,287 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:46:41,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:49:53,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2056.68555 ± 964.495
2025-09-12 16:49:53,360 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [1868.602, 2147.4358, 1151.0808, 1886.9757, 1758.1046, 1694.34, 207.15556, 3481.244, 3493.2683, 2878.6475]
2025-09-12 16:49:53,360 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 667.0, 312.0, 606.0, 519.0, 500.0, 57.0, 1000.0, 1000.0, 815.0]
2025-09-12 16:49:53,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 78/100 (estimated time remaining: 5 hours, 59 minutes, 15 seconds)
2025-09-12 17:01:21,130 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:01:21,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:04:40,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1610.67163 ± 1085.846
2025-09-12 17:04:40,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [304.88806, 2486.6326, 2473.4678, 933.2012, 484.69116, 485.81522, 2465.444, 1875.3123, 3683.276, 913.9877]
2025-09-12 17:04:40,899 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 614.0, 697.0, 343.0, 132.0, 1000.0, 753.0, 1000.0, 1000.0, 280.0]
2025-09-12 17:04:40,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 79/100 (estimated time remaining: 5 hours, 36 minutes, 26 seconds)
2025-09-12 17:17:25,890 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:17:25,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:19:05,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 889.55255 ± 731.249
2025-09-12 17:19:05,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [2127.7998, 412.77655, 524.7542, 106.85869, 624.1198, 1928.7722, 179.6972, 1230.6741, 148.55916, 1611.5133]
2025-09-12 17:19:05,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [624.0, 140.0, 182.0, 63.0, 192.0, 497.0, 65.0, 1000.0, 49.0, 513.0]
2025-09-12 17:19:05,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 80/100 (estimated time remaining: 5 hours, 22 minutes, 9 seconds)
2025-09-12 17:31:09,062 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:31:09,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:32:49,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1177.60083 ± 1153.579
2025-09-12 17:32:49,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [1320.0286, 2150.6465, 3274.373, 125.281425, 324.1152, 399.00507, 3039.8167, 654.23096, 149.84593, 338.66385]
2025-09-12 17:32:49,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [361.0, 551.0, 957.0, 56.0, 89.0, 127.0, 900.0, 214.0, 61.0, 107.0]
2025-09-12 17:32:49,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 81/100 (estimated time remaining: 4 hours, 59 minutes, 55 seconds)
2025-09-12 17:45:16,488 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:45:16,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:48:17,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1693.76404 ± 1066.038
2025-09-12 17:48:17,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [885.4062, 483.8321, 178.18619, 3542.5903, 1408.7708, 1987.0055, 2108.8755, 3446.0144, 1476.9719, 1419.9891]
2025-09-12 17:48:17,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [243.0, 170.0, 1000.0, 1000.0, 350.0, 1000.0, 564.0, 1000.0, 415.0, 411.0]
2025-09-12 17:48:17,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 82/100 (estimated time remaining: 4 hours, 43 minutes, 43 seconds)
2025-09-12 18:00:22,459 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:00:22,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:03:25,413 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2115.74634 ± 1123.661
2025-09-12 18:03:25,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [405.20538, 1225.5508, 3682.14, 1622.3965, 1850.818, 1542.1122, 1133.8394, 2408.4536, 3426.9097, 3860.0364]
2025-09-12 18:03:25,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [130.0, 371.0, 1000.0, 424.0, 526.0, 442.0, 339.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:03:25,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 24 minutes, 43 seconds)
2025-09-12 18:15:57,125 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:15:57,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:18:03,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1147.92993 ± 1158.739
2025-09-12 18:18:03,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [986.4945, 430.20148, 140.3247, 3302.9128, 237.44064, 717.55383, 2627.18, 371.37625, 2627.6062, 38.20852]
2025-09-12 18:18:03,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [320.0, 1000.0, 51.0, 981.0, 116.0, 208.0, 761.0, 129.0, 735.0, 23.0]
2025-09-12 18:18:03,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 9 minutes, 29 seconds)
2025-09-12 18:30:37,813 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:30:37,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:33:38,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1983.72461 ± 1029.364
2025-09-12 18:33:38,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [1187.6796, 745.8756, 985.9833, 1912.0977, 3047.2214, 1298.8048, 3630.594, 2499.8296, 1104.2184, 3424.9417]
2025-09-12 18:33:38,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [331.0, 206.0, 300.0, 1000.0, 793.0, 433.0, 1000.0, 707.0, 354.0, 998.0]
2025-09-12 18:33:38,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 58 minutes, 34 seconds)
2025-09-12 18:45:34,696 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:45:34,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:48:27,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2078.94971 ± 1258.827
2025-09-12 18:48:27,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [3419.2507, 56.756153, 3174.9045, 2003.7057, 1671.2877, 1199.6077, 3899.5837, 658.08887, 3432.6545, 1273.6581]
2025-09-12 18:48:27,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 32.0, 868.0, 597.0, 422.0, 382.0, 1000.0, 229.0, 931.0, 354.0]
2025-09-12 18:48:27,913 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 46 minutes, 55 seconds)
2025-09-12 19:01:04,429 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:01:04,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:03:34,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1339.67090 ± 940.470
2025-09-12 19:03:34,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [2027.0594, 1662.4901, 942.8181, 1021.53394, 1026.3577, 410.53586, 3627.1387, 263.269, 650.07434, 1765.4318]
2025-09-12 19:03:34,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [577.0, 1000.0, 1000.0, 281.0, 339.0, 151.0, 1000.0, 97.0, 206.0, 465.0]
2025-09-12 19:03:34,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 30 minutes, 48 seconds)
2025-09-12 19:15:44,361 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:15:44,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:19:15,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2141.62036 ± 1057.172
2025-09-12 19:19:15,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [3659.6965, 2598.143, 1144.6235, 971.66, 2531.681, 2562.9624, 3374.9885, 1781.8221, 129.20424, 2661.4194]
2025-09-12 19:19:15,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 785.0, 315.0, 285.0, 756.0, 786.0, 1000.0, 515.0, 1000.0, 713.0]
2025-09-12 19:19:15,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 17 minutes, 9 seconds)
2025-09-12 19:31:40,263 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:31:40,268 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:34:37,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1755.07483 ± 1125.647
2025-09-12 19:34:37,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [2221.0925, 2566.5078, 462.6676, 3477.4792, 931.83276, 536.0567, 3643.6873, 1829.8682, 721.8375, 1159.7163]
2025-09-12 19:34:37,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 731.0, 149.0, 1000.0, 272.0, 157.0, 1000.0, 527.0, 241.0, 1000.0]
2025-09-12 19:34:37,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 3 minutes, 44 seconds)
2025-09-12 19:47:57,669 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:47:57,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:51:17,591 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2348.70142 ± 1084.229
2025-09-12 19:51:17,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [1795.2137, 3246.5676, 686.4475, 3517.8508, 2219.5618, 1438.318, 2652.0535, 775.0818, 3381.622, 3774.2952]
2025-09-12 19:51:17,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [550.0, 1000.0, 219.0, 1000.0, 674.0, 373.0, 755.0, 215.0, 1000.0, 1000.0]
2025-09-12 19:51:17,611 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 50 minutes, 50 seconds)
2025-09-12 20:03:34,489 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:03:34,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:06:25,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1770.17773 ± 1120.357
2025-09-12 20:06:25,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [2503.0454, 259.68765, 1942.9626, 2052.4722, 410.02704, 492.62158, 937.12317, 3251.84, 3464.7334, 2387.2642]
2025-09-12 20:06:25,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [698.0, 99.0, 542.0, 1000.0, 159.0, 153.0, 244.0, 1000.0, 1000.0, 733.0]
2025-09-12 20:06:25,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 35 minutes, 54 seconds)
2025-09-12 20:18:04,356 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:18:04,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:21:19,256 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2215.32690 ± 787.832
2025-09-12 20:21:19,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [3552.4277, 1348.0771, 2628.581, 1599.2808, 1741.8429, 3376.0696, 1823.8586, 1144.8188, 2710.9539, 2227.359]
2025-09-12 20:21:19,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 355.0, 1000.0, 498.0, 520.0, 952.0, 568.0, 298.0, 748.0, 628.0]
2025-09-12 20:21:19,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 19 minutes, 55 seconds)
2025-09-12 20:33:20,415 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:33:20,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:36:15,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2142.87866 ± 1380.833
2025-09-12 20:36:15,783 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [181.0095, 3240.6084, 872.1015, 3577.6582, 3726.99, 138.39215, 2464.1445, 1472.3248, 1833.9364, 3921.6218]
2025-09-12 20:36:15,783 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [65.0, 937.0, 274.0, 1000.0, 1000.0, 50.0, 673.0, 446.0, 502.0, 1000.0]
2025-09-12 20:36:15,795 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 3 minutes, 12 seconds)
2025-09-12 20:49:07,071 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:49:07,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:51:26,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1645.42810 ± 1351.934
2025-09-12 20:51:26,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [110.89736, 610.00653, 3673.2944, 1379.6449, 402.00934, 954.81433, 554.1857, 1624.1443, 3684.156, 3461.129]
2025-09-12 20:51:26,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [41.0, 207.0, 985.0, 403.0, 103.0, 300.0, 180.0, 473.0, 1000.0, 1000.0]
2025-09-12 20:51:26,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 47 minutes, 32 seconds)
2025-09-12 21:04:08,359 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:04:08,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:07:58,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2604.25781 ± 1039.372
2025-09-12 21:07:58,204 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [1836.1945, 1680.7512, 3647.218, 1039.7885, 2970.7417, 3574.2778, 3272.5933, 984.9989, 3443.7454, 3592.2686]
2025-09-12 21:07:58,204 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [597.0, 553.0, 1000.0, 348.0, 872.0, 1000.0, 1000.0, 321.0, 1000.0, 1000.0]
2025-09-12 21:07:58,219 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 32 minutes)
2025-09-12 21:19:32,777 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:19:32,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:23:00,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2414.28052 ± 1136.058
2025-09-12 21:23:00,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [1773.6295, 19.607794, 2649.3652, 3241.5715, 3725.3088, 3678.2485, 2162.7273, 3613.112, 1591.5834, 1687.65]
2025-09-12 21:23:00,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [559.0, 28.0, 765.0, 1000.0, 1000.0, 1000.0, 578.0, 1000.0, 486.0, 524.0]
2025-09-12 21:23:00,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 16 minutes, 35 seconds)
2025-09-12 21:35:28,566 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:35:28,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:38:38,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1979.05896 ± 1369.990
2025-09-12 21:38:38,095 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [2373.9036, 2307.6313, 3617.1208, 169.85423, 3471.9114, 81.56801, 1243.1014, 334.7079, 2394.9033, 3795.8887]
2025-09-12 21:38:38,095 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [694.0, 663.0, 927.0, 73.0, 1000.0, 55.0, 382.0, 1000.0, 584.0, 1000.0]
2025-09-12 21:38:38,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 1 minute, 51 seconds)
2025-09-12 21:51:38,083 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:51:38,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:56:02,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2795.86792 ± 1283.617
2025-09-12 21:56:02,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [3597.4795, 3457.2754, 3542.3303, 3532.7966, 2868.2258, 3371.5327, 394.30093, 3463.4382, 3600.0095, 131.29018]
2025-09-12 21:56:02,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 807.0, 1000.0, 1000.0, 1000.0, 1000.0, 62.0]
2025-09-12 21:56:02,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1226 [INFO]: New best (2795.87) for latency MM1Queue_a033_s075
2025-09-12 21:56:02,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 98/100 (estimated time remaining: 47 minutes, 52 seconds)
2025-09-12 22:08:23,404 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:08:23,413 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:10:44,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1691.87634 ± 1350.939
2025-09-12 22:10:44,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [2246.8984, 595.4601, 547.3929, 3561.5874, 1060.7852, 447.39496, 1224.9901, 3365.8594, 3784.855, 83.538574]
2025-09-12 22:10:44,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [694.0, 188.0, 166.0, 1000.0, 259.0, 116.0, 326.0, 1000.0, 1000.0, 44.0]
2025-09-12 22:10:44,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 99/100 (estimated time remaining: 31 minutes, 43 seconds)
2025-09-12 22:22:53,758 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:22:53,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:24:54,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1374.84778 ± 1114.903
2025-09-12 22:24:54,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [2210.4587, 1935.1344, 660.9827, 19.232798, 813.7913, 3713.8037, 1114.6727, 360.39688, 2502.5503, 417.4545]
2025-09-12 22:24:54,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [671.0, 603.0, 180.0, 26.0, 246.0, 1000.0, 319.0, 116.0, 787.0, 112.0]
2025-09-12 22:24:54,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1199 [INFO]: Iteration 100/100 (estimated time remaining: 15 minutes, 23 seconds)
2025-09-12 22:37:51,985 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:37:51,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:39:55,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1299.35156 ± 959.022
2025-09-12 22:39:55,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1222 [DEBUG]: All rewards: [1657.4666, 378.32883, 231.67917, 1562.2336, 1674.1309, 3640.6392, 190.37581, 1192.5259, 1497.223, 968.9126]
2025-09-12 22:39:55,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [464.0, 129.0, 77.0, 416.0, 479.0, 982.0, 72.0, 350.0, 1000.0, 268.0]
2025-09-12 22:39:55,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-ant):1251 [DEBUG]: Training session finished
