2025-09-11 22:44:31,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc20-ant/MM1Queue_a033_s075-mbpac-highdim-memdelay
2025-09-11 22:44:31,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc20-ant/MM1Queue_a033_s075-mbpac-highdim-memdelay
2025-09-11 22:44:31,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x14b15bde51d0>}
2025-09-11 22:44:31,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1111 [DEBUG]: using device: cuda
2025-09-11 22:44:31,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1133 [INFO]: Creating new trainer
2025-09-11 22:44:31,586 baseline-mbpac-noiseperc20-ant:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-09-11 22:44:31,586 baseline-mbpac-noiseperc20-ant:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-11 22:44:31,596 baseline-mbpac-noiseperc20-ant:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=27, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=8, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 512, batch_first=True)
)
2025-09-11 22:44:32,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1194 [DEBUG]: Starting training session...
2025-09-11 22:44:32,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 1/100
2025-09-11 22:56:25,413 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:56:25,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 22:57:25,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: -120.36214 ± 146.226
2025-09-11 22:57:25,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-74.64541, -94.44529, -82.596245, -137.23541, -183.24107, -11.7104, -64.50285, -529.46027, -11.044988, -14.7395525]
2025-09-11 22:57:25,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [113.0, 75.0, 138.0, 309.0, 273.0, 20.0, 116.0, 1000.0, 67.0, 54.0]
2025-09-11 22:57:25,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1226 [INFO]: New best (-120.36) for latency MM1Queue_a033_s075
2025-09-11 22:57:25,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 2/100 (estimated time remaining: 21 hours, 15 minutes, 1 second)
2025-09-11 23:08:12,010 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:08:12,014 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 23:08:54,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: -64.82129 ± 114.005
2025-09-11 23:08:54,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-84.78518, -2.7944758, -41.50504, 2.3227804, -398.8771, -2.6991093, -31.365618, -32.06285, -41.46342, -14.982859]
2025-09-11 23:08:54,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [112.0, 14.0, 138.0, 14.0, 1000.0, 16.0, 77.0, 63.0, 70.0, 33.0]
2025-09-11 23:08:54,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1226 [INFO]: New best (-64.82) for latency MM1Queue_a033_s075
2025-09-11 23:08:54,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 3/100 (estimated time remaining: 19 hours, 53 minutes, 24 seconds)
2025-09-11 23:20:39,672 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:20:39,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 23:21:51,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: -103.53581 ± 157.146
2025-09-11 23:21:51,792 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-10.283897, -435.41403, -7.016765, -35.29407, -42.084785, -106.840675, 7.392943, -386.94446, -12.155997, -6.7164416]
2025-09-11 23:21:51,792 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [15.0, 1000.0, 12.0, 71.0, 37.0, 328.0, 21.0, 1000.0, 22.0, 38.0]
2025-09-11 23:21:51,795 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 4/100 (estimated time remaining: 20 hours, 6 minutes, 35 seconds)
2025-09-11 23:34:25,063 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:34:25,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 23:35:40,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: -84.40595 ± 140.926
2025-09-11 23:35:40,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-26.762144, -13.447054, -59.860878, -102.62202, 9.885768, -375.76602, 90.86105, -27.7292, -322.89865, -15.720424]
2025-09-11 23:35:40,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [50.0, 26.0, 131.0, 226.0, 30.0, 1000.0, 160.0, 48.0, 1000.0, 29.0]
2025-09-11 23:35:40,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 5/100 (estimated time remaining: 20 hours, 27 minutes, 1 second)
2025-09-11 23:46:52,038 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:46:52,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 23:47:33,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: -50.35049 ± 113.864
2025-09-11 23:47:33,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-17.207932, -62.460438, 10.9630165, -8.57285, -13.018873, -384.56442, 11.014698, -48.861614, 4.066024, 5.1374364]
2025-09-11 23:47:33,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [44.0, 88.0, 44.0, 31.0, 42.0, 1000.0, 54.0, 62.0, 16.0, 94.0]
2025-09-11 23:47:33,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1226 [INFO]: New best (-50.35) for latency MM1Queue_a033_s075
2025-09-11 23:47:33,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 6/100 (estimated time remaining: 19 hours, 57 minutes, 23 seconds)
2025-09-11 23:59:45,960 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:59:45,964 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:00:36,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: -78.47504 ± 125.988
2025-09-12 00:00:36,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-3.0322618, -35.112106, -227.99234, -399.1258, -19.533075, -70.959854, -45.888958, 8.811729, 12.888566, -4.8062673]
2025-09-12 00:00:36,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [16.0, 136.0, 264.0, 1000.0, 33.0, 110.0, 119.0, 35.0, 25.0, 34.0]
2025-09-12 00:00:36,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 7/100 (estimated time remaining: 19 hours, 47 minutes, 48 seconds)
2025-09-12 00:12:12,305 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:12:12,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:13:23,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: -97.06837 ± 145.110
2025-09-12 00:13:23,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-54.21389, -6.837102, -357.0249, -55.648922, -20.45986, 5.553985, -21.326025, -40.266846, -410.63232, -9.827757]
2025-09-12 00:13:23,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [110.0, 13.0, 1000.0, 131.0, 46.0, 84.0, 53.0, 45.0, 1000.0, 36.0]
2025-09-12 00:13:23,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 8/100 (estimated time remaining: 19 hours, 59 minutes, 37 seconds)
2025-09-12 00:26:02,817 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:26:02,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:26:44,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: -43.07463 ± 113.411
2025-09-12 00:26:44,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-21.026333, 36.615475, -58.288776, 38.14278, 8.501944, -28.18948, -8.856563, -29.975357, 4.720048, -372.39]
2025-09-12 00:26:44,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [43.0, 27.0, 35.0, 68.0, 33.0, 114.0, 27.0, 46.0, 53.0, 1000.0]
2025-09-12 00:26:44,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1226 [INFO]: New best (-43.07) for latency MM1Queue_a033_s075
2025-09-12 00:26:44,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 9/100 (estimated time remaining: 19 hours, 53 minutes, 39 seconds)
2025-09-12 00:37:52,272 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:37:52,274 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:38:41,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: -61.89733 ± 88.376
2025-09-12 00:38:41,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-39.6157, -52.045868, -115.30243, 2.9202642, -38.19793, -309.63916, -30.683542, -16.772017, -16.303856, -3.3330312]
2025-09-12 00:38:41,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [117.0, 67.0, 210.0, 24.0, 180.0, 1000.0, 23.0, 25.0, 32.0, 27.0]
2025-09-12 00:38:41,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 10/100 (estimated time remaining: 19 hours, 6 minutes, 54 seconds)
2025-09-12 00:50:37,855 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:50:37,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:50:47,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: -3.29829 ± 12.731
2025-09-12 00:50:47,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-14.581398, 13.562213, -5.479041, 11.524352, -25.819542, 1.6565709, -4.91699, 0.7764895, 9.790351, -19.495918]
2025-09-12 00:50:47,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [21.0, 48.0, 12.0, 40.0, 105.0, 35.0, 16.0, 13.0, 37.0, 27.0]
2025-09-12 00:50:47,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1226 [INFO]: New best (-3.30) for latency MM1Queue_a033_s075
2025-09-12 00:50:47,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 11/100 (estimated time remaining: 18 hours, 58 minutes, 3 seconds)
2025-09-12 01:02:46,400 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:02:46,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:02:58,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: -9.31009 ± 29.354
2025-09-12 01:02:58,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-1.9800425, 6.0076613, 5.5610585, -6.896214, -90.9784, -13.469619, -0.603581, 26.644169, -3.9602876, -13.425683]
2025-09-12 01:02:58,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [16.0, 66.0, 18.0, 18.0, 136.0, 23.0, 22.0, 86.0, 39.0, 31.0]
2025-09-12 01:02:58,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 12/100 (estimated time remaining: 18 hours, 30 minutes, 14 seconds)
2025-09-12 01:15:29,673 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:15:29,677 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:15:39,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 2.88013 ± 13.258
2025-09-12 01:15:39,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [2.6599274, 0.52378654, 3.6904712, 5.548149, -14.340132, -5.336684, 5.802046, -17.965141, 19.876863, 28.342009]
2025-09-12 01:15:39,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [10.0, 29.0, 17.0, 13.0, 42.0, 16.0, 17.0, 50.0, 27.0, 133.0]
2025-09-12 01:15:39,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1226 [INFO]: New best (2.88) for latency MM1Queue_a033_s075
2025-09-12 01:15:39,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 13/100 (estimated time remaining: 18 hours, 15 minutes, 48 seconds)
2025-09-12 01:27:57,008 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:27:57,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:28:06,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 2.09476 ± 40.034
2025-09-12 01:28:06,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-7.311602, -99.07646, -3.542757, 2.9353294, 1.6949186, 72.497665, 4.6281385, 9.340118, 16.585957, 23.196262]
2025-09-12 01:28:06,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [18.0, 152.0, 11.0, 12.0, 11.0, 61.0, 29.0, 15.0, 27.0, 23.0]
2025-09-12 01:28:06,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 14/100 (estimated time remaining: 17 hours, 48 minutes)
2025-09-12 01:39:11,166 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:39:11,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:39:21,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 13.03231 ± 14.097
2025-09-12 01:39:21,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [12.719923, -3.1236925, 33.80478, 5.3757725, 31.160557, 27.533016, 16.203268, 8.996924, -12.728717, 10.38127]
2025-09-12 01:39:21,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [27.0, 115.0, 40.0, 13.0, 38.0, 22.0, 37.0, 19.0, 34.0, 22.0]
2025-09-12 01:39:21,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1226 [INFO]: New best (13.03) for latency MM1Queue_a033_s075
2025-09-12 01:39:21,131 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 15/100 (estimated time remaining: 17 hours, 23 minutes, 24 seconds)
2025-09-12 01:51:37,162 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:51:37,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:52:15,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 1.39920 ± 38.956
2025-09-12 01:52:15,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [18.540632, 21.846079, 52.488262, 0.91726696, -3.6984828, 11.5463085, 8.214009, -11.74308, -104.221535, 20.102537]
2025-09-12 01:52:15,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [44.0, 49.0, 52.0, 18.0, 50.0, 25.0, 17.0, 15.0, 1000.0, 34.0]
2025-09-12 01:52:15,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 16/100 (estimated time remaining: 17 hours, 24 minutes, 49 seconds)
2025-09-12 02:03:52,579 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:03:52,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:03:59,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 3.20497 ± 13.140
2025-09-12 02:03:59,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-12.82312, 13.4982395, 24.505917, 11.5124235, 12.471968, -3.6391785, 2.0123522, -23.043474, 6.713655, 0.8408762]
2025-09-12 02:03:59,610 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [36.0, 17.0, 62.0, 30.0, 25.0, 22.0, 16.0, 28.0, 12.0, 11.0]
2025-09-12 02:03:59,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 17/100 (estimated time remaining: 17 hours, 5 minutes, 2 seconds)
2025-09-12 02:16:02,172 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:16:02,173 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:16:12,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 16.95062 ± 27.590
2025-09-12 02:16:12,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [75.38424, 3.1744058, 52.8056, -7.1215935, 43.04799, 6.81632, -3.196262, -5.8110805, 2.4458032, 1.960751]
2025-09-12 02:16:12,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [39.0, 17.0, 81.0, 8.0, 79.0, 35.0, 43.0, 16.0, 43.0, 18.0]
2025-09-12 02:16:12,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1226 [INFO]: New best (16.95) for latency MM1Queue_a033_s075
2025-09-12 02:16:12,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 18/100 (estimated time remaining: 16 hours, 45 minutes, 4 seconds)
2025-09-12 02:28:01,624 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:28:01,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:28:10,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 3.28066 ± 10.831
2025-09-12 02:28:10,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-5.2120705, 8.595069, 28.51902, 12.512564, 1.4790349, -12.11761, -2.9144075, -4.0256634, -0.040262427, 6.010948]
2025-09-12 02:28:10,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [13.0, 31.0, 41.0, 42.0, 28.0, 69.0, 22.0, 16.0, 17.0, 34.0]
2025-09-12 02:28:10,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 19/100 (estimated time remaining: 16 hours, 24 minutes, 51 seconds)
2025-09-12 02:39:56,830 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:39:56,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:40:09,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 15.74355 ± 30.873
2025-09-12 02:40:09,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [7.6985545, 102.433495, -4.835625, -15.69796, 9.278763, 19.266077, -4.2898154, 16.765694, 7.405225, 19.41104]
2025-09-12 02:40:09,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [33.0, 89.0, 53.0, 25.0, 32.0, 56.0, 25.0, 20.0, 118.0, 20.0]
2025-09-12 02:40:09,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 20/100 (estimated time remaining: 16 hours, 25 minutes, 6 seconds)
2025-09-12 02:51:53,573 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:51:53,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:52:27,097 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 10.13602 ± 19.174
2025-09-12 02:52:27,098 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [2.8988957, 0.599386, 9.973455, -1.1129481, 9.079289, 64.71146, 13.160852, 10.558231, -2.944059, -5.564343]
2025-09-12 02:52:27,098 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [23.0, 46.0, 15.0, 36.0, 14.0, 1000.0, 14.0, 28.0, 21.0, 11.0]
2025-09-12 02:52:27,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 21/100 (estimated time remaining: 16 hours, 3 minutes, 11 seconds)
2025-09-12 03:04:05,425 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:04:05,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:04:17,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 28.51376 ± 32.390
2025-09-12 03:04:17,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [9.820496, 26.30883, -2.3304455, 6.683263, 0.23565985, 40.34948, 59.034634, -11.904621, 68.16155, 88.778755]
2025-09-12 03:04:17,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [46.0, 43.0, 44.0, 20.0, 9.0, 32.0, 24.0, 17.0, 106.0, 97.0]
2025-09-12 03:04:17,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1226 [INFO]: New best (28.51) for latency MM1Queue_a033_s075
2025-09-12 03:04:17,384 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 22/100 (estimated time remaining: 15 hours, 52 minutes, 40 seconds)
2025-09-12 03:15:59,100 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:15:59,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:16:05,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 13.54988 ± 7.167
2025-09-12 03:16:05,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [12.568397, 5.2147484, 12.04187, 13.600438, 18.868706, 12.902075, 1.9487616, 26.736294, 22.651056, 8.966488]
2025-09-12 03:16:05,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [17.0, 39.0, 16.0, 18.0, 25.0, 21.0, 18.0, 28.0, 27.0, 19.0]
2025-09-12 03:16:05,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 23/100 (estimated time remaining: 15 hours, 34 minutes, 7 seconds)
2025-09-12 03:27:51,957 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:27:51,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:28:29,323 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 10.81381 ± 10.701
2025-09-12 03:28:29,323 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [21.810726, 16.994223, -9.37943, 11.428715, -0.039081164, 6.1512403, 22.847448, 10.839815, 25.498173, 1.986302]
2025-09-12 03:28:29,323 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 27.0, 49.0, 17.0, 22.0, 28.0, 15.0, 36.0, 32.0, 122.0]
2025-09-12 03:28:29,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 24/100 (estimated time remaining: 15 hours, 28 minutes, 56 seconds)
2025-09-12 03:40:42,572 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:40:42,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:40:53,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 24.37916 ± 16.546
2025-09-12 03:40:53,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [44.26529, 3.0006082, 54.3823, 24.55758, 21.475683, 37.935703, 19.443872, 15.594769, -1.9045395, 25.040329]
2025-09-12 03:40:53,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [47.0, 43.0, 44.0, 89.0, 23.0, 33.0, 45.0, 32.0, 15.0, 47.0]
2025-09-12 03:40:53,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 25/100 (estimated time remaining: 15 hours, 23 minutes, 12 seconds)
2025-09-12 03:52:06,598 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:52:06,599 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:52:48,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 22.36085 ± 33.429
2025-09-12 03:52:48,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [8.409416, 1.5548403, 32.577694, 26.796719, 3.2433398, 12.921877, 14.1101265, 8.463888, 117.76922, -2.2386239]
2025-09-12 03:52:48,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [15.0, 80.0, 56.0, 76.0, 38.0, 24.0, 44.0, 128.0, 1000.0, 14.0]
2025-09-12 03:52:48,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 26/100 (estimated time remaining: 15 hours, 5 minutes, 19 seconds)
2025-09-12 04:04:55,143 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:04:55,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:05:05,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 22.23280 ± 23.444
2025-09-12 04:05:05,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [27.408325, 0.471008, 22.331131, -2.2389607, 49.337288, 10.311813, 74.39223, 32.747097, 1.7252009, 5.842852]
2025-09-12 04:05:05,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [55.0, 13.0, 25.0, 11.0, 54.0, 53.0, 43.0, 32.0, 12.0, 79.0]
2025-09-12 04:05:05,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 27/100 (estimated time remaining: 14 hours, 59 minutes, 49 seconds)
2025-09-12 04:17:36,590 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:17:36,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:18:12,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 17.16650 ± 16.755
2025-09-12 04:18:12,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [16.757196, 30.947796, 14.8019, 25.288862, 10.542476, 29.286797, 9.466453, -21.753395, 44.58002, 11.746891]
2025-09-12 04:18:12,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [55.0, 19.0, 30.0, 21.0, 23.0, 1000.0, 30.0, 40.0, 25.0, 13.0]
2025-09-12 04:18:12,359 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 28/100 (estimated time remaining: 15 hours, 6 minutes, 56 seconds)
2025-09-12 04:28:44,993 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:28:45,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:29:00,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 33.81366 ± 22.490
2025-09-12 04:29:00,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-8.037422, 53.407787, 33.418186, 15.231447, 41.29386, 40.71087, 10.452887, 43.583622, 76.55936, 31.515963]
2025-09-12 04:29:00,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [31.0, 113.0, 83.0, 28.0, 60.0, 72.0, 36.0, 24.0, 77.0, 36.0]
2025-09-12 04:29:00,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1226 [INFO]: New best (33.81) for latency MM1Queue_a033_s075
2025-09-12 04:29:00,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 29/100 (estimated time remaining: 14 hours, 31 minutes, 28 seconds)
2025-09-12 04:40:39,984 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:40:39,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:41:22,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 12.93954 ± 26.088
2025-09-12 04:41:22,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [22.982185, 20.514748, 32.69646, 11.997703, -58.65303, 10.101334, 9.558503, 12.006445, 23.712599, 44.478413]
2025-09-12 04:41:22,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [37.0, 29.0, 36.0, 14.0, 194.0, 50.0, 12.0, 113.0, 46.0, 1000.0]
2025-09-12 04:41:22,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 30/100 (estimated time remaining: 14 hours, 18 minutes, 44 seconds)
2025-09-12 04:53:10,513 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:53:10,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:53:56,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 62.94201 ± 36.667
2025-09-12 04:53:56,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [54.445713, 54.544727, 144.21158, 65.551476, 48.99969, 98.10272, 28.292234, 48.21994, 82.82172, 4.2302675]
2025-09-12 04:53:56,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [49.0, 85.0, 110.0, 134.0, 58.0, 80.0, 41.0, 57.0, 1000.0, 13.0]
2025-09-12 04:53:56,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1226 [INFO]: New best (62.94) for latency MM1Queue_a033_s075
2025-09-12 04:53:56,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 31/100 (estimated time remaining: 14 hours, 15 minutes, 47 seconds)
2025-09-12 05:05:29,648 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:05:29,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:05:40,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 23.29974 ± 15.835
2025-09-12 05:05:40,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [10.45736, 32.434258, 10.228183, 52.959503, 33.50706, 33.585796, 18.764809, 32.27982, 13.931044, -5.150388]
2025-09-12 05:05:40,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [37.0, 39.0, 14.0, 38.0, 46.0, 122.0, 36.0, 28.0, 28.0, 14.0]
2025-09-12 05:05:40,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 32/100 (estimated time remaining: 13 hours, 56 minutes, 6 seconds)
2025-09-12 05:17:26,379 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:17:26,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:17:42,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 38.71152 ± 56.254
2025-09-12 05:17:42,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [29.251759, -20.591805, 43.49313, 14.011052, 175.11307, 24.305437, 108.30947, -0.95311713, 16.459414, -2.2831774]
2025-09-12 05:17:42,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [27.0, 122.0, 29.0, 29.0, 123.0, 85.0, 127.0, 26.0, 18.0, 14.0]
2025-09-12 05:17:42,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 33/100 (estimated time remaining: 13 hours, 29 minutes, 18 seconds)
2025-09-12 05:30:02,579 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:30:02,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:30:14,015 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 16.68960 ± 27.515
2025-09-12 05:30:14,015 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [34.23247, 21.98952, -13.121467, 69.03599, -5.012918, 54.594433, 21.306835, -0.36256814, -17.338423, 1.572137]
2025-09-12 05:30:14,015 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [51.0, 17.0, 37.0, 62.0, 27.0, 67.0, 52.0, 53.0, 25.0, 31.0]
2025-09-12 05:30:14,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 34/100 (estimated time remaining: 13 hours, 40 minutes, 25 seconds)
2025-09-12 05:41:29,460 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:41:29,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:41:41,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 23.78113 ± 16.682
2025-09-12 05:41:41,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [46.35968, 35.408646, 13.978898, 10.40386, 6.5082006, -0.016353806, 28.342793, 47.967144, 38.423462, 10.434968]
2025-09-12 05:41:41,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [65.0, 57.0, 71.0, 40.0, 17.0, 13.0, 32.0, 69.0, 43.0, 28.0]
2025-09-12 05:41:41,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 35/100 (estimated time remaining: 13 hours, 16 minutes, 10 seconds)
2025-09-12 05:53:32,427 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:53:32,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:53:41,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 23.67003 ± 19.006
2025-09-12 05:53:41,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [12.551136, 24.742464, 29.420784, 20.739124, 9.67037, 9.721321, 0.934874, 41.787056, 16.934713, 70.1985]
2025-09-12 05:53:41,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [18.0, 26.0, 52.0, 39.0, 24.0, 22.0, 15.0, 71.0, 15.0, 56.0]
2025-09-12 05:53:41,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 36/100 (estimated time remaining: 12 hours, 56 minutes, 50 seconds)
2025-09-12 06:05:23,075 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:05:23,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:06:35,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 55.28929 ± 54.707
2025-09-12 06:06:35,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [75.8048, 34.307907, 0.53963566, 194.32693, 81.57634, -11.978789, 43.043976, 71.59725, 33.70496, 29.969872]
2025-09-12 06:06:35,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [80.0, 99.0, 14.0, 1000.0, 157.0, 1000.0, 41.0, 83.0, 47.0, 49.0]
2025-09-12 06:06:35,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 37/100 (estimated time remaining: 12 hours, 59 minutes, 41 seconds)
2025-09-12 06:18:33,821 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:18:33,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:19:17,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 51.15416 ± 45.817
2025-09-12 06:19:17,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [22.15289, 122.25561, 122.71997, -8.98138, 16.993927, 66.87252, 36.516212, 6.1565104, 98.090706, 28.764648]
2025-09-12 06:19:17,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [29.0, 1000.0, 153.0, 24.0, 30.0, 93.0, 80.0, 12.0, 65.0, 91.0]
2025-09-12 06:19:17,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 38/100 (estimated time remaining: 12 hours, 55 minutes, 57 seconds)
2025-09-12 06:30:48,817 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:30:48,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:31:24,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 33.03514 ± 31.475
2025-09-12 06:31:24,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [17.065756, 51.893803, 19.37661, -8.311115, 91.252594, 78.88924, 48.636192, 3.4526067, 19.912071, 8.183651]
2025-09-12 06:31:24,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [49.0, 37.0, 33.0, 30.0, 41.0, 1000.0, 40.0, 33.0, 29.0, 20.0]
2025-09-12 06:31:25,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 39/100 (estimated time remaining: 12 hours, 38 minutes, 39 seconds)
2025-09-12 06:43:04,326 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:43:04,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:43:14,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 35.61257 ± 31.851
2025-09-12 06:43:14,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [82.28588, 18.07019, 38.630512, 26.31923, 7.93095, 105.16584, 10.045188, 43.109623, 21.255953, 3.3123012]
2025-09-12 06:43:14,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [47.0, 16.0, 33.0, 21.0, 33.0, 83.0, 29.0, 71.0, 36.0, 21.0]
2025-09-12 06:43:14,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 40/100 (estimated time remaining: 12 hours, 30 minutes, 58 seconds)
2025-09-12 06:55:01,729 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:55:01,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:55:22,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 78.31062 ± 94.865
2025-09-12 06:55:22,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [123.17397, 35.005074, 77.50042, 32.22254, 78.37013, 52.560814, 342.60733, 39.866856, -0.19431765, 1.9933512]
2025-09-12 06:55:22,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [136.0, 43.0, 35.0, 34.0, 54.0, 31.0, 347.0, 46.0, 17.0, 23.0]
2025-09-12 06:55:22,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1226 [INFO]: New best (78.31) for latency MM1Queue_a033_s075
2025-09-12 06:55:22,783 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 41/100 (estimated time remaining: 12 hours, 20 minutes, 15 seconds)
2025-09-12 07:07:48,876 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:07:48,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:09:31,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 82.25326 ± 77.808
2025-09-12 07:09:31,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [6.447888, 289.34164, 75.49623, 20.03313, 124.50359, 52.372875, 11.385366, 67.14796, 88.20505, 87.59889]
2025-09-12 07:09:31,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [23.0, 293.0, 40.0, 1000.0, 1000.0, 77.0, 26.0, 119.0, 65.0, 1000.0]
2025-09-12 07:09:31,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1226 [INFO]: New best (82.25) for latency MM1Queue_a033_s075
2025-09-12 07:09:31,970 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 42/100 (estimated time remaining: 12 hours, 22 minutes, 43 seconds)
2025-09-12 07:20:33,240 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:20:33,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:20:47,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 41.45429 ± 23.124
2025-09-12 07:20:47,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [56.48273, -0.39417458, 29.204273, 40.329502, 71.8321, 71.15194, 9.376447, 46.076443, 58.18316, 32.300472]
2025-09-12 07:20:47,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [36.0, 47.0, 31.0, 62.0, 54.0, 66.0, 20.0, 61.0, 123.0, 46.0]
2025-09-12 07:20:47,948 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 43/100 (estimated time remaining: 11 hours, 53 minutes, 24 seconds)
2025-09-12 07:32:30,417 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:32:30,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:32:50,754 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 65.02486 ± 47.921
2025-09-12 07:32:50,754 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [61.23903, 148.38158, 77.81992, 14.244178, 94.50549, 9.681442, 7.847974, 48.748013, 48.466312, 139.31474]
2025-09-12 07:32:50,754 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [103.0, 140.0, 36.0, 21.0, 99.0, 37.0, 16.0, 115.0, 65.0, 123.0]
2025-09-12 07:32:50,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 44/100 (estimated time remaining: 11 hours, 40 minutes, 17 seconds)
2025-09-12 07:45:31,470 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:45:31,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:46:22,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 92.11568 ± 43.310
2025-09-12 07:46:22,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [57.534023, 97.027855, 105.61777, 52.81438, 62.203644, 95.98843, 141.55276, 174.27623, 19.892271, 114.24946]
2025-09-12 07:46:22,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [71.0, 52.0, 130.0, 1000.0, 82.0, 78.0, 125.0, 150.0, 23.0, 116.0]
2025-09-12 07:46:22,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1226 [INFO]: New best (92.12) for latency MM1Queue_a033_s075
2025-09-12 07:46:22,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 45/100 (estimated time remaining: 11 hours, 46 minutes, 59 seconds)
2025-09-12 07:57:17,175 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:57:17,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:58:54,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 59.17444 ± 50.437
2025-09-12 07:58:54,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [22.314243, 77.62837, -14.665169, 138.64735, 12.329979, 120.2732, 47.355534, 85.42307, 101.66162, 0.7761684]
2025-09-12 07:58:54,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [23.0, 78.0, 106.0, 175.0, 14.0, 1000.0, 36.0, 1000.0, 1000.0, 40.0]
2025-09-12 07:58:54,091 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 46/100 (estimated time remaining: 11 hours, 38 minutes, 44 seconds)
2025-09-12 08:11:02,499 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:11:02,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:11:26,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 72.10342 ± 55.652
2025-09-12 08:11:26,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [75.640884, 76.40165, 58.73815, 32.12347, 107.796326, 220.0766, 46.300377, 47.53602, 7.34091, 49.07991]
2025-09-12 08:11:26,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [56.0, 80.0, 58.0, 22.0, 90.0, 201.0, 34.0, 133.0, 94.0, 85.0]
2025-09-12 08:11:26,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 47/100 (estimated time remaining: 11 hours, 8 minutes, 32 seconds)
2025-09-12 08:23:46,287 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:23:46,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:24:59,239 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 77.19944 ± 49.191
2025-09-12 08:24:59,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [11.163868, 53.33438, 24.047066, 46.490417, 78.55339, 43.878956, 170.58208, 84.91639, 137.68094, 121.346924]
2025-09-12 08:24:59,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [20.0, 42.0, 20.0, 27.0, 96.0, 151.0, 1000.0, 1000.0, 128.0, 75.0]
2025-09-12 08:24:59,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 48/100 (estimated time remaining: 11 hours, 20 minutes, 23 seconds)
2025-09-12 08:35:50,326 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:35:50,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:36:12,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 89.14484 ± 70.156
2025-09-12 08:36:12,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [173.97752, 153.7541, 53.09582, 42.696857, 26.49398, 128.1279, 80.38965, 4.5373316, 215.32407, 13.051181]
2025-09-12 08:36:12,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [139.0, 119.0, 53.0, 40.0, 23.0, 147.0, 82.0, 59.0, 138.0, 15.0]
2025-09-12 08:36:12,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 49/100 (estimated time remaining: 10 hours, 58 minutes, 59 seconds)
2025-09-12 08:48:49,072 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:48:49,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:49:43,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 86.23892 ± 65.136
2025-09-12 08:49:43,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [116.21488, 175.20076, 17.2271, 39.97314, 59.329758, 2.722722, 218.58768, 105.05039, 72.90267, 55.1801]
2025-09-12 08:49:43,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [163.0, 129.0, 1000.0, 61.0, 105.0, 15.0, 193.0, 93.0, 94.0, 56.0]
2025-09-12 08:49:43,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 50/100 (estimated time remaining: 10 hours, 46 minutes, 9 seconds)
2025-09-12 09:01:17,805 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:01:17,809 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:02:55,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 54.85989 ± 49.665
2025-09-12 09:02:55,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [4.6626115, 23.592157, 169.44376, 22.198046, 54.904762, 59.20527, 31.698904, 115.74404, 63.60858, 3.5408094]
2025-09-12 09:02:55,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [18.0, 1000.0, 127.0, 17.0, 1000.0, 40.0, 1000.0, 58.0, 76.0, 69.0]
2025-09-12 09:02:55,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 51/100 (estimated time remaining: 10 hours, 40 minutes, 13 seconds)
2025-09-12 09:14:05,520 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:14:05,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:15:47,605 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 82.11350 ± 63.275
2025-09-12 09:15:47,620 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [70.561775, 66.99978, 84.89707, 20.541225, 8.468064, 30.346231, 153.16786, 143.78241, 210.25789, 32.11274]
2025-09-12 09:15:47,620 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 42.0, 74.0, 19.0, 1000.0, 62.0, 201.0, 119.0, 1000.0, 30.0]
2025-09-12 09:15:47,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 52/100 (estimated time remaining: 10 hours, 30 minutes, 43 seconds)
2025-09-12 09:28:11,933 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:28:11,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:29:02,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 108.46682 ± 93.832
2025-09-12 09:29:02,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [20.86732, 304.3278, 80.37851, 189.69547, 33.089455, 67.0839, 25.489817, 139.69954, 210.35953, 13.676883]
2025-09-12 09:29:02,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [56.0, 224.0, 58.0, 166.0, 31.0, 62.0, 26.0, 125.0, 1000.0, 91.0]
2025-09-12 09:29:02,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1226 [INFO]: New best (108.47) for latency MM1Queue_a033_s075
2025-09-12 09:29:02,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 53/100 (estimated time remaining: 10 hours, 14 minutes, 55 seconds)
2025-09-12 09:40:10,423 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:40:10,425 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:40:57,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 61.83128 ± 42.317
2025-09-12 09:40:57,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [70.66968, 81.948105, 129.18034, 32.532265, 100.679184, 12.291424, 55.499775, 114.246925, 14.890792, 6.374291]
2025-09-12 09:40:57,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [73.0, 155.0, 124.0, 44.0, 102.0, 41.0, 1000.0, 118.0, 25.0, 13.0]
2025-09-12 09:40:57,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 54/100 (estimated time remaining: 10 hours, 8 minutes, 34 seconds)
2025-09-12 09:52:41,323 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:52:41,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:53:33,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 110.25752 ± 89.753
2025-09-12 09:53:33,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [14.349182, 52.141647, 309.11578, 159.97049, 90.84198, 88.54945, 76.72091, 16.774845, 226.25348, 67.85739]
2025-09-12 09:53:33,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [26.0, 43.0, 244.0, 113.0, 105.0, 61.0, 68.0, 58.0, 168.0, 1000.0]
2025-09-12 09:53:33,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1226 [INFO]: New best (110.26) for latency MM1Queue_a033_s075
2025-09-12 09:53:33,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 55/100 (estimated time remaining: 9 hours, 47 minutes, 18 seconds)
2025-09-12 10:05:17,295 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:05:17,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:06:04,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 54.73675 ± 44.026
2025-09-12 10:06:04,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [127.37479, 144.88539, 24.881237, 27.713198, 17.42294, 27.199835, 74.26632, 32.820488, 52.69154, 18.111742]
2025-09-12 10:06:04,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [121.0, 110.0, 38.0, 124.0, 56.0, 18.0, 70.0, 1000.0, 115.0, 28.0]
2025-09-12 10:06:04,359 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 56/100 (estimated time remaining: 9 hours, 28 minutes, 20 seconds)
2025-09-12 10:17:53,799 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:17:53,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:18:17,792 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 75.22593 ± 59.795
2025-09-12 10:18:17,792 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [123.17449, 193.59103, 19.04111, 122.230606, 127.62538, 8.796056, 28.114319, 62.741142, 56.476246, 10.468956]
2025-09-12 10:18:17,792 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [128.0, 202.0, 31.0, 206.0, 95.0, 22.0, 21.0, 53.0, 88.0, 18.0]
2025-09-12 10:18:17,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 57/100 (estimated time remaining: 9 hours, 10 minutes, 1 second)
2025-09-12 10:30:47,435 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:30:47,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:31:13,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 94.34798 ± 61.315
2025-09-12 10:31:13,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [60.45309, 189.4168, 81.53212, 61.811703, 73.83982, -1.779025, 34.674053, 103.52252, 143.25249, 196.75629]
2025-09-12 10:31:13,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [135.0, 145.0, 59.0, 46.0, 71.0, 16.0, 60.0, 140.0, 155.0, 127.0]
2025-09-12 10:31:13,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 58/100 (estimated time remaining: 8 hours, 54 minutes, 47 seconds)
2025-09-12 10:42:10,101 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:42:10,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:43:00,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 197.36993 ± 127.019
2025-09-12 10:43:00,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [272.48395, 228.16238, 269.53726, 137.60785, 20.05543, 286.25516, -6.932094, 83.83721, 413.93814, 268.75412]
2025-09-12 10:43:00,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [212.0, 225.0, 208.0, 212.0, 38.0, 222.0, 32.0, 81.0, 351.0, 232.0]
2025-09-12 10:43:00,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1226 [INFO]: New best (197.37) for latency MM1Queue_a033_s075
2025-09-12 10:43:00,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 59/100 (estimated time remaining: 8 hours, 41 minutes, 12 seconds)
2025-09-12 10:55:46,493 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:55:46,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:57:03,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 87.61682 ± 74.288
2025-09-12 10:57:03,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [21.720055, 138.48048, 43.157135, 46.862507, 24.07065, 78.24185, 67.88955, 24.872074, 173.65208, 257.2218]
2025-09-12 10:57:03,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [43.0, 111.0, 50.0, 120.0, 25.0, 1000.0, 79.0, 25.0, 1000.0, 234.0]
2025-09-12 10:57:03,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 60/100 (estimated time remaining: 8 hours, 40 minutes, 46 seconds)
2025-09-12 11:07:51,203 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:07:51,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:09:05,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 71.01576 ± 59.937
2025-09-12 11:09:05,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [77.66491, 114.77632, 184.16629, -28.255533, 87.908554, 17.045687, 16.739607, 24.067654, 118.52335, 97.52076]
2025-09-12 11:09:05,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [124.0, 141.0, 1000.0, 1000.0, 45.0, 16.0, 14.0, 32.0, 134.0, 90.0]
2025-09-12 11:09:05,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 61/100 (estimated time remaining: 8 hours, 24 minutes, 5 seconds)
2025-09-12 11:21:07,716 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:21:07,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:21:28,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 93.70653 ± 71.734
2025-09-12 11:21:28,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [1.6290237, 200.01752, 142.03146, 18.630934, 37.49535, 125.28684, 118.88998, 207.07968, 21.567064, 64.43742]
2025-09-12 11:21:28,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [24.0, 154.0, 79.0, 28.0, 31.0, 116.0, 80.0, 197.0, 16.0, 45.0]
2025-09-12 11:21:28,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 62/100 (estimated time remaining: 8 hours, 12 minutes, 46 seconds)
2025-09-12 11:34:03,461 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:34:03,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:34:54,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 79.56619 ± 71.386
2025-09-12 11:34:54,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [127.85095, 116.86163, 25.212631, 141.3628, 231.30171, 89.72199, 25.208267, 14.539942, 33.82833, -10.226315]
2025-09-12 11:34:54,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [105.0, 87.0, 55.0, 95.0, 191.0, 96.0, 35.0, 76.0, 80.0, 1000.0]
2025-09-12 11:34:54,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 63/100 (estimated time remaining: 8 hours, 3 minutes, 58 seconds)
2025-09-12 11:45:44,586 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:45:44,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:47:30,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 62.88059 ± 76.496
2025-09-12 11:47:30,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [4.668841, 15.970619, 21.186195, -33.672188, 206.97144, 108.03533, 33.260853, 35.1198, 193.47018, 43.794777]
2025-09-12 11:47:30,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 53.0, 1000.0, 1000.0, 207.0, 117.0, 44.0, 35.0, 282.0, 61.0]
2025-09-12 11:47:30,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 64/100 (estimated time remaining: 7 hours, 57 minutes, 22 seconds)
2025-09-12 11:59:36,419 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:59:36,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:00:23,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 86.12106 ± 67.920
2025-09-12 12:00:23,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [104.55518, 121.925575, 213.89107, 19.737644, -0.20289451, 47.744106, 129.34799, 166.23114, 24.57961, 33.401268]
2025-09-12 12:00:23,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [90.0, 102.0, 138.0, 18.0, 10.0, 1000.0, 101.0, 172.0, 35.0, 42.0]
2025-09-12 12:00:23,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 65/100 (estimated time remaining: 7 hours, 36 minutes)
2025-09-12 12:11:44,300 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:11:44,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:11:59,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 57.62179 ± 44.721
2025-09-12 12:11:59,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [40.435284, 37.130642, 154.28209, 33.145622, 24.271729, 4.5544205, 65.003784, 16.697338, 96.26877, 104.42821]
2025-09-12 12:11:59,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [50.0, 32.0, 139.0, 51.0, 37.0, 15.0, 57.0, 25.0, 96.0, 54.0]
2025-09-12 12:11:59,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 66/100 (estimated time remaining: 7 hours, 20 minutes, 20 seconds)
2025-09-12 12:23:43,251 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:23:43,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:24:29,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 96.00388 ± 77.490
2025-09-12 12:24:29,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [107.859245, 35.322113, 79.67064, 184.62047, 85.09275, 19.578245, 37.750942, -4.4294505, 163.14427, 251.42949]
2025-09-12 12:24:29,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [81.0, 19.0, 52.0, 177.0, 1000.0, 28.0, 45.0, 14.0, 117.0, 155.0]
2025-09-12 12:24:29,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 67/100 (estimated time remaining: 7 hours, 8 minutes, 31 seconds)
2025-09-12 12:36:50,091 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:36:50,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:37:37,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 84.44908 ± 62.120
2025-09-12 12:37:37,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [91.82434, 60.85573, 13.977127, 210.94568, 4.079787, 50.944767, 64.95643, 169.14227, 119.18564, 58.579113]
2025-09-12 12:37:37,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [59.0, 57.0, 54.0, 163.0, 25.0, 56.0, 1000.0, 98.0, 126.0, 61.0]
2025-09-12 12:37:37,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 68/100 (estimated time remaining: 6 hours, 53 minutes, 56 seconds)
2025-09-12 12:48:49,504 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:48:49,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:49:44,282 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 110.66063 ± 64.308
2025-09-12 12:49:44,283 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [186.99467, 112.433685, 151.49677, 182.37788, 21.796299, -1.0931236, 189.50575, 100.13102, 93.684616, 69.278786]
2025-09-12 12:49:44,283 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [185.0, 46.0, 175.0, 158.0, 60.0, 25.0, 137.0, 1000.0, 106.0, 72.0]
2025-09-12 12:49:44,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 69/100 (estimated time remaining: 6 hours, 38 minutes, 15 seconds)
2025-09-12 13:02:06,782 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:02:06,786 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:03:52,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 131.38661 ± 77.834
2025-09-12 13:03:52,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [272.23245, 9.689786, 72.93283, 72.4553, 142.13968, 99.505974, 133.5201, 158.83351, 257.2815, 95.275055]
2025-09-12 13:03:52,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [165.0, 21.0, 1000.0, 1000.0, 95.0, 106.0, 85.0, 97.0, 230.0, 1000.0]
2025-09-12 13:03:52,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 70/100 (estimated time remaining: 6 hours, 33 minutes, 30 seconds)
2025-09-12 13:15:03,521 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:15:03,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:15:19,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 73.12931 ± 103.212
2025-09-12 13:15:19,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [56.406025, -3.736238, 13.584554, 7.228405, 95.16559, 220.99394, 313.39224, 5.805818, 1.7861378, 20.666653]
2025-09-12 13:15:19,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [44.0, 54.0, 24.0, 34.0, 48.0, 135.0, 192.0, 16.0, 32.0, 29.0]
2025-09-12 13:15:19,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 71/100 (estimated time remaining: 6 hours, 20 minutes, 3 seconds)
2025-09-12 13:26:54,744 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:26:54,753 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:28:04,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 98.89659 ± 108.976
2025-09-12 13:28:04,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [88.19941, -6.771717, 64.88061, 286.40897, 77.48505, 330.82486, 70.97846, 22.014936, 30.35467, 24.590555]
2025-09-12 13:28:04,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [79.0, 1000.0, 72.0, 1000.0, 53.0, 152.0, 48.0, 23.0, 33.0, 26.0]
2025-09-12 13:28:04,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 72/100 (estimated time remaining: 6 hours, 8 minutes, 47 seconds)
2025-09-12 13:39:56,104 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:39:56,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:40:15,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 83.98927 ± 110.414
2025-09-12 13:40:15,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-4.3232946, 12.372811, 125.16127, 388.25082, 3.427416, 130.35555, 55.18555, 34.38649, 55.522163, 39.55388]
2025-09-12 13:40:15,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [19.0, 49.0, 90.0, 290.0, 14.0, 120.0, 47.0, 30.0, 34.0, 42.0]
2025-09-12 13:40:15,964 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 73/100 (estimated time remaining: 5 hours, 50 minutes, 46 seconds)
2025-09-12 13:52:57,682 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:52:57,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:53:28,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 150.86691 ± 107.001
2025-09-12 13:53:28,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [58.319042, 45.44463, 319.01315, 173.54292, 270.40735, 129.42456, 15.896539, 241.2936, 236.76067, 18.566717]
2025-09-12 13:53:28,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [71.0, 27.0, 285.0, 102.0, 164.0, 113.0, 19.0, 136.0, 142.0, 25.0]
2025-09-12 13:53:28,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 74/100 (estimated time remaining: 5 hours, 44 minutes, 8 seconds)
2025-09-12 14:04:22,807 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:04:22,809 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:04:44,929 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 94.04552 ± 84.133
2025-09-12 14:04:44,929 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-0.38488647, 12.997325, 43.001488, 74.22702, 29.762033, 153.66016, 238.75258, 24.987638, 223.72734, 139.72444]
2025-09-12 14:04:44,929 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [14.0, 35.0, 42.0, 65.0, 45.0, 88.0, 122.0, 27.0, 187.0, 178.0]
2025-09-12 14:04:44,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 75/100 (estimated time remaining: 5 hours, 16 minutes, 34 seconds)
2025-09-12 14:16:28,628 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:16:28,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:17:19,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 107.06089 ± 76.497
2025-09-12 14:17:19,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [184.94548, -18.090765, 46.512966, 199.28758, 221.13455, 169.91374, 94.53388, 55.491783, 51.058086, 65.821594]
2025-09-12 14:17:19,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [104.0, 1000.0, 94.0, 115.0, 144.0, 117.0, 54.0, 57.0, 81.0, 55.0]
2025-09-12 14:17:19,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 76/100 (estimated time remaining: 5 hours, 9 minutes, 57 seconds)
2025-09-12 14:29:08,197 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:29:08,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:30:24,964 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 108.50967 ± 95.206
2025-09-12 14:30:24,970 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [59.8075, 141.47163, 220.68608, 37.67399, 4.422071, 250.53282, 62.92872, 31.019938, 258.11646, 18.43746]
2025-09-12 14:30:24,970 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 150.0, 173.0, 27.0, 26.0, 256.0, 48.0, 31.0, 1000.0, 41.0]
2025-09-12 14:30:24,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 77/100 (estimated time remaining: 4 hours, 59 minutes, 13 seconds)
2025-09-12 14:42:08,889 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:42:08,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:42:45,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 176.51648 ± 174.433
2025-09-12 14:42:45,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [246.70393, 141.29178, 104.40804, 70.42328, 50.63091, 60.831356, 211.62564, 664.7999, 67.485344, 146.96465]
2025-09-12 14:42:45,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [189.0, 127.0, 77.0, 60.0, 43.0, 55.0, 180.0, 420.0, 62.0, 78.0]
2025-09-12 14:42:45,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 78/100 (estimated time remaining: 4 hours, 47 minutes, 25 seconds)
2025-09-12 14:54:35,257 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:54:35,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:55:57,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 113.27217 ± 79.839
2025-09-12 14:55:57,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [147.22913, 32.87418, 52.082798, 146.51633, 65.48576, 23.920576, 203.57008, 20.735575, 247.03247, 193.2748]
2025-09-12 14:55:57,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 29.0, 54.0, 138.0, 53.0, 76.0, 134.0, 1000.0, 249.0, 202.0]
2025-09-12 14:55:57,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 79/100 (estimated time remaining: 4 hours, 34 minutes, 56 seconds)
2025-09-12 15:07:49,977 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:07:49,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:09:06,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 98.70201 ± 126.405
2025-09-12 15:09:06,408 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [-3.9991899, 308.3118, 268.22696, -59.24949, 18.541567, 7.8668385, 159.95923, 243.639, 35.39084, 8.332519]
2025-09-12 15:09:06,408 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [23.0, 211.0, 238.0, 1000.0, 29.0, 27.0, 1000.0, 141.0, 68.0, 27.0]
2025-09-12 15:09:06,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 80/100 (estimated time remaining: 4 hours, 30 minutes, 18 seconds)
2025-09-12 15:21:28,488 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:21:28,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:22:08,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 202.97186 ± 147.371
2025-09-12 15:22:08,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [55.782692, 33.410267, 174.93571, 402.64053, 202.21362, 60.728397, 417.7814, 351.13892, 297.79413, 33.29314]
2025-09-12 15:22:08,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [38.0, 35.0, 127.0, 329.0, 140.0, 69.0, 255.0, 268.0, 164.0, 36.0]
2025-09-12 15:22:08,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1226 [INFO]: New best (202.97) for latency MM1Queue_a033_s075
2025-09-12 15:22:08,747 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 81/100 (estimated time remaining: 4 hours, 19 minutes, 16 seconds)
2025-09-12 15:33:05,458 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:33:05,462 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:33:45,660 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 54.42201 ± 63.188
2025-09-12 15:33:45,660 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [127.11497, 24.31855, 29.076536, 205.33224, -6.7454324, 87.36612, -0.5778339, 25.259823, 20.931904, 32.14325]
2025-09-12 15:33:45,660 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [85.0, 18.0, 29.0, 158.0, 1000.0, 78.0, 22.0, 34.0, 36.0, 27.0]
2025-09-12 15:33:45,678 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 82/100 (estimated time remaining: 4 hours, 42 seconds)
2025-09-12 15:45:16,101 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:45:16,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:46:11,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 137.79076 ± 100.337
2025-09-12 15:46:11,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [28.876139, 248.54889, 312.26602, 40.18194, 209.70465, 203.11359, 64.19151, 189.95476, 46.94684, 34.12342]
2025-09-12 15:46:11,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [25.0, 221.0, 249.0, 48.0, 181.0, 1000.0, 45.0, 125.0, 51.0, 39.0]
2025-09-12 15:46:11,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 83/100 (estimated time remaining: 3 hours, 48 minutes, 21 seconds)
2025-09-12 15:57:55,312 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:57:55,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:59:32,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 217.49934 ± 133.363
2025-09-12 15:59:32,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [118.29761, 168.15353, 285.4662, 208.54115, 269.50266, 373.42282, 208.67256, 30.95826, 475.10764, 36.871193]
2025-09-12 15:59:32,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [98.0, 180.0, 289.0, 217.0, 1000.0, 462.0, 212.0, 46.0, 1000.0, 25.0]
2025-09-12 15:59:32,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1226 [INFO]: New best (217.50) for latency MM1Queue_a033_s075
2025-09-12 15:59:32,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 84/100 (estimated time remaining: 3 hours, 36 minutes, 13 seconds)
2025-09-12 16:11:41,606 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:11:41,610 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:12:05,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 122.82079 ± 65.683
2025-09-12 16:12:05,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [98.8718, 134.5827, 187.69308, 85.590706, 124.951614, 233.57523, 64.081696, 14.332053, 76.73222, 207.79675]
2025-09-12 16:12:05,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [91.0, 80.0, 170.0, 61.0, 65.0, 176.0, 42.0, 13.0, 66.0, 127.0]
2025-09-12 16:12:05,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 21 minutes, 34 seconds)
2025-09-12 16:23:32,238 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:23:32,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:24:50,304 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 108.98372 ± 113.216
2025-09-12 16:24:50,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [35.61994, 45.084953, -64.84171, 256.2961, 68.54986, 54.230038, 124.653015, 291.86447, 254.62245, 23.758]
2025-09-12 16:24:50,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [29.0, 51.0, 1000.0, 158.0, 76.0, 1000.0, 77.0, 240.0, 148.0, 28.0]
2025-09-12 16:24:50,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 8 minutes, 4 seconds)
2025-09-12 16:36:33,447 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:36:33,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:36:57,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 110.36101 ± 164.630
2025-09-12 16:36:57,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [81.75646, 592.6207, 32.485706, 24.50413, 97.56557, 123.765114, 61.387627, 66.4327, 11.654269, 11.437798]
2025-09-12 16:36:57,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [107.0, 349.0, 35.0, 32.0, 81.0, 133.0, 58.0, 64.0, 15.0, 14.0]
2025-09-12 16:36:57,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 87/100 (estimated time remaining: 2 hours, 56 minutes, 56 seconds)
2025-09-12 16:48:49,626 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:48:49,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:49:14,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 124.43840 ± 98.754
2025-09-12 16:49:14,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [350.11313, 115.97541, 40.621414, 46.24133, 203.19992, 19.410309, 167.07758, 92.20858, 183.87848, 25.65794]
2025-09-12 16:49:14,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [290.0, 66.0, 36.0, 48.0, 102.0, 39.0, 110.0, 69.0, 128.0, 25.0]
2025-09-12 16:49:14,107 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 88/100 (estimated time remaining: 2 hours, 43 minutes, 55 seconds)
2025-09-12 17:01:30,836 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:01:30,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:03:03,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 174.12222 ± 195.507
2025-09-12 17:03:03,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [565.974, 5.8372154, 60.525463, 93.01231, -68.70279, 288.8516, 147.32361, 146.76381, 32.27717, 469.35986]
2025-09-12 17:03:03,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [392.0, 17.0, 114.0, 77.0, 1000.0, 168.0, 131.0, 159.0, 1000.0, 303.0]
2025-09-12 17:03:03,028 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 32 minutes, 24 seconds)
2025-09-12 17:14:15,603 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:14:15,605 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:15:32,209 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 175.16742 ± 155.892
2025-09-12 17:15:32,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [110.85752, 245.49548, 577.2552, 11.215063, 41.751926, 184.45985, 227.4894, 212.93085, 95.23509, 44.983788]
2025-09-12 17:15:32,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 148.0, 1000.0, 18.0, 40.0, 92.0, 149.0, 178.0, 84.0, 42.0]
2025-09-12 17:15:32,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 19 minutes, 33 seconds)
2025-09-12 17:27:08,103 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:27:08,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:28:12,792 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 174.28413 ± 167.399
2025-09-12 17:28:12,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [118.62371, -46.81426, 75.45099, 153.76811, 188.16019, 508.23987, 68.623024, 100.97105, 466.75693, 109.0616]
2025-09-12 17:28:12,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [72.0, 1000.0, 51.0, 95.0, 131.0, 347.0, 111.0, 95.0, 351.0, 64.0]
2025-09-12 17:28:12,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 6 minutes, 44 seconds)
2025-09-12 17:39:56,332 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:39:56,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:41:23,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 261.40140 ± 353.346
2025-09-12 17:41:23,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [71.53957, 365.82315, 16.767452, 232.30241, -31.13123, 6.1181126, 263.6073, 201.36687, 1254.2516, 233.36874]
2025-09-12 17:41:23,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [132.0, 281.0, 40.0, 250.0, 1000.0, 16.0, 243.0, 187.0, 789.0, 209.0]
2025-09-12 17:41:23,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1226 [INFO]: New best (261.40) for latency MM1Queue_a033_s075
2025-09-12 17:41:23,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 92/100 (estimated time remaining: 1 hour, 55 minutes, 59 seconds)
2025-09-12 17:53:05,772 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:53:05,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:53:37,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 143.07915 ± 122.362
2025-09-12 17:53:37,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [96.01289, 69.585464, 104.784584, 460.43634, 59.126453, 251.53375, 5.5594397, 122.38652, 157.77597, 103.590004]
2025-09-12 17:53:37,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [59.0, 43.0, 66.0, 290.0, 70.0, 233.0, 77.0, 74.0, 114.0, 127.0]
2025-09-12 17:53:37,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 43 minutes)
2025-09-12 18:05:22,925 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:05:22,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:06:41,732 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 150.19441 ± 147.403
2025-09-12 18:06:41,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [298.02518, 275.56638, 23.569347, 47.66598, 40.152363, 170.13335, 47.04141, 52.59825, 61.428104, 485.7637]
2025-09-12 18:06:41,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [249.0, 278.0, 18.0, 29.0, 47.0, 152.0, 35.0, 42.0, 1000.0, 1000.0]
2025-09-12 18:06:41,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 29 minutes, 6 seconds)
2025-09-12 18:18:25,442 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:18:25,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:19:36,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 231.22791 ± 225.731
2025-09-12 18:19:36,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [87.79066, 327.2844, 52.030083, -13.048837, 660.3759, 226.23334, 392.81396, 46.616276, 531.7475, 0.43583846]
2025-09-12 18:19:36,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [87.0, 231.0, 63.0, 1000.0, 446.0, 138.0, 307.0, 32.0, 262.0, 36.0]
2025-09-12 18:19:36,677 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 16 minutes, 53 seconds)
2025-09-12 18:31:10,877 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:31:10,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:31:33,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 111.51453 ± 87.329
2025-09-12 18:31:33,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [60.20855, 122.057945, 38.451355, 33.9808, 213.9554, 16.785858, 243.5294, 38.43073, 95.909874, 251.83539]
2025-09-12 18:31:33,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [78.0, 65.0, 30.0, 62.0, 134.0, 26.0, 142.0, 46.0, 89.0, 163.0]
2025-09-12 18:31:33,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 3 minutes, 20 seconds)
2025-09-12 18:43:16,707 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:43:16,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:45:11,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 281.66367 ± 352.952
2025-09-12 18:45:11,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [715.6559, 139.69368, 102.77056, 303.532, 273.26355, 10.585352, 1144.1302, 53.48896, 124.34362, -50.8268]
2025-09-12 18:45:11,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [571.0, 134.0, 1000.0, 236.0, 226.0, 13.0, 806.0, 87.0, 116.0, 1000.0]
2025-09-12 18:45:11,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1226 [INFO]: New best (281.66) for latency MM1Queue_a033_s075
2025-09-12 18:45:11,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 97/100 (estimated time remaining: 51 minutes, 2 seconds)
2025-09-12 18:57:56,996 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:57:57,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:58:29,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 154.66806 ± 129.148
2025-09-12 18:58:29,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [141.68895, 228.11829, 316.9309, 47.50991, 370.30612, -15.150493, 103.67179, 29.085457, 283.90005, 40.61975]
2025-09-12 18:58:29,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [80.0, 193.0, 267.0, 36.0, 235.0, 18.0, 83.0, 36.0, 221.0, 29.0]
2025-09-12 18:58:29,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 98/100 (estimated time remaining: 38 minutes, 55 seconds)
2025-09-12 19:09:38,792 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:09:38,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:10:37,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 151.22684 ± 233.596
2025-09-12 19:10:37,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [69.99083, 7.057936, 576.6057, 38.441723, -1.5942557, 652.8105, 71.589424, 29.037354, 11.808181, 56.520958]
2025-09-12 19:10:37,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [49.0, 14.0, 450.0, 36.0, 13.0, 444.0, 77.0, 1000.0, 23.0, 61.0]
2025-09-12 19:10:37,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 99/100 (estimated time remaining: 25 minutes, 34 seconds)
2025-09-12 19:21:46,826 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:21:46,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:23:31,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 306.75488 ± 319.655
2025-09-12 19:23:31,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [111.313675, 892.17365, -76.3752, 91.7305, 656.84796, 85.84944, 8.292723, 722.06714, 203.52376, 372.1252]
2025-09-12 19:23:31,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [109.0, 566.0, 1000.0, 67.0, 1000.0, 62.0, 21.0, 582.0, 103.0, 264.0]
2025-09-12 19:23:31,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1226 [INFO]: New best (306.75) for latency MM1Queue_a033_s075
2025-09-12 19:23:31,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1199 [INFO]: Iteration 100/100 (estimated time remaining: 12 minutes, 47 seconds)
2025-09-12 19:35:18,631 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:35:18,660 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:36:07,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1221 [DEBUG]: Total Reward: 90.08057 ± 145.642
2025-09-12 19:36:07,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1222 [DEBUG]: All rewards: [13.329708, 113.72328, 196.19843, -148.10692, 96.21116, 41.182537, 443.74164, 71.3795, 81.80604, -8.659653]
2025-09-12 19:36:07,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [27.0, 110.0, 150.0, 1000.0, 57.0, 35.0, 233.0, 75.0, 69.0, 31.0]
2025-09-12 19:36:07,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-ant):1251 [DEBUG]: Training session finished
