2025-09-11 22:13:15,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc15-ant/MM1Queue_a033_s075-mbpac-highdim-memdelay
2025-09-11 22:13:15,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc15-ant/MM1Queue_a033_s075-mbpac-highdim-memdelay
2025-09-11 22:13:15,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x1496602228d0>}
2025-09-11 22:13:15,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1111 [DEBUG]: using device: cuda
2025-09-11 22:13:15,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1133 [INFO]: Creating new trainer
2025-09-11 22:13:15,698 baseline-mbpac-noiseperc15-ant:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-09-11 22:13:15,699 baseline-mbpac-noiseperc15-ant:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-11 22:13:15,709 baseline-mbpac-noiseperc15-ant:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=27, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=8, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 512, batch_first=True)
)
2025-09-11 22:13:16,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1194 [DEBUG]: Starting training session...
2025-09-11 22:13:16,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 1/100
2025-09-11 22:23:44,997 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:23:44,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 22:24:56,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: -66.95064 ± 85.946
2025-09-11 22:24:56,653 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [0.9526428, -221.1, -72.95015, -26.327385, -11.207025, -10.404158, -245.09537, -48.76967, -33.21692, -1.3883146]
2025-09-11 22:24:56,653 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [99.0, 1000.0, 83.0, 65.0, 48.0, 88.0, 1000.0, 108.0, 61.0, 26.0]
2025-09-11 22:24:56,653 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1226 [INFO]: New best (-66.95) for latency MM1Queue_a033_s075
2025-09-11 22:24:56,663 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 2/100 (estimated time remaining: 19 hours, 14 minutes, 33 seconds)
2025-09-11 22:36:43,760 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:36:43,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 22:38:03,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 4.18298 ± 40.924
2025-09-11 22:38:03,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [-0.7397354, -3.4609475, -13.122406, 120.70517, 5.7572346, -20.751593, 3.856461, -38.71165, 1.9651159, -13.667844]
2025-09-11 22:38:03,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [130.0, 32.0, 105.0, 1000.0, 70.0, 113.0, 41.0, 1000.0, 18.0, 345.0]
2025-09-11 22:38:03,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1226 [INFO]: New best (4.18) for latency MM1Queue_a033_s075
2025-09-11 22:38:03,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 3/100 (estimated time remaining: 20 hours, 13 minutes, 43 seconds)
2025-09-11 22:49:45,529 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:49:45,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 22:51:55,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: -27.20784 ± 33.957
2025-09-11 22:51:55,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [-37.074024, -19.460478, 23.318079, -56.997753, -2.6200995, 4.7341003, -38.30463, -103.64189, -9.470069, -32.56164]
2025-09-11 22:51:55,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [165.0, 262.0, 1000.0, 1000.0, 162.0, 1000.0, 171.0, 436.0, 262.0, 162.0]
2025-09-11 22:51:55,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 4/100 (estimated time remaining: 20 hours, 49 minutes, 28 seconds)
2025-09-11 23:04:02,544 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:04:02,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 23:05:10,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 3.76999 ± 52.003
2025-09-11 23:05:10,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [13.824145, -120.1936, 1.19889, 6.33792, 14.387851, -50.910477, 9.853661, 76.00248, 39.595436, 47.603603]
2025-09-11 23:05:10,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [79.0, 293.0, 34.0, 62.0, 151.0, 95.0, 205.0, 1000.0, 310.0, 218.0]
2025-09-11 23:05:10,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 5/100 (estimated time remaining: 20 hours, 45 minutes, 31 seconds)
2025-09-11 23:16:53,646 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:16:53,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 23:18:09,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: -0.00052 ± 36.127
2025-09-11 23:18:09,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [52.38314, 10.327322, -11.228709, -31.2752, 32.493397, -78.202446, 14.007355, -0.6036035, 34.819862, -22.726334]
2025-09-11 23:18:09,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [161.0, 266.0, 166.0, 601.0, 1000.0, 126.0, 39.0, 135.0, 95.0, 96.0]
2025-09-11 23:18:09,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 6/100 (estimated time remaining: 20 hours, 32 minutes, 30 seconds)
2025-09-11 23:29:48,585 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:29:48,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 23:30:43,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 8.78025 ± 24.874
2025-09-11 23:30:43,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [20.781538, -38.180683, 24.153576, 4.5578723, -12.553232, -12.412347, 25.546453, 26.93118, -3.5748117, 52.552986]
2025-09-11 23:30:43,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [32.0, 34.0, 166.0, 148.0, 275.0, 57.0, 85.0, 126.0, 1000.0, 68.0]
2025-09-11 23:30:43,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1226 [INFO]: New best (8.78) for latency MM1Queue_a033_s075
2025-09-11 23:30:43,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 7/100 (estimated time remaining: 20 hours, 36 minutes, 49 seconds)
2025-09-11 23:43:22,131 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:43:22,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 23:44:11,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 19.16860 ± 35.583
2025-09-11 23:44:11,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [13.117496, 15.675857, 17.362959, -13.740991, 45.890625, 21.760555, -39.142582, -1.666904, 101.88429, 30.544676]
2025-09-11 23:44:11,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [145.0, 36.0, 63.0, 125.0, 70.0, 67.0, 93.0, 158.0, 1000.0, 40.0]
2025-09-11 23:44:11,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1226 [INFO]: New best (19.17) for latency MM1Queue_a033_s075
2025-09-11 23:44:11,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 8/100 (estimated time remaining: 20 hours, 30 minutes, 13 seconds)
2025-09-11 23:56:05,048 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:56:05,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 23:57:49,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 72.36598 ± 69.965
2025-09-11 23:57:49,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [20.05816, 175.29407, 13.507453, 204.3688, 36.846573, 147.27603, 38.026646, 56.40156, 10.60502, 21.275429]
2025-09-11 23:57:49,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [123.0, 1000.0, 42.0, 1000.0, 97.0, 1000.0, 62.0, 250.0, 53.0, 111.0]
2025-09-11 23:57:49,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1226 [INFO]: New best (72.37) for latency MM1Queue_a033_s075
2025-09-11 23:57:49,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 9/100 (estimated time remaining: 20 hours, 12 minutes, 32 seconds)
2025-09-12 00:08:56,684 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:08:56,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:09:45,517 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 50.75059 ± 103.617
2025-09-12 00:09:45,517 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [29.10585, 18.431395, 4.5779724, 3.5823126, 20.696812, 19.516806, 11.229772, 360.15225, 5.17508, 35.037655]
2025-09-12 00:09:45,517 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [46.0, 54.0, 107.0, 31.0, 37.0, 113.0, 29.0, 1000.0, 15.0, 342.0]
2025-09-12 00:09:45,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 10/100 (estimated time remaining: 19 hours, 35 minutes, 20 seconds)
2025-09-12 00:22:35,030 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:22:35,035 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:24:13,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 113.61224 ± 120.741
2025-09-12 00:24:13,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [269.48215, 343.55957, 20.401249, 50.236576, 24.173235, 23.17012, 17.874565, 269.33527, 71.91008, 45.979595]
2025-09-12 00:24:13,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 43.0, 99.0, 79.0, 75.0, 21.0, 1000.0, 76.0, 70.0]
2025-09-12 00:24:13,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1226 [INFO]: New best (113.61) for latency MM1Queue_a033_s075
2025-09-12 00:24:13,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 11/100 (estimated time remaining: 19 hours, 49 minutes, 28 seconds)
2025-09-12 00:35:13,596 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:35:13,608 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:36:41,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 96.95841 ± 133.165
2025-09-12 00:36:41,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [303.981, 28.072674, 83.04241, 1.4628232, 8.932489, 5.6707077, 34.623688, 395.3455, -8.054685, 116.50754]
2025-09-12 00:36:41,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 243.0, 144.0, 44.0, 10.0, 21.0, 60.0, 1000.0, 308.0, 322.0]
2025-09-12 00:36:41,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 12/100 (estimated time remaining: 19 hours, 34 minutes, 8 seconds)
2025-09-12 00:48:32,786 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:48:32,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:49:52,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 103.76713 ± 125.332
2025-09-12 00:49:52,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [338.23062, 347.19666, 29.665693, 105.67547, 17.304083, 96.38469, -0.15473402, 11.389275, 89.66128, 2.3182638]
2025-09-12 00:49:52,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 48.0, 294.0, 62.0, 203.0, 16.0, 50.0, 120.0, 23.0]
2025-09-12 00:49:52,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 13/100 (estimated time remaining: 19 hours, 15 minutes, 52 seconds)
2025-09-12 01:02:01,320 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:02:01,324 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:03:00,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 68.48360 ± 100.601
2025-09-12 01:03:00,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [337.58246, 11.3111105, 53.024414, 8.609837, 163.79788, 37.03382, 2.7559676, 4.749761, 13.989513, 51.98128]
2025-09-12 01:03:00,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 24.0, 117.0, 69.0, 418.0, 90.0, 27.0, 152.0, 22.0, 160.0]
2025-09-12 01:03:00,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 14/100 (estimated time remaining: 18 hours, 54 minutes, 4 seconds)
2025-09-12 01:15:00,659 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:15:00,663 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:15:53,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 65.82377 ± 101.785
2025-09-12 01:15:53,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [17.878275, 50.142315, 11.433188, 70.068695, 17.648415, 4.5895967, 27.821075, 108.1154, 355.2986, -4.757946]
2025-09-12 01:15:53,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [103.0, 377.0, 21.0, 75.0, 65.0, 19.0, 44.0, 153.0, 1000.0, 20.0]
2025-09-12 01:15:53,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 15/100 (estimated time remaining: 18 hours, 57 minutes, 37 seconds)
2025-09-12 01:27:30,699 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:27:30,703 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:27:59,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 55.69216 ± 68.350
2025-09-12 01:27:59,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [7.4455457, 82.69406, 3.1367989, 197.06467, -1.1500536, 167.83351, 16.718317, 13.569803, 13.123434, 56.485577]
2025-09-12 01:27:59,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [24.0, 152.0, 26.0, 210.0, 198.0, 265.0, 28.0, 26.0, 29.0, 63.0]
2025-09-12 01:27:59,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 16/100 (estimated time remaining: 18 hours, 3 minutes, 50 seconds)
2025-09-12 01:40:47,871 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:40:47,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:41:26,704 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 76.23857 ± 43.069
2025-09-12 01:41:26,704 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [50.52794, 32.706257, 102.77763, 43.183815, 142.663, 53.620457, 76.27605, 122.37669, 128.46637, 9.787505]
2025-09-12 01:41:26,704 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [71.0, 21.0, 100.0, 61.0, 616.0, 70.0, 87.0, 122.0, 241.0, 15.0]
2025-09-12 01:41:26,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 17/100 (estimated time remaining: 18 hours, 7 minutes, 47 seconds)
2025-09-12 01:52:20,237 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:52:20,239 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:54:00,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 169.43826 ± 170.594
2025-09-12 01:54:00,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [86.87594, 66.606224, 48.417286, 62.626087, 439.45477, 6.6352835, 429.63358, 38.301674, 103.6312, 412.20062]
2025-09-12 01:54:00,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [75.0, 71.0, 57.0, 93.0, 1000.0, 35.0, 1000.0, 56.0, 204.0, 1000.0]
2025-09-12 01:54:00,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1226 [INFO]: New best (169.44) for latency MM1Queue_a033_s075
2025-09-12 01:54:00,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 18/100 (estimated time remaining: 17 hours, 44 minutes, 46 seconds)
2025-09-12 02:05:54,599 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:05:54,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:06:45,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 93.54713 ± 111.897
2025-09-12 02:06:45,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [37.87354, 152.74216, 13.012688, 57.457077, 412.92566, 45.73921, 59.47034, 40.491997, 60.05674, 55.70194]
2025-09-12 02:06:45,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [58.0, 220.0, 15.0, 56.0, 1000.0, 114.0, 88.0, 97.0, 137.0, 69.0]
2025-09-12 02:06:45,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 19/100 (estimated time remaining: 17 hours, 25 minutes, 42 seconds)
2025-09-12 02:18:29,755 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:18:29,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:19:20,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 92.48408 ± 119.431
2025-09-12 02:19:20,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [10.692469, 54.569756, 91.461105, 2.1755347, 419.06442, 177.22572, 2.0020669, 69.42322, 44.811153, 53.415276]
2025-09-12 02:19:20,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [26.0, 122.0, 124.0, 16.0, 1000.0, 189.0, 9.0, 65.0, 101.0, 166.0]
2025-09-12 02:19:20,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 20/100 (estimated time remaining: 17 hours, 7 minutes, 43 seconds)
2025-09-12 02:31:54,040 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:31:54,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:32:46,171 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 83.74217 ± 114.856
2025-09-12 02:32:46,171 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [12.864033, 58.215946, 32.53076, 172.45819, 38.622448, 17.044268, 30.89052, 392.44373, 102.62755, -20.27582]
2025-09-12 02:32:46,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [36.0, 69.0, 77.0, 276.0, 75.0, 46.0, 40.0, 1000.0, 170.0, 56.0]
2025-09-12 02:32:46,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 21/100 (estimated time remaining: 17 hours, 16 minutes, 29 seconds)
2025-09-12 02:45:02,051 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:45:02,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:46:46,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 188.73306 ± 188.387
2025-09-12 02:46:46,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [32.73722, 579.69336, 311.3676, 47.407948, 97.091, 74.73863, 137.81277, 67.118004, 482.77634, 56.587627]
2025-09-12 02:46:46,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [41.0, 1000.0, 1000.0, 89.0, 107.0, 111.0, 239.0, 67.0, 1000.0, 84.0]
2025-09-12 02:46:46,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1226 [INFO]: New best (188.73) for latency MM1Queue_a033_s075
2025-09-12 02:46:46,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 22/100 (estimated time remaining: 17 hours, 12 minutes, 12 seconds)
2025-09-12 02:58:09,414 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:58:09,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:00:05,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 253.76501 ± 235.219
2025-09-12 03:00:05,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [423.3194, 77.71789, 454.80298, 808.53046, 305.64612, 186.72568, 69.108536, 73.209755, 27.97158, 110.61734]
2025-09-12 03:00:05,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 59.0, 1000.0, 1000.0, 389.0, 274.0, 109.0, 86.0, 31.0, 143.0]
2025-09-12 03:00:05,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1226 [INFO]: New best (253.77) for latency MM1Queue_a033_s075
2025-09-12 03:00:05,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 23/100 (estimated time remaining: 17 hours, 10 minutes, 50 seconds)
2025-09-12 03:11:50,759 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:11:50,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:12:49,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 138.50114 ± 112.081
2025-09-12 03:12:49,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [372.69528, 103.29716, 44.64771, 21.880184, 49.999977, 167.48427, 84.97552, 184.53447, 54.484318, 301.0125]
2025-09-12 03:12:49,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 74.0, 54.0, 36.0, 62.0, 134.0, 115.0, 154.0, 126.0, 367.0]
2025-09-12 03:12:49,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 24/100 (estimated time remaining: 16 hours, 57 minutes, 21 seconds)
2025-09-12 03:25:10,979 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:25:10,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:26:35,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 131.99915 ± 124.166
2025-09-12 03:26:35,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [90.16742, 53.50079, 351.85043, 243.15866, 43.060596, 1.0013256, 48.868298, 94.71835, 344.80554, 48.86005]
2025-09-12 03:26:35,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [101.0, 137.0, 1000.0, 258.0, 69.0, 23.0, 97.0, 192.0, 1000.0, 96.0]
2025-09-12 03:26:35,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 25/100 (estimated time remaining: 17 hours, 2 minutes, 7 seconds)
2025-09-12 03:38:52,242 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:38:52,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:41:00,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 206.59631 ± 149.629
2025-09-12 03:41:00,966 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [348.14175, 19.647472, 383.68088, 58.276173, 393.12924, 167.68457, 218.79318, 370.5341, 103.29248, 2.7832425]
2025-09-12 03:41:00,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 24.0, 1000.0, 77.0, 1000.0, 111.0, 180.0, 1000.0, 95.0, 18.0]
2025-09-12 03:41:00,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 26/100 (estimated time remaining: 17 hours, 3 minutes, 42 seconds)
2025-09-12 03:52:45,107 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:52:45,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:54:28,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 168.78143 ± 147.763
2025-09-12 03:54:28,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [12.107086, 160.2591, 15.982354, 359.13443, 23.681686, 395.07745, 70.3937, 171.65979, 94.65718, 384.86142]
2025-09-12 03:54:28,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [43.0, 190.0, 34.0, 1000.0, 24.0, 1000.0, 102.0, 203.0, 57.0, 1000.0]
2025-09-12 03:54:28,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 27/100 (estimated time remaining: 16 hours, 41 minutes, 56 seconds)
2025-09-12 04:06:34,303 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:06:34,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:07:36,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 168.76263 ± 173.667
2025-09-12 04:07:36,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [224.64479, 271.29922, 29.755165, 31.967024, 608.8284, 38.422226, 75.572334, 115.94976, 29.97516, 261.21216]
2025-09-12 04:07:36,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [283.0, 333.0, 34.0, 31.0, 1000.0, 39.0, 73.0, 155.0, 51.0, 247.0]
2025-09-12 04:07:36,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 28/100 (estimated time remaining: 16 hours, 25 minutes, 52 seconds)
2025-09-12 04:18:29,731 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:18:29,732 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:19:28,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 136.14333 ± 90.835
2025-09-12 04:19:28,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [33.881977, 69.98694, 51.203278, 198.02736, 305.24533, 144.82176, 35.629803, 259.228, 168.22533, 95.183464]
2025-09-12 04:19:28,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [41.0, 83.0, 47.0, 174.0, 1000.0, 110.0, 39.0, 187.0, 239.0, 137.0]
2025-09-12 04:19:28,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 29/100 (estimated time remaining: 15 hours, 59 minutes, 45 seconds)
2025-09-12 04:31:49,298 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:31:49,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:33:34,341 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 172.47760 ± 133.395
2025-09-12 04:33:34,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [4.073119, 89.2978, 176.11124, 98.169464, 87.04012, 372.29358, 354.45605, 111.12721, 372.97366, 59.233772]
2025-09-12 04:33:34,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [13.0, 143.0, 138.0, 61.0, 131.0, 1000.0, 1000.0, 121.0, 1000.0, 66.0]
2025-09-12 04:33:34,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 30/100 (estimated time remaining: 15 hours, 51 minutes, 14 seconds)
2025-09-12 04:45:11,326 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:45:11,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:46:11,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 207.36328 ± 203.644
2025-09-12 04:46:11,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [26.08777, 71.13546, 62.261963, 119.795, 165.73456, 517.65125, 568.1862, 38.400692, 444.27582, 60.10416]
2025-09-12 04:46:11,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [31.0, 75.0, 63.0, 146.0, 209.0, 505.0, 560.0, 29.0, 446.0, 42.0]
2025-09-12 04:46:11,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 31/100 (estimated time remaining: 15 hours, 12 minutes, 23 seconds)
2025-09-12 04:58:06,660 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:58:06,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:58:49,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 143.35919 ± 78.947
2025-09-12 04:58:49,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [26.934649, 170.35785, 96.44161, 174.66727, 189.07425, 105.34317, 162.1635, 230.99155, 9.867294, 267.75076]
2025-09-12 04:58:49,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [28.0, 125.0, 122.0, 162.0, 205.0, 70.0, 228.0, 254.0, 30.0, 318.0]
2025-09-12 04:58:49,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 32/100 (estimated time remaining: 14 hours, 48 minutes)
2025-09-12 05:11:36,707 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:11:36,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:13:01,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 186.42946 ± 167.340
2025-09-12 05:13:01,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [471.7851, 171.37175, 65.97229, 416.54782, 102.12878, 79.66209, 36.7387, 417.5278, 56.565052, 45.995113]
2025-09-12 05:13:01,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 178.0, 51.0, 405.0, 87.0, 88.0, 75.0, 1000.0, 57.0, 66.0]
2025-09-12 05:13:01,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 33/100 (estimated time remaining: 14 hours, 49 minutes, 39 seconds)
2025-09-12 05:24:16,837 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:24:16,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:27:28,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 328.18039 ± 182.037
2025-09-12 05:27:28,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [465.58618, 100.48925, 350.37457, 59.80459, 423.01962, 531.36224, 529.55927, 395.33655, 395.23624, 31.035343]
2025-09-12 05:27:28,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 100.0, 1000.0, 94.0, 1000.0, 514.0, 1000.0, 1000.0, 1000.0, 53.0]
2025-09-12 05:27:28,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1226 [INFO]: New best (328.18) for latency MM1Queue_a033_s075
2025-09-12 05:27:28,900 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 34/100 (estimated time remaining: 15 hours, 11 minutes, 16 seconds)
2025-09-12 05:40:18,399 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:40:18,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:42:04,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 203.25708 ± 156.979
2025-09-12 05:42:04,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [368.1245, 138.04988, 203.2077, 105.92687, 86.63243, 483.4421, 440.33548, 37.900246, 113.172165, 55.779415]
2025-09-12 05:42:04,647 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 129.0, 146.0, 130.0, 88.0, 1000.0, 1000.0, 54.0, 134.0, 49.0]
2025-09-12 05:42:04,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 35/100 (estimated time remaining: 15 hours, 4 minutes, 15 seconds)
2025-09-12 05:53:45,246 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:53:45,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:56:32,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 295.50729 ± 146.097
2025-09-12 05:56:32,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [518.1786, 356.67938, 426.60382, 366.85822, 359.6879, 293.60837, 133.66663, 347.3642, 142.0004, 10.425352]
2025-09-12 05:56:32,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 296.0, 1000.0, 115.0, 307.0, 123.0, 36.0]
2025-09-12 05:56:32,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 36/100 (estimated time remaining: 15 hours, 14 minutes, 31 seconds)
2025-09-12 06:07:47,613 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:07:47,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:10:50,895 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 235.63943 ± 132.688
2025-09-12 06:10:50,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [-19.180832, 34.492943, 199.13475, 291.07718, 405.0097, 159.3209, 350.6362, 338.39658, 297.94717, 299.55954]
2025-09-12 06:10:50,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [61.0, 38.0, 216.0, 1000.0, 1000.0, 169.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:10:50,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 37/100 (estimated time remaining: 15 hours, 21 minutes, 56 seconds)
2025-09-12 06:23:04,149 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:23:04,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:24:20,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 158.07431 ± 140.322
2025-09-12 06:24:20,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [156.1369, 75.487595, 191.4867, 68.728714, 92.27202, 42.87764, 364.8637, 62.287407, 51.99882, 474.60364]
2025-09-12 06:24:20,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [170.0, 94.0, 154.0, 83.0, 42.0, 41.0, 1000.0, 56.0, 61.0, 1000.0]
2025-09-12 06:24:20,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 38/100 (estimated time remaining: 14 hours, 58 minutes, 36 seconds)
2025-09-12 06:36:49,055 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:36:49,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:39:20,653 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 314.16241 ± 231.768
2025-09-12 06:39:20,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [102.699196, 344.80997, 526.14465, 572.09424, 30.04806, 467.91437, 340.18753, 671.1198, 63.99417, 22.612232]
2025-09-12 06:39:20,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [95.0, 1000.0, 1000.0, 491.0, 53.0, 1000.0, 1000.0, 575.0, 55.0, 36.0]
2025-09-12 06:39:20,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 39/100 (estimated time remaining: 14 hours, 51 minutes, 6 seconds)
2025-09-12 06:50:33,633 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:50:33,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:54:13,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 439.64185 ± 205.543
2025-09-12 06:54:13,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [563.7087, 129.87291, 459.2236, 791.3635, 78.779366, 557.13104, 357.85678, 608.9582, 484.81638, 364.70798]
2025-09-12 06:54:13,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 116.0, 1000.0, 1000.0, 61.0, 1000.0, 1000.0, 1000.0, 479.0, 1000.0]
2025-09-12 06:54:13,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1226 [INFO]: New best (439.64) for latency MM1Queue_a033_s075
2025-09-12 06:54:13,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 40/100 (estimated time remaining: 14 hours, 40 minutes, 7 seconds)
2025-09-12 07:06:38,465 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:06:38,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:08:29,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 255.77280 ± 167.022
2025-09-12 07:08:29,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [99.73947, 59.626926, 313.64026, 484.21942, 59.741814, 65.96893, 214.9116, 462.14493, 420.56268, 377.1718]
2025-09-12 07:08:29,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [65.0, 82.0, 211.0, 1000.0, 60.0, 49.0, 115.0, 391.0, 1000.0, 1000.0]
2025-09-12 07:08:30,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 41/100 (estimated time remaining: 14 hours, 23 minutes, 34 seconds)
2025-09-12 07:20:58,349 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:20:58,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:23:12,905 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 350.13681 ± 287.960
2025-09-12 07:23:12,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [901.67535, 168.13431, 103.36474, 464.13147, 600.3201, 669.7498, 22.757126, 119.10001, 50.12218, 402.01324]
2025-09-12 07:23:12,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [912.0, 109.0, 78.0, 1000.0, 457.0, 1000.0, 26.0, 121.0, 67.0, 1000.0]
2025-09-12 07:23:12,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 42/100 (estimated time remaining: 14 hours, 13 minutes, 55 seconds)
2025-09-12 07:34:09,746 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:34:09,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:36:52,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 308.92160 ± 160.768
2025-09-12 07:36:52,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [361.37912, 40.65019, 252.19368, 607.957, 400.04697, 357.07675, 352.43845, 140.0337, 131.80482, 445.63535]
2025-09-12 07:36:52,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 41.0, 184.0, 1000.0, 1000.0, 336.0, 1000.0, 66.0, 92.0, 1000.0]
2025-09-12 07:36:52,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 43/100 (estimated time remaining: 14 hours, 1 minute, 22 seconds)
2025-09-12 07:49:24,784 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:49:24,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:51:49,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 345.84091 ± 176.592
2025-09-12 07:51:49,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [147.21442, 150.5419, 339.7856, 52.99089, 448.28967, 638.7229, 330.34283, 452.5109, 547.1764, 350.83356]
2025-09-12 07:51:49,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [152.0, 156.0, 239.0, 55.0, 1000.0, 1000.0, 318.0, 266.0, 1000.0, 1000.0]
2025-09-12 07:51:49,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 44/100 (estimated time remaining: 13 hours, 46 minutes, 12 seconds)
2025-09-12 08:03:53,813 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:03:53,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:05:29,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 229.46416 ± 218.042
2025-09-12 08:05:29,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [207.27797, 105.207, 13.315452, 79.14654, 738.45215, 41.51416, 198.51424, 497.135, 312.32043, 101.75885]
2025-09-12 08:05:29,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [187.0, 95.0, 33.0, 91.0, 631.0, 35.0, 178.0, 1000.0, 1000.0, 87.0]
2025-09-12 08:05:29,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 45/100 (estimated time remaining: 13 hours, 18 minutes, 11 seconds)
2025-09-12 08:17:35,048 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:17:35,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:20:07,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 260.07675 ± 200.151
2025-09-12 08:20:07,894 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [79.42685, 172.04996, 599.29474, 9.120842, 362.95956, 343.4404, 515.6989, 396.0609, 31.136976, 91.57846]
2025-09-12 08:20:07,894 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [85.0, 105.0, 1000.0, 16.0, 1000.0, 1000.0, 1000.0, 1000.0, 42.0, 73.0]
2025-09-12 08:20:07,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 46/100 (estimated time remaining: 13 hours, 7 minutes, 56 seconds)
2025-09-12 08:31:58,096 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:31:58,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:33:40,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 170.53081 ± 153.811
2025-09-12 08:33:40,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [49.882347, 228.45494, 73.63952, 52.45292, 29.818708, 308.0621, 528.6598, 75.50079, 81.49654, 277.34045]
2025-09-12 08:33:40,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [75.0, 188.0, 71.0, 58.0, 27.0, 1000.0, 1000.0, 56.0, 102.0, 1000.0]
2025-09-12 08:33:40,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 47/100 (estimated time remaining: 12 hours, 41 minutes, 1 second)
2025-09-12 08:45:32,674 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:45:32,680 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:47:41,779 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 215.50652 ± 148.282
2025-09-12 08:47:41,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [364.36252, 88.6864, 126.55996, 338.38702, 64.58445, 52.09591, 394.6632, 460.7484, 92.99691, 171.98024]
2025-09-12 08:47:41,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 62.0, 85.0, 1000.0, 104.0, 38.0, 1000.0, 1000.0, 82.0, 137.0]
2025-09-12 08:47:41,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 48/100 (estimated time remaining: 12 hours, 30 minutes, 38 seconds)
2025-09-12 08:59:09,930 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:59:09,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:01:50,981 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 410.14908 ± 283.784
2025-09-12 09:01:50,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [1066.9954, 661.589, 219.59193, 406.26816, 417.837, 331.5342, 377.8274, 41.307022, 514.86597, 63.675045]
2025-09-12 09:01:50,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [882.0, 1000.0, 211.0, 279.0, 269.0, 1000.0, 1000.0, 41.0, 1000.0, 40.0]
2025-09-12 09:01:50,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 49/100 (estimated time remaining: 12 hours, 8 minutes, 19 seconds)
2025-09-12 09:14:29,301 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:14:29,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:16:47,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 277.64310 ± 164.011
2025-09-12 09:16:47,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [532.55084, 209.23653, 117.96318, 24.611588, 346.2014, 370.22577, 417.72076, 474.34256, 116.10416, 167.47417]
2025-09-12 09:16:47,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [335.0, 173.0, 79.0, 29.0, 1000.0, 1000.0, 1000.0, 1000.0, 145.0, 145.0]
2025-09-12 09:16:47,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 50/100 (estimated time remaining: 12 hours, 7 minutes, 19 seconds)
2025-09-12 09:28:33,087 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:28:33,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:30:34,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 284.54178 ± 139.495
2025-09-12 09:30:34,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [295.24097, 57.736523, 523.0503, 424.45938, 331.3269, 335.6376, 390.4708, 108.749695, 161.6924, 217.05342]
2025-09-12 09:30:34,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 67.0, 1000.0, 358.0, 194.0, 1000.0, 306.0, 86.0, 126.0, 204.0]
2025-09-12 09:30:34,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 51/100 (estimated time remaining: 11 hours, 44 minutes, 29 seconds)
2025-09-12 09:42:35,151 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:42:35,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:45:16,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 299.07294 ± 177.225
2025-09-12 09:45:16,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [318.42178, 385.72177, 286.50858, 597.64166, 38.250145, 343.2721, 361.51917, 69.567024, 507.41763, 82.40954]
2025-09-12 09:45:16,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 460.0, 52.0, 1000.0, 1000.0, 54.0, 297.0, 71.0]
2025-09-12 09:45:16,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 52/100 (estimated time remaining: 11 hours, 41 minutes, 38 seconds)
2025-09-12 09:56:39,358 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:56:39,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:58:19,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 211.89644 ± 188.514
2025-09-12 09:58:19,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [85.69871, 161.71925, 24.626364, 109.65025, 603.5067, 392.27646, 291.6205, 49.422802, 9.848318, 390.59503]
2025-09-12 09:58:19,243 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [53.0, 132.0, 20.0, 87.0, 1000.0, 1000.0, 277.0, 46.0, 20.0, 1000.0]
2025-09-12 09:58:19,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 53/100 (estimated time remaining: 11 hours, 17 minutes, 59 seconds)
2025-09-12 10:10:03,930 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:10:03,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:12:37,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 304.77423 ± 151.668
2025-09-12 10:12:37,653 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [472.12646, 361.7773, 179.04935, 342.64206, 313.73447, 508.29636, 443.62192, 305.9543, 84.66062, 35.879272]
2025-09-12 10:12:37,653 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 183.0, 138.0, 1000.0, 226.0, 1000.0, 1000.0, 1000.0, 64.0, 42.0]
2025-09-12 10:12:37,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 54/100 (estimated time remaining: 11 hours, 5 minutes, 18 seconds)
2025-09-12 10:23:47,052 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:23:47,066 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:25:36,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 299.02164 ± 207.115
2025-09-12 10:25:36,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [321.66257, 390.86887, 356.8309, 218.53894, 679.0928, 146.78658, 153.3911, 80.2206, 27.74312, 615.0807]
2025-09-12 10:25:36,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [288.0, 1000.0, 1000.0, 166.0, 499.0, 153.0, 356.0, 56.0, 65.0, 418.0]
2025-09-12 10:25:36,665 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 55/100 (estimated time remaining: 10 hours, 33 minutes, 7 seconds)
2025-09-12 10:37:18,681 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:37:18,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:39:15,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 265.48328 ± 139.347
2025-09-12 10:39:15,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [24.13048, 190.00931, 465.1367, 398.2147, 168.62471, 273.4976, 462.4367, 156.10295, 179.45132, 337.22812]
2025-09-12 10:39:15,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [28.0, 134.0, 301.0, 1000.0, 267.0, 268.0, 1000.0, 114.0, 122.0, 1000.0]
2025-09-12 10:39:15,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 56/100 (estimated time remaining: 10 hours, 18 minutes, 7 seconds)
2025-09-12 10:51:04,672 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:51:04,677 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:54:28,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 451.89581 ± 266.501
2025-09-12 10:54:28,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [802.35925, 772.81604, 48.229073, 792.6988, 85.53886, 444.03314, 423.02963, 214.51433, 386.1825, 549.5566]
2025-09-12 10:54:28,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 46.0, 1000.0, 52.0, 224.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:54:28,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1226 [INFO]: New best (451.90) for latency MM1Queue_a033_s075
2025-09-12 10:54:28,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 57/100 (estimated time remaining: 10 hours, 8 minutes, 58 seconds)
2025-09-12 11:06:31,307 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:06:31,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:10:04,004 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 366.41769 ± 105.555
2025-09-12 11:10:04,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [307.93314, 237.57372, 353.43985, 294.943, 299.60968, 363.18628, 343.82684, 643.1062, 429.41833, 391.1398]
2025-09-12 11:10:04,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 208.0, 245.0, 175.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:10:04,015 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 58/100 (estimated time remaining: 10 hours, 17 minutes)
2025-09-12 11:21:10,293 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:21:10,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:23:42,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 252.67953 ± 164.268
2025-09-12 11:23:42,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [337.94852, 61.676914, 307.78693, 52.467545, 386.61578, 311.88306, 427.90118, 494.38254, 135.81822, 10.314861]
2025-09-12 11:23:42,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 39.0, 176.0, 65.0, 1000.0, 1000.0, 1000.0, 1000.0, 124.0, 15.0]
2025-09-12 11:23:42,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 59/100 (estimated time remaining: 9 hours, 57 minutes, 4 seconds)
2025-09-12 11:36:02,382 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:36:02,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:38:33,247 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 279.58832 ± 210.293
2025-09-12 11:38:33,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [81.76963, 310.74667, 242.70358, 470.82422, 398.55832, 397.37372, 34.555943, 712.29803, 118.074585, 28.978226]
2025-09-12 11:38:33,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [42.0, 1000.0, 197.0, 1000.0, 1000.0, 1000.0, 65.0, 1000.0, 106.0, 28.0]
2025-09-12 11:38:33,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 60/100 (estimated time remaining: 9 hours, 58 minutes, 8 seconds)
2025-09-12 11:49:47,735 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:49:47,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:52:26,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 338.69522 ± 265.777
2025-09-12 11:52:26,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [721.2734, 16.29697, 418.41458, 116.75344, 237.6343, 245.20322, 858.08655, 426.10785, 331.02878, 16.153156]
2025-09-12 11:52:26,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 30.0, 366.0, 83.0, 194.0, 1000.0, 1000.0, 1000.0, 1000.0, 17.0]
2025-09-12 11:52:26,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 61/100 (estimated time remaining: 9 hours, 45 minutes, 27 seconds)
2025-09-12 12:04:14,655 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:04:14,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:07:17,678 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 417.08496 ± 243.975
2025-09-12 12:07:17,680 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [499.3618, 358.14093, 342.87442, 218.05977, 1035.925, 450.24982, 519.414, 432.83267, 86.99601, 226.99513]
2025-09-12 12:07:17,680 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 174.0, 1000.0, 280.0, 1000.0, 1000.0, 85.0, 136.0]
2025-09-12 12:07:17,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 62/100 (estimated time remaining: 9 hours, 27 minutes, 58 seconds)
2025-09-12 12:18:58,973 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:18:58,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:21:10,905 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 259.92453 ± 155.308
2025-09-12 12:21:10,917 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [334.75586, 546.6639, 161.42561, 143.94873, 137.51811, 90.276405, 344.11584, 477.28885, 81.944016, 281.3082]
2025-09-12 12:21:10,917 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 361.0, 106.0, 104.0, 105.0, 79.0, 1000.0, 1000.0, 61.0, 1000.0]
2025-09-12 12:21:10,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 63/100 (estimated time remaining: 9 hours, 28 seconds)
2025-09-12 12:33:23,245 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:33:23,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:36:13,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 353.22906 ± 189.442
2025-09-12 12:36:13,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [509.16187, 496.44653, 47.146152, 530.10876, 470.4257, 393.32788, 54.344322, 270.58298, 575.4442, 185.30215]
2025-09-12 12:36:13,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 340.0, 34.0, 1000.0, 1000.0, 1000.0, 76.0, 1000.0, 543.0, 149.0]
2025-09-12 12:36:13,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 64/100 (estimated time remaining: 8 hours, 56 minutes, 34 seconds)
2025-09-12 12:47:39,933 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:47:39,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:49:27,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 228.96451 ± 157.332
2025-09-12 12:49:27,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [315.5755, 230.47314, 475.35074, 458.59097, 81.4692, 20.47435, 170.75647, 362.30966, 93.53419, 81.110725]
2025-09-12 12:49:27,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 170.0, 1000.0, 265.0, 69.0, 51.0, 120.0, 1000.0, 63.0, 67.0]
2025-09-12 12:49:27,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 65/100 (estimated time remaining: 8 hours, 30 minutes, 27 seconds)
2025-09-12 13:01:30,108 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:01:30,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:04:41,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 428.69159 ± 256.041
2025-09-12 13:04:41,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [199.09781, 228.53574, 52.32598, 595.39557, 304.5211, 671.23047, 354.2108, 987.29956, 467.59125, 426.70764]
2025-09-12 13:04:41,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [133.0, 109.0, 57.0, 529.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:04:41,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 66/100 (estimated time remaining: 8 hours, 25 minutes, 42 seconds)
2025-09-12 13:15:58,288 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:15:58,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:18:39,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 373.92169 ± 252.250
2025-09-12 13:18:39,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [927.1531, 477.50125, 330.4863, 305.11728, 154.27446, 326.64377, 25.065937, 696.3286, 299.36026, 197.28601]
2025-09-12 13:18:39,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 213.0, 114.0, 1000.0, 24.0, 1000.0, 216.0, 183.0]
2025-09-12 13:18:39,810 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 67/100 (estimated time remaining: 8 hours, 5 minutes, 18 seconds)
2025-09-12 13:30:55,547 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:30:55,552 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:34:18,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 411.50385 ± 244.393
2025-09-12 13:34:18,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [436.0699, 606.6317, 339.9966, 88.79274, 202.38853, 192.11694, 979.4369, 369.41605, 572.94836, 327.24094]
2025-09-12 13:34:18,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 70.0, 125.0, 148.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:34:18,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 68/100 (estimated time remaining: 8 hours, 2 minutes, 38 seconds)
2025-09-12 13:45:21,196 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:45:21,201 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:47:12,324 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 307.12006 ± 218.741
2025-09-12 13:47:12,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [18.039476, 22.291473, 90.79832, 610.92883, 435.4478, 186.13974, 671.5587, 384.86243, 370.51443, 280.61948]
2025-09-12 13:47:12,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [17.0, 33.0, 75.0, 1000.0, 223.0, 156.0, 1000.0, 232.0, 274.0, 1000.0]
2025-09-12 13:47:12,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 69/100 (estimated time remaining: 7 hours, 34 minutes, 19 seconds)
2025-09-12 13:59:36,959 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:59:36,970 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:02:06,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 370.09381 ± 292.038
2025-09-12 14:02:06,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [368.21582, 54.179123, 681.8092, 297.3847, 1005.04193, 34.694168, 423.58743, 272.4948, 50.505585, 513.02515]
2025-09-12 14:02:06,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [221.0, 49.0, 444.0, 1000.0, 575.0, 57.0, 1000.0, 1000.0, 32.0, 1000.0]
2025-09-12 14:02:06,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 70/100 (estimated time remaining: 7 hours, 30 minutes, 28 seconds)
2025-09-12 14:13:47,284 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:13:47,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:17:21,329 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 416.98770 ± 175.230
2025-09-12 14:17:21,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [535.54755, 303.5792, 781.2157, 465.29453, 301.5677, 548.92487, 393.6599, 257.28662, 122.85015, 459.95065]
2025-09-12 14:17:21,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 248.0, 1000.0, 70.0, 318.0]
2025-09-12 14:17:21,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 71/100 (estimated time remaining: 7 hours, 16 minutes)
2025-09-12 14:28:16,752 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:28:16,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:29:57,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 183.10025 ± 132.048
2025-09-12 14:29:57,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [161.43213, 231.07089, 322.80508, 191.46004, 266.09418, 45.395954, 454.44446, 21.652912, 60.66144, 75.98523]
2025-09-12 14:29:57,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [95.0, 199.0, 1000.0, 167.0, 1000.0, 35.0, 1000.0, 44.0, 60.0, 57.0]
2025-09-12 14:29:57,753 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 72/100 (estimated time remaining: 6 hours, 53 minutes, 32 seconds)
2025-09-12 14:41:44,715 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:41:44,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:43:56,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 252.86600 ± 150.024
2025-09-12 14:43:56,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [273.18204, 19.625217, 64.21277, 63.79259, 414.19836, 318.52795, 338.5079, 461.48062, 200.05753, 375.07504]
2025-09-12 14:43:56,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 19.0, 59.0, 44.0, 1000.0, 181.0, 1000.0, 1000.0, 149.0, 309.0]
2025-09-12 14:43:56,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 73/100 (estimated time remaining: 6 hours, 29 minutes, 57 seconds)
2025-09-12 14:56:34,831 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:56:34,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:58:49,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 385.86874 ± 257.224
2025-09-12 14:58:49,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [184.18492, 769.54846, 135.58894, 237.32127, 590.0434, 538.04443, 829.0952, 193.6413, 105.31991, 275.89966]
2025-09-12 14:58:49,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [168.0, 509.0, 113.0, 176.0, 1000.0, 1000.0, 584.0, 157.0, 123.0, 1000.0]
2025-09-12 14:58:49,684 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 74/100 (estimated time remaining: 6 hours, 26 minutes, 45 seconds)
2025-09-12 15:10:03,618 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:10:03,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:12:34,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 311.18948 ± 221.431
2025-09-12 15:12:34,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [26.560192, 127.36923, 440.49884, 650.2778, 695.88965, 304.50372, 213.46571, 332.02554, 18.410593, 302.8934]
2025-09-12 15:12:34,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [29.0, 102.0, 1000.0, 1000.0, 1000.0, 184.0, 127.0, 1000.0, 29.0, 1000.0]
2025-09-12 15:12:34,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 75/100 (estimated time remaining: 6 hours, 6 minutes, 23 seconds)
2025-09-12 15:24:56,975 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:24:56,985 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:28:02,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 459.89313 ± 329.984
2025-09-12 15:28:02,455 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [293.41782, 313.1788, 281.6958, 163.99759, 348.361, 663.7473, 354.27908, 444.68747, 1378.7697, 356.7969]
2025-09-12 15:28:02,455 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [189.0, 197.0, 198.0, 153.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:28:02,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1226 [INFO]: New best (459.89) for latency MM1Queue_a033_s075
2025-09-12 15:28:02,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 76/100 (estimated time remaining: 5 hours, 53 minutes, 25 seconds)
2025-09-12 15:38:49,500 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:38:49,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:41:17,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 463.49152 ± 354.311
2025-09-12 15:41:17,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [580.0897, 165.63257, 263.72614, 1116.491, 378.91266, 65.46438, 344.03677, 362.9251, 1128.2854, 229.35132]
2025-09-12 15:41:17,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 115.0, 205.0, 705.0, 1000.0, 51.0, 1000.0, 347.0, 730.0, 231.0]
2025-09-12 15:41:17,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1226 [INFO]: New best (463.49) for latency MM1Queue_a033_s075
2025-09-12 15:41:17,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 77/100 (estimated time remaining: 5 hours, 42 minutes, 24 seconds)
2025-09-12 15:54:38,299 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:54:38,304 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:56:32,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 313.16583 ± 262.622
2025-09-12 15:56:32,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [-3.7546442, 72.69718, 155.70932, 817.5579, 292.26944, 584.75446, 564.8449, 186.37498, 24.464397, 436.74017]
2025-09-12 15:56:32,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [10.0, 77.0, 110.0, 522.0, 405.0, 356.0, 378.0, 1000.0, 31.0, 1000.0]
2025-09-12 15:56:32,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 78/100 (estimated time remaining: 5 hours, 33 minutes, 57 seconds)
2025-09-12 16:08:08,520 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:08:08,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:09:56,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 293.84088 ± 214.999
2025-09-12 16:09:56,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [20.994967, 392.07224, 764.2059, 244.96402, 225.12172, 285.8635, 54.113354, 88.686935, 510.07996, 352.3062]
2025-09-12 16:09:56,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [27.0, 1000.0, 1000.0, 174.0, 161.0, 222.0, 31.0, 64.0, 1000.0, 267.0]
2025-09-12 16:09:56,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 79/100 (estimated time remaining: 5 hours, 12 minutes, 52 seconds)
2025-09-12 16:21:00,661 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:21:00,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:22:57,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 283.86505 ± 166.624
2025-09-12 16:22:57,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [332.67548, 204.24742, 425.47723, 637.6413, 133.15701, 377.2851, 299.12628, 286.0613, 41.331696, 101.64713]
2025-09-12 16:22:57,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [237.0, 128.0, 264.0, 358.0, 112.0, 1000.0, 1000.0, 1000.0, 35.0, 75.0]
2025-09-12 16:22:57,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 80/100 (estimated time remaining: 4 hours, 55 minutes, 38 seconds)
2025-09-12 16:34:47,004 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:34:47,008 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:38:06,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 345.53638 ± 211.583
2025-09-12 16:38:06,759 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [331.58838, 798.8162, 317.0656, 580.9868, 204.30676, 478.7346, 302.1541, 108.05946, 52.96516, 280.6869]
2025-09-12 16:38:06,759 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 134.0, 1000.0, 1000.0, 87.0, 60.0, 1000.0]
2025-09-12 16:38:06,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 81/100 (estimated time remaining: 4 hours, 40 minutes, 17 seconds)
2025-09-12 16:49:32,053 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:49:32,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:51:40,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 265.76880 ± 147.887
2025-09-12 16:51:40,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [149.23483, 552.41205, 434.99088, 209.95872, 131.07614, 139.05544, 343.1562, 371.3732, 66.1688, 260.26202]
2025-09-12 16:51:40,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [109.0, 1000.0, 1000.0, 172.0, 115.0, 54.0, 1000.0, 1000.0, 40.0, 192.0]
2025-09-12 16:51:40,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 82/100 (estimated time remaining: 4 hours, 27 minutes, 24 seconds)
2025-09-12 17:04:09,489 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:04:09,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:06:02,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 436.81934 ± 289.328
2025-09-12 17:06:02,579 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [254.15372, 334.19183, 94.648895, 328.6931, 983.2999, 999.5968, 300.30014, 446.0617, 312.79077, 314.45605]
2025-09-12 17:06:02,579 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [137.0, 340.0, 82.0, 188.0, 642.0, 1000.0, 170.0, 361.0, 198.0, 1000.0]
2025-09-12 17:06:02,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 10 minutes, 11 seconds)
2025-09-12 17:17:00,247 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:17:00,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:19:49,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 374.40912 ± 169.713
2025-09-12 17:19:49,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [164.61365, 414.2575, 324.13116, 629.50116, 533.26764, 37.915092, 502.8283, 387.1025, 474.50037, 275.97394]
2025-09-12 17:19:49,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [109.0, 1000.0, 190.0, 1000.0, 343.0, 52.0, 1000.0, 1000.0, 370.0, 1000.0]
2025-09-12 17:19:49,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 84/100 (estimated time remaining: 3 hours, 57 minutes, 37 seconds)
2025-09-12 17:31:27,118 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:31:27,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:34:34,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 503.81445 ± 277.612
2025-09-12 17:34:34,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [284.24548, 726.72186, 306.976, 354.4701, 544.07733, 539.0547, 566.21216, 603.5181, 13.523968, 1099.3445]
2025-09-12 17:34:34,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 163.0, 230.0, 1000.0, 1000.0, 364.0, 1000.0, 16.0, 1000.0]
2025-09-12 17:34:34,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1226 [INFO]: New best (503.81) for latency MM1Queue_a033_s075
2025-09-12 17:34:34,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 49 minutes, 9 seconds)
2025-09-12 17:46:27,458 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:46:27,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:48:30,684 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 311.60004 ± 303.040
2025-09-12 17:48:30,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [575.3722, -7.183055, 508.5754, 1022.0657, 389.48865, 245.61609, 51.901524, 69.19973, 156.37914, 104.58508]
2025-09-12 17:48:30,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 10.0, 1000.0, 1000.0, 257.0, 1000.0, 59.0, 49.0, 110.0, 54.0]
2025-09-12 17:48:30,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 31 minutes, 11 seconds)
2025-09-12 18:00:12,824 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:00:12,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:02:48,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 295.41241 ± 157.880
2025-09-12 18:02:48,905 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [370.293, 60.110493, 314.82736, 361.09677, 118.30081, 35.425568, 312.68787, 442.23975, 502.0925, 437.05]
2025-09-12 18:02:48,905 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 47.0, 144.0, 1000.0, 97.0, 34.0, 1000.0, 269.0, 1000.0, 1000.0]
2025-09-12 18:02:48,917 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 19 minutes, 12 seconds)
2025-09-12 18:15:07,733 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:15:07,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:17:34,753 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 617.69080 ± 458.266
2025-09-12 18:17:34,754 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [451.72256, 356.1276, 325.09793, 1029.981, 274.14853, 25.552132, 725.2227, 1743.6271, 638.9373, 606.49115]
2025-09-12 18:17:34,754 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [305.0, 216.0, 227.0, 1000.0, 157.0, 32.0, 437.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:17:34,754 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1226 [INFO]: New best (617.69) for latency MM1Queue_a033_s075
2025-09-12 18:17:34,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 5 minutes, 59 seconds)
2025-09-12 18:29:04,923 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:29:04,927 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:30:55,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 280.58716 ± 193.118
2025-09-12 18:30:55,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [394.16898, 133.40675, 593.1574, 238.69986, 2.2241104, 242.06374, 226.07954, 8.192817, 477.37653, 490.50183]
2025-09-12 18:30:55,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 81.0, 344.0, 118.0, 17.0, 1000.0, 121.0, 15.0, 1000.0, 270.0]
2025-09-12 18:30:55,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 50 minutes, 37 seconds)
2025-09-12 18:42:40,281 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:42:40,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:44:39,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 465.96429 ± 361.258
2025-09-12 18:44:39,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [323.87973, 22.587303, 1271.2485, 308.74643, 359.16568, 371.34036, 103.087326, 921.0044, 323.59174, 654.99176]
2025-09-12 18:44:39,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [155.0, 23.0, 831.0, 225.0, 1000.0, 204.0, 69.0, 526.0, 1000.0, 360.0]
2025-09-12 18:44:39,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 34 minutes, 12 seconds)
2025-09-12 18:56:24,574 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:56:24,579 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:57:59,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 343.23611 ± 195.281
2025-09-12 18:57:59,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [556.83997, 136.77203, 264.80728, 289.8293, 555.8718, 313.53375, 655.22437, 81.91203, 106.524376, 471.04617]
2025-09-12 18:57:59,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [301.0, 81.0, 139.0, 174.0, 325.0, 1000.0, 1000.0, 40.0, 115.0, 297.0]
2025-09-12 18:57:59,145 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 18 minutes, 56 seconds)
2025-09-12 19:09:30,697 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:09:30,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:11:29,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 451.90454 ± 339.108
2025-09-12 19:11:29,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [138.6359, 544.7617, 156.20232, 811.41473, 374.14215, 21.060904, 622.9191, 38.906063, 1013.9217, 797.081]
2025-09-12 19:11:29,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [99.0, 1000.0, 97.0, 479.0, 236.0, 32.0, 342.0, 40.0, 1000.0, 1000.0]
2025-09-12 19:11:29,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 3 minutes, 36 seconds)
2025-09-12 19:22:44,373 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:22:44,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:25:05,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 511.99350 ± 381.127
2025-09-12 19:25:05,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [296.80563, 455.75006, 1383.1842, 48.14052, 565.1081, 54.663837, 438.04477, 338.65778, 925.6299, 613.9507]
2025-09-12 19:25:05,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [174.0, 265.0, 728.0, 36.0, 336.0, 38.0, 1000.0, 1000.0, 493.0, 1000.0]
2025-09-12 19:25:05,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 48 minutes)
2025-09-12 19:37:36,257 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:37:36,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:40:38,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 437.71417 ± 374.846
2025-09-12 19:40:38,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [240.5553, 406.0378, 498.78806, 67.03981, 465.82028, 113.687294, 301.96194, 290.86887, 1472.9119, 519.47034]
2025-09-12 19:40:38,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 268.0, 1000.0, 40.0, 1000.0, 103.0, 1000.0, 1000.0, 882.0, 335.0]
2025-09-12 19:40:39,005 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 37 minutes, 37 seconds)
2025-09-12 19:51:42,286 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:51:42,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:53:39,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 354.10788 ± 262.427
2025-09-12 19:53:39,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [22.970383, 75.33239, 750.816, 447.36157, 53.34006, 568.8522, 204.6245, 636.81995, 606.97894, 173.98294]
2025-09-12 19:53:39,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [19.0, 73.0, 1000.0, 1000.0, 40.0, 352.0, 1000.0, 312.0, 382.0, 125.0]
2025-09-12 19:53:39,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 22 minutes, 48 seconds)
2025-09-12 20:05:13,975 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:05:13,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:06:49,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 314.12860 ± 191.511
2025-09-12 20:06:49,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [776.7823, 441.10776, 284.0947, 429.60315, 95.85293, 226.03755, 106.45865, 262.32236, 346.617, 172.40993]
2025-09-12 20:06:49,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [471.0, 1000.0, 156.0, 208.0, 74.0, 140.0, 58.0, 1000.0, 198.0, 165.0]
2025-09-12 20:06:49,684 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 8 minutes, 50 seconds)
2025-09-12 20:19:10,251 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:19:10,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:21:18,851 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 435.55322 ± 437.883
2025-09-12 20:21:18,866 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [126.88854, 523.3495, 108.285194, 7.4746256, 939.0375, 811.8301, 1.9035915, 74.462494, 426.4833, 1335.8176]
2025-09-12 20:21:18,866 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [102.0, 331.0, 104.0, 14.0, 1000.0, 1000.0, 22.0, 51.0, 1000.0, 1000.0]
2025-09-12 20:21:18,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 97/100 (estimated time remaining: 55 minutes, 51 seconds)
2025-09-12 20:32:26,229 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:32:26,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:34:59,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 724.43323 ± 487.253
2025-09-12 20:34:59,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [1663.6428, 756.48755, 316.6022, 327.35358, 425.42004, 486.29315, 215.68997, 451.42188, 1408.7683, 1192.653]
2025-09-12 20:34:59,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 540.0, 214.0, 182.0, 323.0, 248.0, 141.0, 1000.0, 843.0, 1000.0]
2025-09-12 20:34:59,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1226 [INFO]: New best (724.43) for latency MM1Queue_a033_s075
2025-09-12 20:34:59,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 98/100 (estimated time remaining: 41 minutes, 56 seconds)
2025-09-12 20:47:10,289 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:47:10,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:48:29,692 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 431.38177 ± 330.654
2025-09-12 20:48:29,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [883.07513, 61.62652, 467.36966, 423.7118, 401.24814, 129.77287, 230.02133, 1107.9503, 558.35895, 50.68295]
2025-09-12 20:48:29,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 66.0, 257.0, 203.0, 239.0, 75.0, 95.0, 648.0, 262.0, 57.0]
2025-09-12 20:48:29,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 99/100 (estimated time remaining: 27 minutes, 8 seconds)
2025-09-12 20:59:31,749 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:59:31,754 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:01:03,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 272.05084 ± 167.202
2025-09-12 21:01:03,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [210.56168, 508.71463, 119.47018, 383.7573, 34.52324, 342.6106, 200.88448, 268.1427, 89.55954, 562.2838]
2025-09-12 21:01:03,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [151.0, 338.0, 126.0, 1000.0, 30.0, 1000.0, 110.0, 166.0, 79.0, 336.0]
2025-09-12 21:01:03,810 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1199 [INFO]: Iteration 100/100 (estimated time remaining: 13 minutes, 28 seconds)
2025-09-12 21:13:24,359 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:13:24,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:14:11,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1221 [DEBUG]: Total Reward: 148.92213 ± 113.640
2025-09-12 21:14:11,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1222 [DEBUG]: All rewards: [212.1598, 106.60979, 112.06776, 260.004, 78.95145, 125.38468, 174.84125, 15.793508, 402.0012, 1.4078069]
2025-09-12 21:14:11,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [168.0, 57.0, 76.0, 142.0, 68.0, 71.0, 104.0, 26.0, 1000.0, 11.0]
2025-09-12 21:14:11,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-ant):1251 [DEBUG]: Training session finished
