2025-09-11 22:53:35,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc25-ant/MM1Queue_a033_s075-mbpac-highdim-memdelay
2025-09-11 22:53:35,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc25-ant/MM1Queue_a033_s075-mbpac-highdim-memdelay
2025-09-11 22:53:35,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x151cc6582390>}
2025-09-11 22:53:35,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1111 [DEBUG]: using device: cuda
2025-09-11 22:53:35,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1133 [INFO]: Creating new trainer
2025-09-11 22:53:35,307 baseline-mbpac-noiseperc25-ant:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-09-11 22:53:35,308 baseline-mbpac-noiseperc25-ant:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-11 22:53:35,318 baseline-mbpac-noiseperc25-ant:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=27, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=8, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 512, batch_first=True)
)
2025-09-11 22:53:36,223 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1194 [DEBUG]: Starting training session...
2025-09-11 22:53:36,223 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 1/100
2025-09-11 23:05:06,764 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:05:06,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 23:05:37,956 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -95.25161 ± 80.884
2025-09-11 23:05:37,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-103.67866, -52.86963, -126.259445, -115.56054, -267.95386, 1.8983873, -19.615988, -184.0789, 0.6746882, -85.07217]
2025-09-11 23:05:37,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [74.0, 32.0, 165.0, 118.0, 352.0, 24.0, 17.0, 206.0, 14.0, 94.0]
2025-09-11 23:05:37,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1226 [INFO]: New best (-95.25) for latency MM1Queue_a033_s075
2025-09-11 23:05:37,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 2/100 (estimated time remaining: 19 hours, 50 minutes, 55 seconds)
2025-09-11 23:17:41,676 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:17:41,684 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 23:18:30,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -113.43794 ± 176.331
2025-09-11 23:18:30,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-22.5921, -628.98254, -24.806044, -37.775715, -25.84554, -52.977154, -43.7695, -37.293545, -110.584404, -149.75284]
2025-09-11 23:18:30,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [53.0, 1000.0, 28.0, 60.0, 20.0, 82.0, 60.0, 57.0, 167.0, 229.0]
2025-09-11 23:18:30,466 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 3/100 (estimated time remaining: 20 hours, 20 minutes, 17 seconds)
2025-09-11 23:31:48,700 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:31:48,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 23:32:59,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -152.79068 ± 252.999
2025-09-11 23:32:59,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [5.182191, 1.7934477, 6.5604553, -12.11527, -25.251959, -657.043, -103.2393, -19.26762, -72.89215, -651.63367]
2025-09-11 23:32:59,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [42.0, 29.0, 47.0, 12.0, 41.0, 1000.0, 153.0, 51.0, 109.0, 1000.0]
2025-09-11 23:32:59,295 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 4/100 (estimated time remaining: 21 hours, 13 minutes, 25 seconds)
2025-09-11 23:45:31,180 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:45:31,183 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 23:47:08,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -187.64946 ± 244.231
2025-09-11 23:47:08,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-23.208445, -1.9719371, -96.69802, -532.03375, -577.6893, -9.637547, 12.173909, -563.9961, -34.01465, -49.418873]
2025-09-11 23:47:08,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [18.0, 26.0, 144.0, 1000.0, 1000.0, 28.0, 58.0, 1000.0, 32.0, 98.0]
2025-09-11 23:47:08,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 5/100 (estimated time remaining: 21 hours, 24 minutes, 57 seconds)
2025-09-11 23:58:49,356 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:58:49,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 23:59:56,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -136.18073 ± 207.664
2025-09-11 23:59:56,413 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-39.497257, -558.08496, -17.157494, -542.45874, -35.802975, -15.170632, -58.89741, -33.77764, -6.6021194, -54.357872]
2025-09-11 23:59:56,413 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [63.0, 1000.0, 31.0, 1000.0, 88.0, 26.0, 88.0, 30.0, 40.0, 64.0]
2025-09-11 23:59:56,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 6/100 (estimated time remaining: 21 hours, 23 seconds)
2025-09-12 00:12:51,296 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:12:51,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:13:06,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -22.95762 ± 32.321
2025-09-12 00:13:06,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-28.357073, -81.96038, -69.83812, 1.4815633, 9.579621, -53.71302, 13.627042, -6.3969803, -4.2955637, -9.703276]
2025-09-12 00:13:06,218 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [76.0, 135.0, 113.0, 28.0, 15.0, 77.0, 44.0, 20.0, 14.0, 12.0]
2025-09-12 00:13:06,218 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1226 [INFO]: New best (-22.96) for latency MM1Queue_a033_s075
2025-09-12 00:13:06,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 7/100 (estimated time remaining: 21 hours, 8 minutes, 27 seconds)
2025-09-12 00:25:02,048 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:25:02,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:25:36,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -74.74585 ± 200.927
2025-09-12 00:25:36,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [23.28068, -39.893364, 12.56755, -14.614427, -14.126716, -674.9215, -4.468274, -4.9373937, 5.516746, -35.8618]
2025-09-12 00:25:36,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [39.0, 34.0, 13.0, 25.0, 29.0, 1000.0, 19.0, 12.0, 14.0, 56.0]
2025-09-12 00:25:36,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 8/100 (estimated time remaining: 20 hours, 48 minutes, 13 seconds)
2025-09-12 00:38:49,770 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:38:49,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:39:36,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -89.15243 ± 196.034
2025-09-12 00:39:36,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-16.34283, -69.8822, -3.52931, -70.861336, -6.2226343, -672.6326, -10.559559, 3.1176152, -22.641829, -21.96948]
2025-09-12 00:39:36,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [37.0, 48.0, 25.0, 111.0, 15.0, 1000.0, 18.0, 24.0, 41.0, 44.0]
2025-09-12 00:39:36,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 9/100 (estimated time remaining: 20 hours, 25 minutes, 52 seconds)
2025-09-12 00:51:21,512 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:51:21,516 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:51:36,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -28.86459 ± 30.854
2025-09-12 00:51:36,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [1.7831755, -8.999936, -40.349773, -8.939769, -82.272385, -31.888756, 11.81978, -10.167274, -38.90409, -80.72687]
2025-09-12 00:51:36,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [27.0, 18.0, 81.0, 14.0, 152.0, 38.0, 42.0, 12.0, 72.0, 75.0]
2025-09-12 00:51:36,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 10/100 (estimated time remaining: 19 hours, 33 minutes, 8 seconds)
2025-09-12 01:04:21,992 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:04:21,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:04:32,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -2.44482 ± 17.160
2025-09-12 01:04:32,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [32.327755, -13.051599, -37.432217, -14.057565, 0.5646047, 3.6265798, -0.0007522175, 11.579159, -6.3182626, -1.6859424]
2025-09-12 01:04:32,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [52.0, 46.0, 77.0, 42.0, 16.0, 12.0, 44.0, 18.0, 34.0, 25.0]
2025-09-12 01:04:32,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1226 [INFO]: New best (-2.44) for latency MM1Queue_a033_s075
2025-09-12 01:04:32,071 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 11/100 (estimated time remaining: 19 hours, 22 minutes, 41 seconds)
2025-09-12 01:16:19,519 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:16:19,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:16:31,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -14.42564 ± 18.845
2025-09-12 01:16:31,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-16.992832, -2.8997014, 16.192818, -37.887836, -37.876373, -42.799503, -2.2763698, 5.4768906, -9.54842, -15.645112]
2025-09-12 01:16:31,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [29.0, 31.0, 33.0, 38.0, 85.0, 88.0, 28.0, 33.0, 15.0, 40.0]
2025-09-12 01:16:31,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 12/100 (estimated time remaining: 18 hours, 48 minutes, 47 seconds)
2025-09-12 01:29:01,086 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:29:01,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:29:12,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -24.41827 ± 25.440
2025-09-12 01:29:12,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-78.01421, -27.347612, -16.480888, -3.6218028, -6.932349, -6.2547565, -50.105335, -52.04115, -0.32662463, -3.0579393]
2025-09-12 01:29:12,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [70.0, 58.0, 34.0, 21.0, 13.0, 22.0, 78.0, 92.0, 18.0, 11.0]
2025-09-12 01:29:12,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 13/100 (estimated time remaining: 18 hours, 39 minutes, 14 seconds)
2025-09-12 01:41:33,814 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:41:33,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:41:38,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -5.96231 ± 4.628
2025-09-12 01:41:38,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-10.55326, -0.20924643, 0.54958594, -3.306675, -4.1123652, -6.5951476, -4.177191, -5.2868104, -14.545842, -11.386184]
2025-09-12 01:41:38,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [27.0, 12.0, 26.0, 10.0, 19.0, 12.0, 9.0, 13.0, 17.0, 30.0]
2025-09-12 01:41:38,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 14/100 (estimated time remaining: 17 hours, 59 minutes, 19 seconds)
2025-09-12 01:54:00,425 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:54:00,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:54:08,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -0.36101 ± 8.032
2025-09-12 01:54:08,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-2.9843786, -4.2038507, 6.72394, -1.0991358, -9.782209, 8.760004, -11.788557, -5.553572, 1.1582506, 15.159413]
2025-09-12 01:54:08,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [9.0, 12.0, 32.0, 39.0, 98.0, 8.0, 11.0, 14.0, 27.0, 49.0]
2025-09-12 01:54:08,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1226 [INFO]: New best (-0.36) for latency MM1Queue_a033_s075
2025-09-12 01:54:08,625 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 15/100 (estimated time remaining: 17 hours, 55 minutes, 43 seconds)
2025-09-12 02:06:26,353 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:06:26,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:06:35,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -15.05166 ± 28.253
2025-09-12 02:06:35,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-18.823032, 18.772778, -80.10742, -1.3126067, 3.8015242, -32.671864, -12.201378, -27.97612, 23.545715, -23.544176]
2025-09-12 02:06:35,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [39.0, 46.0, 50.0, 25.0, 14.0, 27.0, 11.0, 61.0, 32.0, 26.0]
2025-09-12 02:06:35,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 16/100 (estimated time remaining: 17 hours, 34 minutes, 57 seconds)
2025-09-12 02:19:39,189 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:19:39,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:19:48,145 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -10.33717 ± 14.212
2025-09-12 02:19:48,145 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-36.46598, -35.294598, -15.82541, 8.403529, -7.57036, 1.3160089, -1.2236377, -10.330335, -1.7404226, -4.6404724]
2025-09-12 02:19:48,145 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [63.0, 58.0, 20.0, 28.0, 9.0, 24.0, 14.0, 39.0, 21.0, 53.0]
2025-09-12 02:19:48,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 17/100 (estimated time remaining: 17 hours, 43 minutes, 9 seconds)
2025-09-12 02:32:18,740 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:32:18,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:32:29,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -4.34339 ± 18.087
2025-09-12 02:32:29,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [10.733642, -4.9258347, -5.508522, 4.6073675, 24.951801, -18.358976, -34.402622, 13.354256, -30.60491, -3.2800717]
2025-09-12 02:32:29,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [26.0, 16.0, 10.0, 24.0, 38.0, 25.0, 102.0, 19.0, 56.0, 11.0]
2025-09-12 02:32:29,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 18/100 (estimated time remaining: 17 hours, 30 minutes, 35 seconds)
2025-09-12 02:43:51,944 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:43:51,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:43:58,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -1.74830 ± 8.536
2025-09-12 02:43:58,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-7.051256, -0.17379339, -4.814065, -6.717311, 5.7632375, -2.656415, 18.00154, -14.247563, -8.191289, 2.6039522]
2025-09-12 02:43:58,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [25.0, 10.0, 11.0, 10.0, 17.0, 14.0, 60.0, 15.0, 12.0, 64.0]
2025-09-12 02:43:58,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 19/100 (estimated time remaining: 17 hours, 2 minutes, 13 seconds)
2025-09-12 02:56:23,935 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:56:23,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:56:31,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 3.99604 ± 15.853
2025-09-12 02:56:31,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-6.476173, 24.74529, -20.198053, 24.143457, -21.752966, 17.926983, 3.541729, 4.6640387, -0.71965516, 14.085706]
2025-09-12 02:56:31,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [13.0, 28.0, 52.0, 23.0, 16.0, 38.0, 17.0, 23.0, 15.0, 47.0]
2025-09-12 02:56:31,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1226 [INFO]: New best (4.00) for latency MM1Queue_a033_s075
2025-09-12 02:56:31,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 20/100 (estimated time remaining: 16 hours, 50 minutes, 32 seconds)
2025-09-12 03:08:24,045 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:08:24,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:08:36,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -17.47972 ± 26.269
2025-09-12 03:08:36,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [2.332689, 4.710251, 1.8331566, -20.434868, -26.664034, -22.21462, -89.409134, -14.773102, -9.5098915, -0.66763085]
2025-09-12 03:08:36,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [27.0, 13.0, 19.0, 30.0, 72.0, 23.0, 176.0, 50.0, 14.0, 9.0]
2025-09-12 03:08:36,171 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 21/100 (estimated time remaining: 16 hours, 32 minutes, 11 seconds)
2025-09-12 03:20:41,889 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:20:41,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:21:43,462 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -94.57076 ± 149.127
2025-09-12 03:21:43,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-26.307745, -84.104645, -30.022528, -30.350945, 0.953968, -384.59033, -392.0792, 13.727473, 6.3523626, -19.286034]
2025-09-12 03:21:43,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [29.0, 50.0, 31.0, 26.0, 16.0, 1000.0, 1000.0, 15.0, 19.0, 34.0]
2025-09-12 03:21:43,470 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 22/100 (estimated time remaining: 16 hours, 18 minutes, 22 seconds)
2025-09-12 03:34:22,607 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:34:22,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:34:30,136 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -6.74856 ± 7.092
2025-09-12 03:34:30,136 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-11.285962, -6.659783, -7.416219, -7.1864977, 1.912583, -5.472103, -3.8174403, -23.009348, -9.060077, 4.5092254]
2025-09-12 03:34:30,136 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [12.0, 9.0, 16.0, 72.0, 17.0, 28.0, 10.0, 19.0, 12.0, 30.0]
2025-09-12 03:34:30,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 23/100 (estimated time remaining: 16 hours, 7 minutes, 16 seconds)
2025-09-12 03:46:32,387 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:46:32,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:46:39,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -3.46122 ± 12.289
2025-09-12 03:46:39,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-16.065989, -11.723791, -3.7801547, -1.7205034, -26.413887, -0.89171004, 3.6998746, 22.294485, 1.9929508, -2.00351]
2025-09-12 03:46:39,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [35.0, 25.0, 9.0, 12.0, 29.0, 12.0, 89.0, 20.0, 18.0, 26.0]
2025-09-12 03:46:40,005 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 24/100 (estimated time remaining: 16 hours, 5 minutes, 28 seconds)
2025-09-12 03:58:51,428 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:58:51,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:58:57,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -1.86150 ± 13.632
2025-09-12 03:58:57,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-9.0391655, -3.3424487, -13.673099, 31.412285, -1.6385462, -21.288807, -1.9003186, -9.165413, 0.9745207, 9.046006]
2025-09-12 03:58:57,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [24.0, 10.0, 23.0, 20.0, 13.0, 30.0, 11.0, 20.0, 16.0, 47.0]
2025-09-12 03:58:57,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 25/100 (estimated time remaining: 15 hours, 48 minutes, 57 seconds)
2025-09-12 04:11:16,234 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:11:16,243 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:11:50,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -33.90719 ± 87.210
2025-09-12 04:11:50,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-13.023781, -31.716091, -5.097173, 2.182112, 23.53433, -292.16058, -19.64002, 4.4108305, -2.9258232, -4.635691]
2025-09-12 04:11:50,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [16.0, 32.0, 21.0, 29.0, 35.0, 1000.0, 67.0, 20.0, 19.0, 12.0]
2025-09-12 04:11:50,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 26/100 (estimated time remaining: 15 hours, 48 minutes, 40 seconds)
2025-09-12 04:24:03,759 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:24:03,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:24:12,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -2.29551 ± 14.810
2025-09-12 04:24:12,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-28.0736, -8.807404, 4.816417, -7.282344, -7.1997266, 4.097924, 4.2238903, 2.023611, -17.104128, 30.350246]
2025-09-12 04:24:12,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [50.0, 15.0, 26.0, 12.0, 14.0, 20.0, 20.0, 30.0, 13.0, 46.0]
2025-09-12 04:24:12,098 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 27/100 (estimated time remaining: 15 hours, 24 minutes, 39 seconds)
2025-09-12 04:36:30,069 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:36:30,071 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:36:35,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 2.30070 ± 7.333
2025-09-12 04:36:35,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [4.506342, -1.3691112, 7.033809, 12.205502, -5.2934594, -4.2613792, 14.186967, 7.312601, -4.0690846, -7.2451935]
2025-09-12 04:36:35,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [12.0, 15.0, 28.0, 22.0, 13.0, 13.0, 23.0, 17.0, 12.0, 48.0]
2025-09-12 04:36:35,589 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 28/100 (estimated time remaining: 15 hours, 6 minutes, 31 seconds)
2025-09-12 04:48:45,296 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:48:45,304 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:48:51,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -6.81986 ± 13.018
2025-09-12 04:48:51,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-0.19910024, -13.55282, -6.9542108, -1.7222162, -2.3823984, -5.6348863, -19.15072, 13.629692, -36.30604, 4.074138]
2025-09-12 04:48:51,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [12.0, 13.0, 20.0, 15.0, 42.0, 10.0, 33.0, 17.0, 62.0, 14.0]
2025-09-12 04:48:51,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 29/100 (estimated time remaining: 14 hours, 55 minutes, 36 seconds)
2025-09-12 05:01:52,225 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:01:52,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:02:02,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -9.43006 ± 13.908
2025-09-12 05:02:02,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-18.989403, 4.1051435, -10.049962, 9.50574, 2.0998979, -4.7057204, -24.81139, -28.152466, 4.9093986, -28.21188]
2025-09-12 05:02:02,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [64.0, 61.0, 11.0, 53.0, 19.0, 28.0, 42.0, 66.0, 16.0, 38.0]
2025-09-12 05:02:03,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 30/100 (estimated time remaining: 14 hours, 55 minutes, 57 seconds)
2025-09-12 05:13:15,874 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:13:15,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:13:22,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -3.09085 ± 10.244
2025-09-12 05:13:22,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-8.6327305, -4.3529153, 2.5238776, -29.093315, 6.7470765, -1.2226695, 5.183423, -3.8177893, 8.247333, -6.490793]
2025-09-12 05:13:22,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [24.0, 15.0, 19.0, 29.0, 11.0, 15.0, 68.0, 10.0, 36.0, 13.0]
2025-09-12 05:13:22,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 31/100 (estimated time remaining: 14 hours, 21 minutes, 20 seconds)
2025-09-12 05:25:25,088 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:25:25,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:25:33,610 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -1.28921 ± 10.393
2025-09-12 05:25:33,610 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-6.3012857, -0.7834138, -10.831932, -2.0103266, 1.2951945, -4.5641837, 28.198847, -5.7708225, -8.371509, -3.7526424]
2025-09-12 05:25:33,611 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [24.0, 11.0, 56.0, 17.0, 27.0, 11.0, 27.0, 29.0, 11.0, 41.0]
2025-09-12 05:25:33,617 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 32/100 (estimated time remaining: 14 hours, 6 minutes, 44 seconds)
2025-09-12 05:37:40,778 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:37:40,783 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:38:18,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -17.29813 ± 54.045
2025-09-12 05:38:18,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-51.870216, -168.16591, 12.430931, 11.068795, 4.670053, -7.4528546, -4.0400596, 0.18840745, 0.882446, 29.30714]
2025-09-12 05:38:18,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [117.0, 1000.0, 56.0, 31.0, 10.0, 15.0, 21.0, 16.0, 34.0, 51.0]
2025-09-12 05:38:18,173 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 33/100 (estimated time remaining: 13 hours, 59 minutes, 15 seconds)
2025-09-12 05:51:23,502 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:51:23,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:51:56,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -14.41088 ± 69.605
2025-09-12 05:51:56,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-6.6070104, 10.216616, -10.139344, 27.863905, -216.52574, 47.554806, -7.749779, 14.78689, 4.0650864, -7.574262]
2025-09-12 05:51:56,282 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [29.0, 30.0, 12.0, 23.0, 1000.0, 32.0, 12.0, 24.0, 12.0, 11.0]
2025-09-12 05:51:56,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 34/100 (estimated time remaining: 14 hours, 5 minutes, 13 seconds)
2025-09-12 06:03:05,390 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:03:05,392 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:03:11,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -1.10715 ± 10.832
2025-09-12 06:03:11,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [19.976625, -11.595589, -12.314175, -2.2793617, 5.2255464, 14.117004, -2.0599074, -1.5971017, -5.234954, -15.309632]
2025-09-12 06:03:11,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [19.0, 11.0, 31.0, 9.0, 12.0, 52.0, 11.0, 49.0, 13.0, 19.0]
2025-09-12 06:03:11,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 35/100 (estimated time remaining: 13 hours, 27 minutes, 5 seconds)
2025-09-12 06:16:15,182 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:16:15,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:16:23,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 2.85867 ± 12.192
2025-09-12 06:16:23,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [10.55853, 4.242866, -0.4438336, 10.62681, -7.8187175, 18.575775, -6.5185843, 18.310568, -22.931087, 3.984335]
2025-09-12 06:16:23,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [22.0, 16.0, 12.0, 32.0, 41.0, 29.0, 16.0, 46.0, 29.0, 12.0]
2025-09-12 06:16:23,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 36/100 (estimated time remaining: 13 hours, 39 minutes, 18 seconds)
2025-09-12 06:28:06,289 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:28:06,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:28:11,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 1.29245 ± 8.284
2025-09-12 06:28:11,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-3.0790577, 2.594132, 14.334371, -8.004371, -7.9045076, -0.85291725, 18.315811, -3.8812187, 1.8079967, -0.40569073]
2025-09-12 06:28:11,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [14.0, 16.0, 14.0, 12.0, 19.0, 16.0, 32.0, 12.0, 10.0, 14.0]
2025-09-12 06:28:11,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 37/100 (estimated time remaining: 13 hours, 21 minutes, 41 seconds)
2025-09-12 06:40:04,524 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:40:04,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:40:11,256 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 8.72230 ± 21.663
2025-09-12 06:40:11,256 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [4.2992096, -8.860153, -1.8382951, 71.092636, -2.2930298, -3.9145532, 0.94783574, 8.5863495, 6.4658484, 12.7371855]
2025-09-12 06:40:11,256 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [15.0, 14.0, 11.0, 60.0, 14.0, 31.0, 18.0, 15.0, 33.0, 36.0]
2025-09-12 06:40:11,256 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1226 [INFO]: New best (8.72) for latency MM1Queue_a033_s075
2025-09-12 06:40:11,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 38/100 (estimated time remaining: 12 hours, 59 minutes, 44 seconds)
2025-09-12 06:52:16,662 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:52:16,665 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:52:51,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -8.70740 ± 51.384
2025-09-12 06:52:51,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [0.23190954, 20.3843, -2.3015833, -157.74998, 5.4481454, -16.924303, 20.72127, 9.927847, 1.4216747, 31.766705]
2025-09-12 06:52:51,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [32.0, 41.0, 12.0, 1000.0, 23.0, 31.0, 25.0, 20.0, 21.0, 37.0]
2025-09-12 06:52:51,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 39/100 (estimated time remaining: 12 hours, 35 minutes, 26 seconds)
2025-09-12 07:05:48,488 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:05:48,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:05:53,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 2.47164 ± 10.751
2025-09-12 07:05:53,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [0.6279458, 28.469381, 3.1532788, -4.21146, 4.6435747, -4.99543, -6.7701364, -9.653204, -0.42719018, 13.879685]
2025-09-12 07:05:53,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [12.0, 24.0, 14.0, 9.0, 11.0, 13.0, 24.0, 25.0, 10.0, 41.0]
2025-09-12 07:05:53,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 40/100 (estimated time remaining: 12 hours, 44 minutes, 55 seconds)
2025-09-12 07:17:22,911 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:17:22,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:18:23,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -43.98220 ± 99.370
2025-09-12 07:18:23,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [28.844486, -16.476742, 7.337176, 4.0436864, 16.258043, 1.5039755, -5.883431, -255.44789, 6.532108, -226.53336]
2025-09-12 07:18:23,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [37.0, 22.0, 30.0, 15.0, 34.0, 15.0, 13.0, 1000.0, 22.0, 1000.0]
2025-09-12 07:18:23,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 41/100 (estimated time remaining: 12 hours, 23 minutes, 57 seconds)
2025-09-12 07:30:26,552 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:30:26,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:30:36,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -5.48859 ± 30.090
2025-09-12 07:30:36,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [2.2561946, -34.802605, 54.383183, 1.24712, 4.350007, -7.042579, -70.96549, -7.0841494, -1.8826199, 4.6550164]
2025-09-12 07:30:36,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [18.0, 25.0, 68.0, 20.0, 27.0, 12.0, 87.0, 19.0, 11.0, 15.0]
2025-09-12 07:30:36,673 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 42/100 (estimated time remaining: 12 hours, 16 minutes, 32 seconds)
2025-09-12 07:43:53,662 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:43:53,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:44:06,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 1.38399 ± 7.301
2025-09-12 07:44:06,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-5.061898, 12.079994, 2.957934, -2.8610363, -14.20418, -2.8887272, 8.37633, 6.7743144, 4.3044133, 4.362799]
2025-09-12 07:44:06,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [30.0, 52.0, 18.0, 13.0, 68.0, 20.0, 19.0, 20.0, 27.0, 110.0]
2025-09-12 07:44:06,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 43/100 (estimated time remaining: 12 hours, 21 minutes, 27 seconds)
2025-09-12 07:55:01,781 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:55:01,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:55:18,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 16.12156 ± 36.941
2025-09-12 07:55:18,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-3.362748, 40.163578, 8.881312, -3.3326445, -17.213713, 14.835984, -11.47472, 16.468695, 116.66707, -0.41720933]
2025-09-12 07:55:18,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [61.0, 47.0, 26.0, 21.0, 27.0, 24.0, 23.0, 75.0, 106.0, 78.0]
2025-09-12 07:55:18,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1226 [INFO]: New best (16.12) for latency MM1Queue_a033_s075
2025-09-12 07:55:18,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 44/100 (estimated time remaining: 11 hours, 51 minutes, 51 seconds)
2025-09-12 08:07:29,674 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:07:29,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:07:37,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 16.96073 ± 23.615
2025-09-12 08:07:37,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [40.265774, 70.17587, 4.8678255, -4.7827826, 4.3516183, -1.9775223, 39.43592, 6.60134, 14.885471, -4.2161975]
2025-09-12 08:07:37,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [45.0, 55.0, 14.0, 9.0, 13.0, 34.0, 48.0, 11.0, 29.0, 18.0]
2025-09-12 08:07:37,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1226 [INFO]: New best (16.96) for latency MM1Queue_a033_s075
2025-09-12 08:07:37,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 45/100 (estimated time remaining: 11 hours, 31 minutes, 21 seconds)
2025-09-12 08:19:53,282 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:19:53,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:20:07,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 6.59733 ± 23.338
2025-09-12 08:20:07,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-26.545786, 36.478165, 41.713314, 6.596961, 2.4146767, -23.675983, 4.0369554, -0.4805431, 36.763954, -11.328381]
2025-09-12 08:20:07,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [64.0, 51.0, 34.0, 87.0, 29.0, 22.0, 31.0, 10.0, 55.0, 22.0]
2025-09-12 08:20:07,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 46/100 (estimated time remaining: 11 hours, 18 minutes, 58 seconds)
2025-09-12 08:32:50,468 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:32:50,479 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:33:00,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 9.41997 ± 18.601
2025-09-12 08:33:00,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-11.994461, 6.5911436, 2.7939732, -11.011022, 55.41845, 5.2271376, -2.5464299, 8.648798, 22.492632, 18.579512]
2025-09-12 08:33:00,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [21.0, 32.0, 38.0, 10.0, 78.0, 27.0, 15.0, 16.0, 44.0, 18.0]
2025-09-12 08:33:00,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 47/100 (estimated time remaining: 11 hours, 13 minutes, 54 seconds)
2025-09-12 08:44:45,175 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:44:45,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:44:54,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 16.84698 ± 18.811
2025-09-12 08:44:54,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [4.7009115, 36.019035, -0.4072627, 2.7272506, 0.9891729, 4.3893237, 3.6364126, 26.50127, 56.61117, 33.302525]
2025-09-12 08:44:54,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [38.0, 35.0, 23.0, 14.0, 17.0, 16.0, 12.0, 36.0, 119.0, 29.0]
2025-09-12 08:44:54,587 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 48/100 (estimated time remaining: 10 hours, 44 minutes, 31 seconds)
2025-09-12 08:57:20,464 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:57:20,466 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:57:26,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 6.67805 ± 16.323
2025-09-12 08:57:26,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [6.255182, 21.106232, 45.463116, 13.192779, -13.146134, -12.812465, 3.7626402, -2.5123138, 0.23890455, 5.232567]
2025-09-12 08:57:26,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [15.0, 22.0, 33.0, 16.0, 31.0, 21.0, 14.0, 16.0, 27.0, 21.0]
2025-09-12 08:57:26,336 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 49/100 (estimated time remaining: 10 hours, 46 minutes, 11 seconds)
2025-09-12 09:09:39,518 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:09:39,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:10:24,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 1.29916 ± 96.942
2025-09-12 09:10:24,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-1.1521227, 23.814554, 23.668545, -268.896, -15.247111, 55.924255, 58.66246, 23.382431, 114.92789, -2.0933115]
2025-09-12 09:10:24,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [15.0, 23.0, 46.0, 1000.0, 19.0, 60.0, 30.0, 25.0, 65.0, 19.0]
2025-09-12 09:10:24,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 50/100 (estimated time remaining: 10 hours, 40 minutes, 25 seconds)
2025-09-12 09:22:42,106 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:22:42,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:22:51,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 11.90120 ± 18.122
2025-09-12 09:22:51,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-3.2548957, 15.268609, 37.691883, 47.92036, 3.9144042, 24.96194, -8.344455, 4.848197, -1.8077643, -2.1863263]
2025-09-12 09:22:51,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [13.0, 18.0, 76.0, 64.0, 24.0, 28.0, 30.0, 36.0, 39.0, 19.0]
2025-09-12 09:22:51,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 51/100 (estimated time remaining: 10 hours, 27 minutes, 25 seconds)
2025-09-12 09:35:05,653 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:35:05,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:35:13,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 10.21726 ± 24.737
2025-09-12 09:35:13,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-22.077879, 2.361727, -17.299284, 4.370271, 17.056221, 10.492712, 51.90567, 56.985973, 1.1756276, -2.798422]
2025-09-12 09:35:13,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [33.0, 16.0, 18.0, 13.0, 23.0, 13.0, 32.0, 47.0, 15.0, 26.0]
2025-09-12 09:35:13,618 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 52/100 (estimated time remaining: 10 hours, 9 minutes, 43 seconds)
2025-09-12 09:48:34,690 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:48:34,694 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:48:42,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 17.79963 ± 24.066
2025-09-12 09:48:42,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [56.354458, 16.795954, 0.3177639, 64.34668, -5.362734, -7.3044395, 7.1891723, -3.6025102, 25.421082, 23.840876]
2025-09-12 09:48:42,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [40.0, 29.0, 10.0, 45.0, 10.0, 34.0, 44.0, 15.0, 33.0, 31.0]
2025-09-12 09:48:42,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1226 [INFO]: New best (17.80) for latency MM1Queue_a033_s075
2025-09-12 09:48:42,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 53/100 (estimated time remaining: 10 hours, 12 minutes, 28 seconds)
2025-09-12 09:59:50,084 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:59:50,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:59:59,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 6.23754 ± 33.858
2025-09-12 09:59:59,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-32.003292, 10.415179, -40.39227, 3.26071, 5.089589, 5.5573664, 91.5786, -3.9027882, 24.799252, -2.0269983]
2025-09-12 09:59:59,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [38.0, 21.0, 62.0, 14.0, 15.0, 41.0, 62.0, 16.0, 69.0, 15.0]
2025-09-12 09:59:59,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 54/100 (estimated time remaining: 9 hours, 48 minutes)
2025-09-12 10:13:28,217 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:13:28,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:13:39,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 8.80839 ± 20.414
2025-09-12 10:13:39,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-12.725767, 9.623459, 2.6398363, 20.330128, 26.889885, 41.912773, 24.999979, -30.268309, 13.215697, -8.5338]
2025-09-12 10:13:39,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [31.0, 10.0, 14.0, 84.0, 19.0, 32.0, 59.0, 119.0, 24.0, 19.0]
2025-09-12 10:13:39,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 55/100 (estimated time remaining: 9 hours, 41 minutes, 54 seconds)
2025-09-12 10:25:56,009 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:25:56,015 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:26:04,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 3.91706 ± 17.204
2025-09-12 10:26:04,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-3.4561403, 15.367027, -10.00914, -5.855439, -6.1088743, -5.3918867, -0.24923237, 6.6598797, 51.103104, -2.8887193]
2025-09-12 10:26:04,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [25.0, 24.0, 23.0, 24.0, 25.0, 87.0, 38.0, 15.0, 40.0, 11.0]
2025-09-12 10:26:04,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 56/100 (estimated time remaining: 9 hours, 28 minutes, 57 seconds)
2025-09-12 10:38:13,968 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:38:13,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:38:21,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 3.72184 ± 18.882
2025-09-12 10:38:21,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [7.4683127, -4.692147, 5.3618913, -32.82695, -3.4251208, 17.590387, -10.55247, 40.13928, 21.548382, -3.393187]
2025-09-12 10:38:21,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [32.0, 13.0, 22.0, 31.0, 9.0, 30.0, 15.0, 48.0, 44.0, 18.0]
2025-09-12 10:38:21,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 57/100 (estimated time remaining: 9 hours, 15 minutes, 30 seconds)
2025-09-12 10:49:39,050 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:49:39,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:49:47,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 13.51113 ± 25.848
2025-09-12 10:49:47,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [11.493791, -12.748211, 23.044027, -11.097486, -3.7683785, 82.41568, 10.971607, 1.7748648, 22.930923, 10.094486]
2025-09-12 10:49:47,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [62.0, 19.0, 43.0, 15.0, 22.0, 64.0, 18.0, 12.0, 41.0, 28.0]
2025-09-12 10:49:47,945 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 58/100 (estimated time remaining: 8 hours, 45 minutes, 22 seconds)
2025-09-12 11:01:57,658 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:01:57,660 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:02:08,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 26.40415 ± 25.315
2025-09-12 11:02:08,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [9.708737, 57.13845, 29.111895, 5.7501893, 43.25922, 14.372351, 79.77886, 8.3018675, 23.977453, -7.3574767]
2025-09-12 11:02:08,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [19.0, 49.0, 32.0, 25.0, 50.0, 24.0, 70.0, 21.0, 21.0, 13.0]
2025-09-12 11:02:08,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1226 [INFO]: New best (26.40) for latency MM1Queue_a033_s075
2025-09-12 11:02:08,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 59/100 (estimated time remaining: 8 hours, 42 minutes, 4 seconds)
2025-09-12 11:14:29,763 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:14:29,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:14:38,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 10.28458 ± 16.382
2025-09-12 11:14:38,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-10.958911, 23.782835, -1.6835244, -7.603961, 1.0032349, 23.82829, 29.698433, 21.164635, -8.41007, 32.02487]
2025-09-12 11:14:38,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [16.0, 43.0, 11.0, 15.0, 11.0, 38.0, 33.0, 20.0, 44.0, 34.0]
2025-09-12 11:14:38,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 60/100 (estimated time remaining: 8 hours, 20 minutes, 5 seconds)
2025-09-12 11:27:37,038 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:27:37,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:27:45,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 18.37745 ± 27.994
2025-09-12 11:27:45,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [29.951242, 23.565521, -0.037312314, 92.46844, -5.801714, 1.3574405, 0.09110583, 6.284243, 2.263366, 33.63221]
2025-09-12 11:27:45,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [42.0, 20.0, 16.0, 102.0, 20.0, 12.0, 20.0, 16.0, 14.0, 58.0]
2025-09-12 11:27:45,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 61/100 (estimated time remaining: 8 hours, 13 minutes, 29 seconds)
2025-09-12 11:39:18,371 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:39:18,375 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:40:02,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -22.75628 ± 99.202
2025-09-12 11:40:02,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-7.776751, 10.934873, 13.605692, 9.728655, 17.076284, 23.222073, -318.8348, -1.7905544, 25.257372, 1.0144147]
2025-09-12 11:40:02,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [16.0, 41.0, 14.0, 25.0, 49.0, 27.0, 1000.0, 13.0, 31.0, 76.0]
2025-09-12 11:40:02,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 62/100 (estimated time remaining: 8 hours, 1 minute, 12 seconds)
2025-09-12 11:52:26,954 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:52:26,964 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:52:36,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 6.55316 ± 11.702
2025-09-12 11:52:36,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-0.6067884, 10.455322, 6.20054, 3.4140086, -7.69873, -3.190537, 17.564287, 26.42449, -8.98489, 21.953878]
2025-09-12 11:52:36,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [14.0, 58.0, 14.0, 18.0, 24.0, 25.0, 40.0, 94.0, 33.0, 17.0]
2025-09-12 11:52:36,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 63/100 (estimated time remaining: 7 hours, 57 minutes, 19 seconds)
2025-09-12 12:04:55,932 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:04:55,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:05:03,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 16.73721 ± 17.706
2025-09-12 12:05:03,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [5.4975567, 47.222084, -14.745419, 38.774727, 1.0380511, 14.92276, 21.010302, 3.7759066, 28.934433, 20.941717]
2025-09-12 12:05:03,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [23.0, 31.0, 51.0, 31.0, 15.0, 18.0, 34.0, 16.0, 26.0, 30.0]
2025-09-12 12:05:03,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 64/100 (estimated time remaining: 7 hours, 45 minutes, 32 seconds)
2025-09-12 12:17:44,870 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:17:44,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:17:55,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 26.30226 ± 39.098
2025-09-12 12:17:55,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-3.1151586, 22.638618, -0.96129173, -4.659395, 80.51631, 45.777153, 93.45861, -22.366028, 61.280373, -9.546617]
2025-09-12 12:17:55,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [10.0, 29.0, 12.0, 30.0, 99.0, 48.0, 61.0, 30.0, 54.0, 22.0]
2025-09-12 12:17:55,809 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 65/100 (estimated time remaining: 7 hours, 35 minutes, 38 seconds)
2025-09-12 12:29:42,873 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:29:42,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:29:56,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 14.61232 ± 28.121
2025-09-12 12:29:56,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-7.8609, 59.376, 5.9284577, 56.989902, 12.791695, 35.839848, -32.902355, -8.701043, 23.78432, 0.87731814]
2025-09-12 12:29:56,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [18.0, 66.0, 33.0, 44.0, 17.0, 40.0, 76.0, 29.0, 46.0, 45.0]
2025-09-12 12:29:56,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 66/100 (estimated time remaining: 7 hours, 15 minutes, 17 seconds)
2025-09-12 12:42:07,976 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:42:07,981 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:42:17,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 22.11787 ± 19.984
2025-09-12 12:42:17,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [15.554622, 48.103462, 15.972249, 53.28391, 44.76532, 10.609449, 0.79321986, 33.32447, -2.5021687, 1.2741566]
2025-09-12 12:42:17,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [28.0, 34.0, 21.0, 103.0, 60.0, 26.0, 9.0, 33.0, 14.0, 12.0]
2025-09-12 12:42:17,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 67/100 (estimated time remaining: 7 hours, 3 minutes, 13 seconds)
2025-09-12 12:54:32,252 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:54:32,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:54:45,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 21.94376 ± 27.105
2025-09-12 12:54:45,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-1.9697117, 40.759033, 9.219691, -3.431613, 17.578571, 38.822598, -12.849517, 79.394844, 45.547207, 6.366457]
2025-09-12 12:54:45,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [21.0, 101.0, 25.0, 101.0, 55.0, 41.0, 38.0, 53.0, 45.0, 17.0]
2025-09-12 12:54:45,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 68/100 (estimated time remaining: 6 hours, 50 minutes, 15 seconds)
2025-09-12 13:06:47,955 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:06:47,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:07:04,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 28.79281 ± 20.714
2025-09-12 13:07:04,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-8.7253065, 59.3484, 32.67877, 16.0582, 8.084817, 43.75038, 34.928455, 31.323637, 57.5178, 12.96293]
2025-09-12 13:07:04,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [11.0, 72.0, 62.0, 14.0, 34.0, 66.0, 60.0, 40.0, 114.0, 17.0]
2025-09-12 13:07:04,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1226 [INFO]: New best (28.79) for latency MM1Queue_a033_s075
2025-09-12 13:07:04,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 69/100 (estimated time remaining: 6 hours, 36 minutes, 55 seconds)
2025-09-12 13:20:07,184 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:20:07,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:20:16,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 10.61421 ± 9.930
2025-09-12 13:20:16,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [15.361076, 0.06319046, 6.0295, 22.412241, 14.477881, 27.376398, -4.137439, 13.575037, -2.1611016, 13.145347]
2025-09-12 13:20:16,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [55.0, 29.0, 13.0, 76.0, 19.0, 60.0, 24.0, 20.0, 17.0, 21.0]
2025-09-12 13:20:16,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 70/100 (estimated time remaining: 6 hours, 26 minutes, 31 seconds)
2025-09-12 13:32:11,653 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:32:11,666 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:32:25,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 37.37547 ± 43.252
2025-09-12 13:32:25,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [15.812742, 12.8297825, 48.010796, 113.38278, 118.901375, 19.198051, 47.850533, -12.319794, 12.226038, -2.137617]
2025-09-12 13:32:25,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [67.0, 50.0, 44.0, 81.0, 111.0, 29.0, 55.0, 37.0, 25.0, 9.0]
2025-09-12 13:32:25,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1226 [INFO]: New best (37.38) for latency MM1Queue_a033_s075
2025-09-12 13:32:25,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 71/100 (estimated time remaining: 6 hours, 14 minutes, 52 seconds)
2025-09-12 13:44:01,905 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:44:01,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:44:09,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 8.43202 ± 16.022
2025-09-12 13:44:09,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [15.034905, -4.5109334, 8.126835, 3.6065102, 41.563976, 3.1277378, -11.928523, -9.818689, 9.27984, 29.838535]
2025-09-12 13:44:09,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [15.0, 12.0, 28.0, 27.0, 55.0, 13.0, 25.0, 46.0, 16.0, 42.0]
2025-09-12 13:44:09,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 72/100 (estimated time remaining: 5 hours, 58 minutes, 51 seconds)
2025-09-12 13:56:38,554 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:56:38,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:56:47,945 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 17.96037 ± 22.700
2025-09-12 13:56:47,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [61.86167, 30.35358, 14.555393, -24.673737, 0.9716926, 34.020367, 8.029148, 37.813988, 5.314845, 11.356799]
2025-09-12 13:56:47,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [64.0, 51.0, 48.0, 34.0, 15.0, 44.0, 14.0, 30.0, 19.0, 26.0]
2025-09-12 13:56:47,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 73/100 (estimated time remaining: 5 hours, 47 minutes, 23 seconds)
2025-09-12 14:09:00,637 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:09:00,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:09:12,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 25.97256 ± 23.325
2025-09-12 14:09:12,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [33.059605, 9.78318, 21.019718, 1.9572535, 86.66077, 30.720476, 2.0381086, 35.76249, 25.083452, 13.640526]
2025-09-12 14:09:12,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [32.0, 57.0, 27.0, 14.0, 98.0, 30.0, 35.0, 39.0, 50.0, 34.0]
2025-09-12 14:09:12,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 74/100 (estimated time remaining: 5 hours, 35 minutes, 28 seconds)
2025-09-12 14:22:13,342 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:22:13,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:22:25,913 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 13.98503 ± 14.086
2025-09-12 14:22:25,913 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [42.651558, 2.3824975, 8.406705, 4.5232244, 37.43671, 11.129858, -2.4603393, 6.7320614, 18.082943, 10.965107]
2025-09-12 14:22:25,913 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [29.0, 12.0, 25.0, 30.0, 121.0, 19.0, 30.0, 71.0, 22.0, 13.0]
2025-09-12 14:22:25,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 75/100 (estimated time remaining: 5 hours, 23 minutes, 14 seconds)
2025-09-12 14:34:41,526 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:34:41,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:34:50,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 23.72656 ± 16.502
2025-09-12 14:34:50,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [46.094547, 22.312986, 49.02349, 29.835804, 8.300118, 40.12427, 6.2049685, 15.923809, 21.483507, -2.037932]
2025-09-12 14:34:50,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [48.0, 39.0, 65.0, 31.0, 19.0, 32.0, 15.0, 27.0, 39.0, 14.0]
2025-09-12 14:34:50,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 76/100 (estimated time remaining: 5 hours, 12 minutes, 4 seconds)
2025-09-12 14:46:17,245 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:46:17,256 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:46:25,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 26.14272 ± 28.886
2025-09-12 14:46:25,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-0.91784513, -5.0272794, 15.469558, 72.7977, -2.6886892, 42.930584, 8.421063, 42.368816, 75.80646, 12.266859]
2025-09-12 14:46:25,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [17.0, 10.0, 37.0, 53.0, 12.0, 63.0, 15.0, 43.0, 40.0, 19.0]
2025-09-12 14:46:25,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 77/100 (estimated time remaining: 4 hours, 58 minutes, 54 seconds)
2025-09-12 14:58:46,440 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:58:46,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:58:59,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 28.81600 ± 26.670
2025-09-12 14:58:59,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [12.505305, 35.88793, 97.58257, 20.590828, 0.75867176, 31.36696, 23.656363, 47.37209, 3.4053671, 15.033946]
2025-09-12 14:58:59,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [31.0, 37.0, 97.0, 24.0, 14.0, 25.0, 53.0, 71.0, 13.0, 29.0]
2025-09-12 14:58:59,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 78/100 (estimated time remaining: 4 hours, 46 minutes, 6 seconds)
2025-09-12 15:11:15,239 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:11:15,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:11:23,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 21.43312 ± 27.237
2025-09-12 15:11:23,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [88.991264, 36.087864, 9.665872, 10.773296, 8.973712, -5.147345, 48.049526, 9.871298, -1.0853368, 8.151114]
2025-09-12 15:11:23,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [50.0, 47.0, 23.0, 22.0, 19.0, 29.0, 59.0, 18.0, 13.0, 17.0]
2025-09-12 15:11:23,268 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 79/100 (estimated time remaining: 4 hours, 33 minutes, 37 seconds)
2025-09-12 15:24:43,708 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:24:43,715 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:24:51,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 14.01615 ± 19.858
2025-09-12 15:24:51,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [6.449337, 13.452625, 20.426699, 2.0600588, 18.833757, 63.638836, 23.425512, -10.844052, -8.37076, 11.089528]
2025-09-12 15:24:51,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [21.0, 40.0, 23.0, 25.0, 33.0, 54.0, 31.0, 17.0, 43.0, 18.0]
2025-09-12 15:24:52,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 80/100 (estimated time remaining: 4 hours, 22 minutes, 13 seconds)
2025-09-12 15:36:06,150 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:36:06,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:36:45,223 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -3.74629 ± 99.031
2025-09-12 15:36:45,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [29.554506, 5.970821, 1.2582729, -10.486584, 60.846905, 48.926502, 20.689156, -287.73795, 4.9225106, 88.593]
2025-09-12 15:36:45,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [46.0, 18.0, 14.0, 13.0, 43.0, 87.0, 38.0, 1000.0, 39.0, 104.0]
2025-09-12 15:36:45,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 81/100 (estimated time remaining: 4 hours, 7 minutes, 38 seconds)
2025-09-12 15:49:03,886 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:49:03,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:49:14,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 30.86079 ± 22.409
2025-09-12 15:49:14,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [64.84799, 63.717686, 46.079327, -3.1813965, 34.49017, 33.34496, 36.221798, 18.814455, 13.37056, 0.9023278]
2025-09-12 15:49:14,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [30.0, 35.0, 67.0, 33.0, 51.0, 66.0, 35.0, 37.0, 30.0, 17.0]
2025-09-12 15:49:14,786 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 82/100 (estimated time remaining: 3 hours, 58 minutes, 42 seconds)
2025-09-12 16:01:31,150 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:01:31,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:02:50,230 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -54.11819 ± 151.182
2025-09-12 16:02:50,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [41.417206, 88.79388, -1.8912137, 21.468723, -367.41907, 24.596127, -8.957309, -9.57999, 5.0291147, -334.63934]
2025-09-12 16:02:50,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [68.0, 115.0, 10.0, 23.0, 1000.0, 27.0, 15.0, 12.0, 14.0, 1000.0]
2025-09-12 16:02:50,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 83/100 (estimated time remaining: 3 hours, 49 minutes, 49 seconds)
2025-09-12 16:14:59,817 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:14:59,820 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:15:37,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -6.85994 ± 93.788
2025-09-12 16:15:37,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-14.224955, 21.68542, 34.77616, 46.971546, 37.58327, -282.08276, 49.822845, 1.8296472, 6.0975285, 28.941872]
2025-09-12 16:15:37,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [23.0, 29.0, 41.0, 53.0, 44.0, 1000.0, 62.0, 16.0, 15.0, 71.0]
2025-09-12 16:15:37,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 84/100 (estimated time remaining: 3 hours, 38 minutes, 25 seconds)
2025-09-12 16:27:53,912 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:27:53,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:28:13,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 55.42917 ± 54.726
2025-09-12 16:28:13,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [158.08455, 12.794327, -13.300005, 77.1572, -5.669942, 30.489388, 109.61263, 28.37011, 119.12987, 37.62352]
2025-09-12 16:28:13,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [136.0, 30.0, 30.0, 81.0, 17.0, 35.0, 121.0, 19.0, 82.0, 22.0]
2025-09-12 16:28:13,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1226 [INFO]: New best (55.43) for latency MM1Queue_a033_s075
2025-09-12 16:28:13,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 22 minutes, 44 seconds)
2025-09-12 16:40:38,285 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:40:38,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:40:46,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 25.76929 ± 43.836
2025-09-12 16:40:46,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [13.12138, 133.10207, 4.592577, 31.18936, -18.485453, 0.06766649, 20.397686, 78.60748, -6.042157, 1.1423004]
2025-09-12 16:40:46,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [26.0, 58.0, 18.0, 35.0, 28.0, 14.0, 41.0, 64.0, 10.0, 22.0]
2025-09-12 16:40:46,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 12 minutes, 5 seconds)
2025-09-12 16:53:02,867 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:53:02,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:53:13,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 45.04993 ± 28.509
2025-09-12 16:53:13,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [80.987, 36.46987, 13.004271, 26.117825, 90.63799, 20.359575, 70.5709, 59.347927, 2.5690787, 50.434803]
2025-09-12 16:53:13,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [77.0, 25.0, 14.0, 22.0, 49.0, 27.0, 63.0, 44.0, 14.0, 50.0]
2025-09-12 16:53:13,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 87/100 (estimated time remaining: 2 hours, 59 minutes, 8 seconds)
2025-09-12 17:05:29,840 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:05:29,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:05:40,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 36.38107 ± 51.272
2025-09-12 17:05:40,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [8.285579, 16.48297, 181.69609, 51.82898, 12.694805, 8.690595, 44.412033, 20.496193, 27.914188, -8.690665]
2025-09-12 17:05:40,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [16.0, 16.0, 108.0, 55.0, 53.0, 16.0, 49.0, 22.0, 36.0, 14.0]
2025-09-12 17:05:40,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 88/100 (estimated time remaining: 2 hours, 43 minutes, 22 seconds)
2025-09-12 17:18:01,221 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:18:01,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:18:19,095 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 72.69731 ± 67.263
2025-09-12 17:18:19,095 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [104.69028, 47.124588, 23.024738, 77.22194, 6.0870743, 100.38809, 100.52547, 48.44885, -16.301653, 235.76378]
2025-09-12 17:18:19,095 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [72.0, 65.0, 36.0, 65.0, 29.0, 74.0, 67.0, 57.0, 40.0, 145.0]
2025-09-12 17:18:19,095 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1226 [INFO]: New best (72.70) for latency MM1Queue_a033_s075
2025-09-12 17:18:19,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 30 minutes, 27 seconds)
2025-09-12 17:30:30,565 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:30:30,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:31:21,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: -0.72088 ± 129.154
2025-09-12 17:31:21,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [65.9888, 29.263945, 27.720638, 24.331831, 50.20839, -382.88098, 8.045432, 84.13085, 54.912407, 31.069916]
2025-09-12 17:31:21,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [60.0, 24.0, 41.0, 45.0, 54.0, 1000.0, 32.0, 128.0, 47.0, 37.0]
2025-09-12 17:31:21,180 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 18 minutes, 52 seconds)
2025-09-12 17:43:42,409 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:43:42,413 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:43:54,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 35.59553 ± 22.155
2025-09-12 17:43:54,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [47.337353, 55.625072, 42.063053, 30.504313, 27.867598, 77.368095, 20.10844, 47.842808, -1.9574604, 9.195979]
2025-09-12 17:43:54,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [54.0, 56.0, 28.0, 38.0, 28.0, 53.0, 16.0, 38.0, 15.0, 28.0]
2025-09-12 17:43:54,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 6 minutes, 14 seconds)
2025-09-12 17:57:06,795 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:57:06,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:57:22,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 54.34434 ± 22.614
2025-09-12 17:57:22,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [37.291122, 63.70047, 105.50254, 21.518154, 39.9242, 68.91202, 37.289185, 45.40982, 53.753178, 70.14269]
2025-09-12 17:57:22,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [45.0, 63.0, 91.0, 19.0, 35.0, 103.0, 52.0, 37.0, 62.0, 66.0]
2025-09-12 17:57:22,587 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 92/100 (estimated time remaining: 1 hour, 55 minutes, 28 seconds)
2025-09-12 18:08:59,636 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:08:59,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:09:17,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 76.13008 ± 110.112
2025-09-12 18:09:17,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [384.25076, -1.6583381, 125.53765, 15.641594, 50.236557, 6.348181, 36.06894, 104.17801, 19.468872, 21.22855]
2025-09-12 18:09:17,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [267.0, 15.0, 88.0, 16.0, 57.0, 12.0, 41.0, 85.0, 19.0, 51.0]
2025-09-12 18:09:17,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1226 [INFO]: New best (76.13) for latency MM1Queue_a033_s075
2025-09-12 18:09:18,008 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 41 minutes, 47 seconds)
2025-09-12 18:21:33,885 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:21:33,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:21:52,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 49.54232 ± 49.882
2025-09-12 18:21:52,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [97.33234, 8.958761, 159.67421, 82.2596, 14.250211, 47.061314, 56.37728, 39.302322, 9.610594, -19.403378]
2025-09-12 18:21:52,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [66.0, 72.0, 96.0, 61.0, 39.0, 26.0, 79.0, 51.0, 18.0, 50.0]
2025-09-12 18:21:53,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 28 minutes, 59 seconds)
2025-09-12 18:34:10,868 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:34:10,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:34:21,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 25.94070 ± 18.329
2025-09-12 18:34:21,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [45.138447, 8.254216, 41.315117, 14.019704, 28.538033, -0.059669796, 39.627457, 5.7913184, 19.500816, 57.281586]
2025-09-12 18:34:21,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [47.0, 58.0, 42.0, 29.0, 45.0, 27.0, 57.0, 30.0, 19.0, 47.0]
2025-09-12 18:34:21,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 15 minutes, 36 seconds)
2025-09-12 18:47:08,413 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:47:08,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:47:22,947 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 55.36136 ± 63.329
2025-09-12 18:47:22,947 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [136.2893, 21.776934, 11.656, 57.76372, 35.240932, 3.497871, 10.932997, 14.798521, 209.59314, 52.064133]
2025-09-12 18:47:22,947 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [90.0, 42.0, 21.0, 56.0, 73.0, 17.0, 17.0, 21.0, 147.0, 44.0]
2025-09-12 18:47:22,964 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 3 minutes, 28 seconds)
2025-09-12 18:59:11,988 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:59:11,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:00:02,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 26.55909 ± 89.626
2025-09-12 19:00:02,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [49.158466, 39.92351, 36.538548, 30.882656, -4.9154, -16.897526, 59.83344, 202.4755, -183.09973, 51.6914]
2025-09-12 19:00:02,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [52.0, 74.0, 63.0, 34.0, 18.0, 21.0, 31.0, 108.0, 1000.0, 62.0]
2025-09-12 19:00:02,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 97/100 (estimated time remaining: 50 minutes, 7 seconds)
2025-09-12 19:12:13,224 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:12:13,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:12:23,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 42.55184 ± 46.143
2025-09-12 19:12:23,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [58.4611, 12.116023, 30.548048, -2.154952, -7.972106, 151.1303, 16.655312, 61.741283, 17.920094, 87.0733]
2025-09-12 19:12:23,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [33.0, 15.0, 36.0, 39.0, 34.0, 105.0, 28.0, 37.0, 19.0, 49.0]
2025-09-12 19:12:23,978 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 98/100 (estimated time remaining: 37 minutes, 51 seconds)
2025-09-12 19:24:45,255 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:24:45,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:25:00,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 41.28952 ± 41.935
2025-09-12 19:25:00,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [-10.789036, 25.072517, 126.307144, 21.168972, 97.869774, 41.605736, -10.600907, 28.543974, 67.57985, 26.137173]
2025-09-12 19:25:00,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [16.0, 25.0, 135.0, 46.0, 85.0, 46.0, 73.0, 30.0, 73.0, 35.0]
2025-09-12 19:25:00,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 99/100 (estimated time remaining: 25 minutes, 15 seconds)
2025-09-12 19:37:20,494 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:37:20,512 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:37:41,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 60.17756 ± 63.046
2025-09-12 19:37:41,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [135.18289, -26.471712, 46.6172, 5.683515, 15.007389, 23.05176, 19.99601, 90.1712, 183.03815, 109.49922]
2025-09-12 19:37:41,591 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [110.0, 89.0, 92.0, 34.0, 101.0, 43.0, 20.0, 45.0, 140.0, 88.0]
2025-09-12 19:37:41,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1199 [INFO]: Iteration 100/100 (estimated time remaining: 12 minutes, 39 seconds)
2025-09-12 19:49:58,739 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:49:58,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:50:16,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1221 [DEBUG]: Total Reward: 61.91213 ± 63.494
2025-09-12 19:50:16,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1222 [DEBUG]: All rewards: [6.1032267, 165.74397, 13.105254, 24.167604, 112.26158, 12.5374365, 25.43462, 63.732452, 15.258793, 180.77641]
2025-09-12 19:50:16,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [17.0, 159.0, 31.0, 38.0, 85.0, 30.0, 28.0, 91.0, 24.0, 142.0]
2025-09-12 19:50:16,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-ant):1251 [DEBUG]: Training session finished
