2025-09-11 21:58:45,448 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc10-ant/MM1Queue_a033_s075-mbpac-highdim-memdelay
2025-09-11 21:58:45,448 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc10-ant/MM1Queue_a033_s075-mbpac-highdim-memdelay
2025-09-11 21:58:45,448 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x15015790a750>}
2025-09-11 21:58:45,448 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1111 [DEBUG]: using device: cuda
2025-09-11 21:58:45,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1133 [INFO]: Creating new trainer
2025-09-11 21:58:45,472 baseline-mbpac-noiseperc10-ant:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-09-11 21:58:45,472 baseline-mbpac-noiseperc10-ant:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-11 21:58:45,482 baseline-mbpac-noiseperc10-ant:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=27, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=8, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 512, batch_first=True)
)
2025-09-11 21:58:46,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1194 [DEBUG]: Starting training session...
2025-09-11 21:58:46,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 1/100
2025-09-11 22:09:31,915 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:09:31,917 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 22:11:17,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: -144.92813 ± 129.641
2025-09-11 22:11:17,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [-11.613465, -83.56919, -112.58272, -380.55344, -54.238194, -282.23648, -332.36188, -136.05978, -14.830281, -41.235867]
2025-09-11 22:11:17,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [16.0, 206.0, 311.0, 1000.0, 169.0, 572.0, 1000.0, 282.0, 64.0, 97.0]
2025-09-11 22:11:17,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (-144.93) for latency MM1Queue_a033_s075
2025-09-11 22:11:17,842 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 2/100 (estimated time remaining: 20 hours, 39 minutes, 43 seconds)
2025-09-11 22:24:08,654 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:24:08,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 22:26:08,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: -24.20899 ± 86.648
2025-09-11 22:26:08,820 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [17.598036, -45.472534, -112.18105, -161.77132, -145.23079, 89.082794, 8.426711, 101.54275, 16.74104, -10.8255625]
2025-09-11 22:26:08,820 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [32.0, 89.0, 1000.0, 1000.0, 1000.0, 263.0, 33.0, 530.0, 108.0, 95.0]
2025-09-11 22:26:08,820 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (-24.21) for latency MM1Queue_a033_s075
2025-09-11 22:26:08,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 3/100 (estimated time remaining: 22 hours, 21 minutes, 14 seconds)
2025-09-11 22:37:41,408 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:37:41,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 22:39:43,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 33.92288 ± 68.633
2025-09-11 22:39:43,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [-36.993904, 65.07199, -56.59629, -6.331116, 14.935427, 187.36937, 108.43643, -8.063973, 41.364906, 30.035984]
2025-09-11 22:39:43,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [181.0, 179.0, 355.0, 1000.0, 169.0, 751.0, 239.0, 1000.0, 351.0, 29.0]
2025-09-11 22:39:43,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (33.92) for latency MM1Queue_a033_s075
2025-09-11 22:39:43,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 4/100 (estimated time remaining: 22 hours, 4 minutes)
2025-09-11 22:51:25,138 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:51:25,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 22:53:19,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 86.24819 ± 65.428
2025-09-11 22:53:19,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [114.50013, 24.479954, 81.30483, 234.89091, 2.5916855, 97.545204, 90.5781, 30.861698, 39.125362, 146.604]
2025-09-11 22:53:19,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [226.0, 1000.0, 412.0, 1000.0, 11.0, 552.0, 150.0, 55.0, 37.0, 564.0]
2025-09-11 22:53:19,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (86.25) for latency MM1Queue_a033_s075
2025-09-11 22:53:19,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 5/100 (estimated time remaining: 21 hours, 49 minutes, 7 seconds)
2025-09-11 23:05:20,918 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:05:20,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 23:07:20,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 123.93404 ± 58.130
2025-09-11 23:07:20,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [90.76279, 61.216415, 83.03266, 227.70306, 131.31723, 201.20471, 177.61337, 119.24586, 39.523438, 107.720985]
2025-09-11 23:07:20,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [218.0, 244.0, 485.0, 1000.0, 227.0, 606.0, 1000.0, 225.0, 45.0, 128.0]
2025-09-11 23:07:20,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (123.93) for latency MM1Queue_a033_s075
2025-09-11 23:07:20,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 6/100 (estimated time remaining: 21 hours, 42 minutes, 44 seconds)
2025-09-11 23:19:57,546 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:19:57,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 23:21:21,274 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 62.61477 ± 72.700
2025-09-11 23:21:21,276 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [96.23561, 46.63636, 57.496506, 45.725388, 30.480799, 264.89984, 35.78108, -14.441742, 49.418606, 13.915266]
2025-09-11 23:21:21,276 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [158.0, 906.0, 50.0, 62.0, 66.0, 1000.0, 213.0, 255.0, 210.0, 19.0]
2025-09-11 23:21:21,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 7/100 (estimated time remaining: 21 hours, 57 minutes, 5 seconds)
2025-09-11 23:33:08,234 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:33:08,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 23:35:54,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 202.35541 ± 181.223
2025-09-11 23:35:54,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [144.83228, 277.64236, 610.732, 290.93646, 235.31374, 150.2855, 64.16694, -111.91609, 286.21036, 75.35057]
2025-09-11 23:35:54,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [275.0, 485.0, 1000.0, 1000.0, 1000.0, 185.0, 96.0, 661.0, 1000.0, 92.0]
2025-09-11 23:35:54,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (202.36) for latency MM1Queue_a033_s075
2025-09-11 23:35:54,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 8/100 (estimated time remaining: 21 hours, 37 minutes, 31 seconds)
2025-09-11 23:47:44,181 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:47:44,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 23:50:36,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 327.37823 ± 194.536
2025-09-11 23:50:36,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [178.3937, 177.71402, 217.33873, 665.4599, 382.82574, 585.9318, 22.795515, 180.7561, 402.52658, 460.04007]
2025-09-11 23:50:36,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [291.0, 316.0, 304.0, 1000.0, 1000.0, 761.0, 21.0, 247.0, 1000.0, 1000.0]
2025-09-11 23:50:36,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (327.38) for latency MM1Queue_a033_s075
2025-09-11 23:50:36,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 9/100 (estimated time remaining: 21 hours, 44 minutes, 16 seconds)
2025-09-12 00:03:21,748 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:03:21,753 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:05:22,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 300.85077 ± 167.656
2025-09-12 00:05:22,383 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [350.65182, 283.07477, 48.5946, 110.12355, 344.77448, 162.85495, 499.0725, 415.0599, 189.27576, 605.02527]
2025-09-12 00:05:22,383 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [363.0, 430.0, 619.0, 115.0, 396.0, 186.0, 583.0, 397.0, 204.0, 1000.0]
2025-09-12 00:05:22,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 10/100 (estimated time remaining: 21 hours, 51 minutes, 20 seconds)
2025-09-12 00:17:35,964 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:17:35,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:21:43,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 406.01749 ± 136.546
2025-09-12 00:21:43,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [429.96765, 406.64062, 30.056372, 369.74197, 438.37534, 441.2232, 509.60828, 407.9761, 573.1853, 453.40018]
2025-09-12 00:21:43,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 36.0, 437.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:21:43,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (406.02) for latency MM1Queue_a033_s075
2025-09-12 00:21:43,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 11/100 (estimated time remaining: 22 hours, 18 minutes, 49 seconds)
2025-09-12 00:33:16,640 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:33:16,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:36:17,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 375.32892 ± 257.387
2025-09-12 00:36:17,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [753.2544, 498.5437, 232.72935, 803.8329, 562.926, 76.56273, 79.92151, 391.2357, 106.45309, 247.8297]
2025-09-12 00:36:17,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 620.0, 873.0, 1000.0, 58.0, 100.0, 1000.0, 273.0, 271.0]
2025-09-12 00:36:17,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 12/100 (estimated time remaining: 22 hours, 13 minutes, 56 seconds)
2025-09-12 00:48:52,571 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:48:52,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:50:53,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 382.30948 ± 298.313
2025-09-12 00:50:53,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [168.64804, 305.41525, 167.10893, 399.89728, 91.76625, 984.6517, 575.9491, 322.67303, 792.4288, 14.556462]
2025-09-12 00:50:53,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [125.0, 251.0, 159.0, 343.0, 96.0, 1000.0, 1000.0, 332.0, 790.0, 17.0]
2025-09-12 00:50:53,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 13/100 (estimated time remaining: 21 hours, 59 minutes, 41 seconds)
2025-09-12 01:02:41,417 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:02:41,425 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:04:45,940 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 366.15680 ± 301.855
2025-09-12 01:04:45,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [135.2563, 1049.1243, 619.44244, 142.1789, 155.57953, 122.51783, 567.29126, 559.82306, 98.62278, 211.73132]
2025-09-12 01:04:45,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [215.0, 977.0, 1000.0, 173.0, 110.0, 127.0, 526.0, 1000.0, 112.0, 124.0]
2025-09-12 01:04:45,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 14/100 (estimated time remaining: 21 hours, 30 minutes, 20 seconds)
2025-09-12 01:16:23,638 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:16:23,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:19:08,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 446.01239 ± 218.693
2025-09-12 01:19:08,247 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [180.86212, 231.36638, 593.76794, 787.56604, 239.10298, 202.66422, 719.4989, 348.03223, 548.66016, 608.6032]
2025-09-12 01:19:08,247 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [157.0, 212.0, 1000.0, 739.0, 249.0, 164.0, 1000.0, 315.0, 1000.0, 1000.0]
2025-09-12 01:19:08,247 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (446.01) for latency MM1Queue_a033_s075
2025-09-12 01:19:08,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 15/100 (estimated time remaining: 21 hours, 8 minutes, 44 seconds)
2025-09-12 01:32:17,989 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:32:17,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:35:29,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 480.17368 ± 313.113
2025-09-12 01:35:29,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [586.1112, 987.1065, 252.74638, 572.38416, 19.91215, 627.05237, 482.7055, 26.813862, 916.66925, 330.2355]
2025-09-12 01:35:29,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 267.0, 1000.0, 16.0, 1000.0, 1000.0, 28.0, 1000.0, 344.0]
2025-09-12 01:35:29,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (480.17) for latency MM1Queue_a033_s075
2025-09-12 01:35:29,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 16/100 (estimated time remaining: 20 hours, 54 minutes, 3 seconds)
2025-09-12 01:47:32,212 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:47:32,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:49:32,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 360.35126 ± 249.376
2025-09-12 01:49:32,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [792.42676, 7.3544745, 38.500633, 573.511, 480.20407, 342.77496, 556.5561, 503.54468, 192.98332, 115.65652]
2025-09-12 01:49:32,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [666.0, 17.0, 27.0, 508.0, 1000.0, 295.0, 1000.0, 401.0, 144.0, 112.0]
2025-09-12 01:49:32,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 17/100 (estimated time remaining: 20 hours, 30 minutes, 30 seconds)
2025-09-12 02:00:35,609 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:00:35,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:03:07,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 523.52356 ± 387.433
2025-09-12 02:03:07,647 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [75.85213, 514.17993, 548.5342, 179.55153, 333.1234, 838.56586, 1174.41, 314.15256, 1155.973, 100.89241]
2025-09-12 02:03:07,647 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [78.0, 432.0, 1000.0, 113.0, 306.0, 1000.0, 1000.0, 233.0, 1000.0, 102.0]
2025-09-12 02:03:07,647 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (523.52) for latency MM1Queue_a033_s075
2025-09-12 02:03:07,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 18/100 (estimated time remaining: 19 hours, 59 minutes, 9 seconds)
2025-09-12 02:15:12,702 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:15:12,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:18:00,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 543.05579 ± 398.320
2025-09-12 02:18:00,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [823.1599, 318.16443, 54.305946, 587.3668, 854.17236, 522.2565, 1396.2467, 20.019201, 641.03253, 213.83328]
2025-09-12 02:18:00,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 171.0, 38.0, 514.0, 1000.0, 1000.0, 1000.0, 16.0, 1000.0, 141.0]
2025-09-12 02:18:00,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (543.06) for latency MM1Queue_a033_s075
2025-09-12 02:18:00,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 19/100 (estimated time remaining: 20 hours, 1 minute, 17 seconds)
2025-09-12 02:30:50,494 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:30:50,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:33:57,997 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 567.44275 ± 291.595
2025-09-12 02:33:57,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [1117.3135, 635.7777, 152.63045, 818.1587, 296.874, 584.7673, 812.89606, 271.61502, 686.7843, 297.61035]
2025-09-12 02:33:57,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 136.0, 1000.0, 402.0, 1000.0, 592.0, 238.0, 1000.0, 296.0]
2025-09-12 02:33:57,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (567.44) for latency MM1Queue_a033_s075
2025-09-12 02:33:58,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 20/100 (estimated time remaining: 20 hours, 12 minutes, 14 seconds)
2025-09-12 02:44:55,083 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:44:55,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:47:02,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 385.69736 ± 267.191
2025-09-12 02:47:02,462 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [188.49539, 497.79828, 813.08514, 225.46933, 796.0405, 487.69653, 5.660454, 198.27094, 112.79824, 531.65875]
2025-09-12 02:47:02,462 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [139.0, 1000.0, 535.0, 168.0, 1000.0, 1000.0, 16.0, 151.0, 72.0, 428.0]
2025-09-12 02:47:02,479 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 21/100 (estimated time remaining: 19 hours, 4 minutes, 51 seconds)
2025-09-12 02:59:43,648 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:59:43,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:02:27,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 499.46094 ± 313.488
2025-09-12 03:02:27,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [673.7029, 959.89166, 508.23447, 915.3253, 736.7661, 180.08786, 160.10721, 611.41785, 135.23535, 113.84094]
2025-09-12 03:02:27,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 378.0, 1000.0, 1000.0, 95.0, 122.0, 1000.0, 110.0, 83.0]
2025-09-12 03:02:27,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 22/100 (estimated time remaining: 19 hours, 12 minutes, 1 second)
2025-09-12 03:13:56,845 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:13:56,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:16:29,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 459.51923 ± 364.857
2025-09-12 03:16:29,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [987.0276, 201.23158, 539.41376, 601.2116, 558.0349, 85.82894, 143.42633, 1157.0417, 312.1884, 9.787671]
2025-09-12 03:16:29,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 194.0, 1000.0, 1000.0, 1000.0, 76.0, 92.0, 814.0, 163.0, 8.0]
2025-09-12 03:16:29,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 23/100 (estimated time remaining: 19 hours, 4 minutes, 23 seconds)
2025-09-12 03:28:22,830 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:28:22,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:30:34,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 440.82724 ± 243.891
2025-09-12 03:30:34,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [102.77783, 629.598, 169.3253, 606.28357, 393.83865, 180.98242, 856.8279, 244.68634, 567.3193, 656.63354]
2025-09-12 03:30:34,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [89.0, 1000.0, 111.0, 340.0, 288.0, 114.0, 536.0, 157.0, 1000.0, 1000.0]
2025-09-12 03:30:34,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 24/100 (estimated time remaining: 18 hours, 37 minutes, 28 seconds)
2025-09-12 03:42:11,297 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:42:11,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:43:11,007 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 301.12390 ± 181.845
2025-09-12 03:43:11,008 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [22.898827, 485.62146, 431.57248, 550.06213, 89.31143, 556.68256, 177.33354, 212.85844, 236.74112, 248.15694]
2025-09-12 03:43:11,008 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [31.0, 313.0, 253.0, 366.0, 54.0, 450.0, 109.0, 205.0, 172.0, 200.0]
2025-09-12 03:43:11,046 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 25/100 (estimated time remaining: 17 hours, 32 minutes, 5 seconds)
2025-09-12 03:55:09,088 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:55:09,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:58:22,198 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 540.30786 ± 284.021
2025-09-12 03:58:22,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [99.08136, 793.36346, 656.02203, 497.9117, 410.44238, 67.05558, 520.2969, 652.7949, 1057.0931, 649.017]
2025-09-12 03:58:22,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [83.0, 1000.0, 1000.0, 376.0, 296.0, 53.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:58:22,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 26/100 (estimated time remaining: 17 hours, 49 minutes, 56 seconds)
2025-09-12 04:10:00,523 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:10:00,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:12:52,610 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 584.71960 ± 277.069
2025-09-12 04:12:52,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [1083.3059, 329.2148, 305.3281, 524.74994, 1125.4421, 571.181, 410.81418, 530.35913, 597.0966, 369.7044]
2025-09-12 04:12:52,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 251.0, 198.0, 1000.0, 777.0, 1000.0, 247.0, 376.0, 1000.0, 224.0]
2025-09-12 04:12:52,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (584.72) for latency MM1Queue_a033_s075
2025-09-12 04:12:52,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 27/100 (estimated time remaining: 17 hours, 22 minutes, 16 seconds)
2025-09-12 04:24:32,117 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:24:32,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:26:55,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 455.82050 ± 284.705
2025-09-12 04:26:55,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [656.93915, 945.5477, 60.215996, 408.50864, 278.59222, 646.40076, 665.84784, 186.16609, 66.32606, 643.6608]
2025-09-12 04:26:55,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 66.0, 281.0, 153.0, 1000.0, 493.0, 117.0, 49.0, 1000.0]
2025-09-12 04:26:55,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 28/100 (estimated time remaining: 17 hours, 8 minutes, 18 seconds)
2025-09-12 04:39:15,919 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:39:15,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:42:44,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 746.14496 ± 479.608
2025-09-12 04:42:44,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [375.77338, 320.13345, 611.0407, 1200.27, 82.08062, 1630.2739, 553.9937, 1403.5381, 790.0252, 494.3202]
2025-09-12 04:42:44,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [372.0, 249.0, 1000.0, 748.0, 67.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:42:44,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (746.14) for latency MM1Queue_a033_s075
2025-09-12 04:42:44,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 29/100 (estimated time remaining: 17 hours, 19 minutes, 12 seconds)
2025-09-12 04:54:44,256 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:54:44,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:56:45,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 457.16486 ± 346.389
2025-09-12 04:56:45,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [61.895565, 213.58948, 465.69815, 702.43427, 386.44507, 113.90402, 364.38925, 1279.8464, 729.8718, 253.57419]
2025-09-12 04:56:45,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [87.0, 124.0, 1000.0, 503.0, 258.0, 60.0, 231.0, 1000.0, 1000.0, 145.0]
2025-09-12 04:56:45,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 30/100 (estimated time remaining: 17 hours, 24 minutes, 50 seconds)
2025-09-12 05:08:07,403 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:08:07,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:10:05,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 448.57715 ± 360.340
2025-09-12 05:10:05,617 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [82.96563, 980.871, 19.822086, 561.5395, 695.5127, 17.272326, 962.54724, 58.99988, 547.80035, 558.44104]
2025-09-12 05:10:05,617 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [69.0, 1000.0, 20.0, 321.0, 362.0, 32.0, 1000.0, 45.0, 344.0, 1000.0]
2025-09-12 05:10:05,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 31/100 (estimated time remaining: 16 hours, 44 minutes, 7 seconds)
2025-09-12 05:22:44,325 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:22:44,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:25:06,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 615.66351 ± 482.278
2025-09-12 05:25:06,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [1362.4747, 976.9889, 124.29254, 406.42532, 1465.1318, 79.83698, 191.71815, 267.8356, 496.5433, 785.38776]
2025-09-12 05:25:06,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [919.0, 1000.0, 89.0, 270.0, 871.0, 73.0, 136.0, 145.0, 1000.0, 472.0]
2025-09-12 05:25:06,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 32/100 (estimated time remaining: 16 hours, 36 minutes, 46 seconds)
2025-09-12 05:36:08,578 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:36:08,598 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:38:15,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 672.77026 ± 547.233
2025-09-12 05:38:15,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [183.92561, 299.33838, 92.18072, 1237.1842, 1421.8511, 913.0222, 880.2802, 95.67666, 119.41772, 1484.8263]
2025-09-12 05:38:15,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [109.0, 212.0, 72.0, 1000.0, 899.0, 510.0, 511.0, 76.0, 99.0, 1000.0]
2025-09-12 05:38:15,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 33/100 (estimated time remaining: 16 hours, 10 minutes, 20 seconds)
2025-09-12 05:50:01,620 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:50:01,624 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:54:16,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 893.87665 ± 422.512
2025-09-12 05:54:16,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [1193.1271, 537.8993, 650.7496, 1108.6218, 1344.9349, 1006.51544, 512.0582, 196.31094, 1660.4634, 728.08575]
2025-09-12 05:54:16,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 844.0, 1000.0, 1000.0, 106.0, 1000.0, 1000.0]
2025-09-12 05:54:16,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (893.88) for latency MM1Queue_a033_s075
2025-09-12 05:54:16,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 34/100 (estimated time remaining: 15 hours, 58 minutes, 34 seconds)
2025-09-12 06:06:16,870 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:06:16,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:09:04,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 488.86343 ± 223.675
2025-09-12 06:09:04,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [120.881325, 327.4119, 314.78052, 837.4686, 472.01236, 586.1589, 763.9322, 581.5481, 652.74567, 231.69473]
2025-09-12 06:09:04,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [93.0, 188.0, 185.0, 1000.0, 1000.0, 1000.0, 1000.0, 327.0, 1000.0, 157.0]
2025-09-12 06:09:04,894 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 35/100 (estimated time remaining: 15 hours, 54 minutes, 34 seconds)
2025-09-12 06:20:57,364 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:20:57,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:22:34,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 516.42657 ± 348.389
2025-09-12 06:22:34,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [753.78796, 10.20229, 85.848526, 76.16873, 881.88947, 496.3474, 920.99384, 765.1696, 863.03485, 310.82254]
2025-09-12 06:22:34,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [443.0, 19.0, 45.0, 59.0, 1000.0, 287.0, 511.0, 447.0, 437.0, 175.0]
2025-09-12 06:22:34,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 36/100 (estimated time remaining: 15 hours, 42 minutes, 15 seconds)
2025-09-12 06:34:26,111 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:34:26,113 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:37:21,985 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 736.77234 ± 434.128
2025-09-12 06:37:21,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [925.70197, 1592.394, 251.21207, 606.39984, 417.98245, 1252.0071, 67.915985, 752.1172, 907.54626, 594.447]
2025-09-12 06:37:21,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [590.0, 1000.0, 121.0, 1000.0, 282.0, 1000.0, 54.0, 1000.0, 1000.0, 313.0]
2025-09-12 06:37:21,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 37/100 (estimated time remaining: 15 hours, 24 minutes, 54 seconds)
2025-09-12 06:49:09,393 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:49:09,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:51:24,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 752.95288 ± 585.305
2025-09-12 06:51:24,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [98.54564, 1316.4667, 764.8543, 1075.9221, 284.7537, 1805.7404, 291.26212, 208.70573, 1442.8497, 240.42856]
2025-09-12 06:51:24,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [49.0, 729.0, 1000.0, 555.0, 160.0, 1000.0, 184.0, 149.0, 848.0, 144.0]
2025-09-12 06:51:24,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 38/100 (estimated time remaining: 15 hours, 21 minutes, 36 seconds)
2025-09-12 07:03:57,965 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:03:57,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:04:58,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 265.99786 ± 219.953
2025-09-12 07:04:58,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [199.47578, 14.304827, 39.138767, 755.70056, 319.37427, 425.85953, 472.96378, 151.38136, 208.20918, 73.57061]
2025-09-12 07:04:58,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [118.0, 16.0, 62.0, 1000.0, 221.0, 236.0, 250.0, 123.0, 128.0, 60.0]
2025-09-12 07:04:58,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 39/100 (estimated time remaining: 14 hours, 36 minutes, 39 seconds)
2025-09-12 07:16:20,455 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:16:20,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:18:10,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 709.96960 ± 419.627
2025-09-12 07:18:10,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [625.11743, 1100.4613, 1790.1663, 417.63873, 493.31607, 404.5637, 853.24506, 552.25464, 495.33777, 367.59464]
2025-09-12 07:18:10,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [356.0, 692.0, 1000.0, 259.0, 253.0, 227.0, 520.0, 275.0, 245.0, 197.0]
2025-09-12 07:18:10,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 40/100 (estimated time remaining: 14 hours, 2 minutes, 58 seconds)
2025-09-12 07:29:46,986 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:29:46,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:32:16,589 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 506.10358 ± 327.793
2025-09-12 07:32:16,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [790.2131, 770.72424, 142.72684, 23.812775, 128.92561, 1070.0381, 473.62564, 796.19904, 395.46292, 469.30707]
2025-09-12 07:32:16,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [409.0, 1000.0, 86.0, 15.0, 79.0, 602.0, 1000.0, 1000.0, 237.0, 1000.0]
2025-09-12 07:32:16,599 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 41/100 (estimated time remaining: 13 hours, 56 minutes, 25 seconds)
2025-09-12 07:44:21,235 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:44:21,239 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:46:18,653 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 592.25299 ± 571.635
2025-09-12 07:46:18,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [17.641676, 330.86304, 1138.0173, 170.30457, 26.550604, 610.4838, 93.6703, 1887.9348, 962.3732, 684.69037]
2025-09-12 07:46:18,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [15.0, 141.0, 564.0, 128.0, 19.0, 1000.0, 83.0, 1000.0, 1000.0, 310.0]
2025-09-12 07:46:18,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 42/100 (estimated time remaining: 13 hours, 33 minutes, 32 seconds)
2025-09-12 07:58:14,577 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:58:14,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:00:19,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 584.06097 ± 404.658
2025-09-12 08:00:19,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [961.8551, 906.3565, 696.9215, 1451.6902, 375.53738, 289.01556, 282.8149, 606.37634, 147.30989, 122.73188]
2025-09-12 08:00:19,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [432.0, 457.0, 1000.0, 761.0, 1000.0, 147.0, 161.0, 382.0, 87.0, 78.0]
2025-09-12 08:00:19,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 43/100 (estimated time remaining: 13 hours, 19 minutes, 22 seconds)
2025-09-12 08:12:23,748 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:12:23,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:14:51,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 743.73273 ± 590.309
2025-09-12 08:14:51,905 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [41.91024, 442.22256, 1341.2915, 248.33896, 260.77084, 1890.5859, 1338.6316, 142.84953, 812.38586, 918.3401]
2025-09-12 08:14:51,905 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [38.0, 1000.0, 1000.0, 125.0, 116.0, 1000.0, 1000.0, 62.0, 368.0, 514.0]
2025-09-12 08:14:51,926 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 44/100 (estimated time remaining: 13 hours, 16 minutes, 41 seconds)
2025-09-12 08:26:22,740 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:26:22,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:28:58,488 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 576.03503 ± 552.273
2025-09-12 08:28:58,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [330.41412, 1967.3312, 138.83203, 21.61272, 429.67953, 1025.4951, 819.8638, 396.03705, 86.97103, 544.11346]
2025-09-12 08:28:58,490 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [166.0, 1000.0, 68.0, 16.0, 192.0, 1000.0, 1000.0, 1000.0, 63.0, 1000.0]
2025-09-12 08:28:58,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 45/100 (estimated time remaining: 13 hours, 12 minutes, 55 seconds)
2025-09-12 08:41:22,325 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:41:22,329 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:44:28,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 860.22266 ± 691.780
2025-09-12 08:44:28,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [28.99127, 404.31558, 1617.1566, 587.7916, 768.979, 1949.3021, 130.28232, 1974.6022, 796.86487, 343.94046]
2025-09-12 08:44:28,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [23.0, 1000.0, 884.0, 1000.0, 419.0, 1000.0, 78.0, 1000.0, 1000.0, 176.0]
2025-09-12 08:44:28,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 46/100 (estimated time remaining: 13 hours, 14 minutes, 6 seconds)
2025-09-12 08:56:05,141 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:56:05,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:57:56,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 783.66907 ± 559.544
2025-09-12 08:57:56,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [381.50146, 40.458176, 947.9267, 302.65958, 330.27112, 528.699, 644.1559, 1795.4766, 1401.3431, 1464.1985]
2025-09-12 08:57:56,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [167.0, 25.0, 459.0, 169.0, 199.0, 221.0, 381.0, 1000.0, 668.0, 677.0]
2025-09-12 08:57:56,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 47/100 (estimated time remaining: 12 hours, 53 minutes, 38 seconds)
2025-09-12 09:09:06,656 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:09:06,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:11:23,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 893.87634 ± 831.114
2025-09-12 09:11:23,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [385.25418, 1937.5077, 278.32516, 619.4513, 191.0904, 95.0242, 2322.6365, 225.94281, 776.372, 2107.1597]
2025-09-12 09:11:23,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [184.0, 1000.0, 162.0, 1000.0, 105.0, 78.0, 1000.0, 121.0, 314.0, 1000.0]
2025-09-12 09:11:24,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 48/100 (estimated time remaining: 12 hours, 33 minutes, 26 seconds)
2025-09-12 09:24:07,240 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:24:07,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:27:18,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1306.00903 ± 688.855
2025-09-12 09:27:18,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [315.50867, 865.39935, 651.36566, 2071.9062, 2125.2595, 964.0419, 2016.8696, 2264.8276, 894.31323, 890.59796]
2025-09-12 09:27:18,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [222.0, 405.0, 300.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 519.0, 453.0]
2025-09-12 09:27:18,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (1306.01) for latency MM1Queue_a033_s075
2025-09-12 09:27:18,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 49/100 (estimated time remaining: 12 hours, 33 minutes, 28 seconds)
2025-09-12 09:38:22,623 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:38:22,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:40:32,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 950.93408 ± 612.272
2025-09-12 09:40:32,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [610.875, 2107.508, 1945.4059, 266.07642, 1381.5193, 839.9896, 427.48285, 877.3483, 535.44214, 517.69305]
2025-09-12 09:40:32,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [272.0, 981.0, 946.0, 175.0, 651.0, 452.0, 236.0, 369.0, 311.0, 258.0]
2025-09-12 09:40:32,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 50/100 (estimated time remaining: 12 hours, 9 minutes, 56 seconds)
2025-09-12 09:52:47,877 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:52:47,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:55:11,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 758.32312 ± 302.327
2025-09-12 09:55:11,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [1188.2253, 1023.14, 425.52225, 1021.75726, 940.20416, 166.02214, 738.0179, 473.4271, 735.02313, 871.89215]
2025-09-12 09:55:11,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [579.0, 471.0, 217.0, 519.0, 539.0, 81.0, 351.0, 1000.0, 1000.0, 469.0]
2025-09-12 09:55:12,004 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 51/100 (estimated time remaining: 11 hours, 47 minutes, 19 seconds)
2025-09-12 10:06:49,065 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:06:49,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:09:29,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 826.16223 ± 636.169
2025-09-12 10:09:29,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [613.7695, 440.5204, 253.19963, 1947.5079, 666.5902, 539.14777, 384.56973, 133.5262, 1362.3047, 1920.4866]
2025-09-12 10:09:29,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 140.0, 875.0, 326.0, 244.0, 233.0, 90.0, 1000.0, 836.0]
2025-09-12 10:09:29,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 52/100 (estimated time remaining: 11 hours, 41 minutes, 8 seconds)
2025-09-12 10:20:58,506 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:20:58,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:24:27,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1634.72119 ± 909.430
2025-09-12 10:24:27,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [2145.877, 2237.8054, 2163.3047, 312.78613, 84.21071, 2368.8848, 2313.127, 2037.0731, 2312.1836, 371.95862]
2025-09-12 10:24:27,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 172.0, 40.0, 1000.0, 1000.0, 1000.0, 1000.0, 156.0]
2025-09-12 10:24:27,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (1634.72) for latency MM1Queue_a033_s075
2025-09-12 10:24:27,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 53/100 (estimated time remaining: 11 hours, 41 minutes, 19 seconds)
2025-09-12 10:37:12,263 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:37:12,267 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:39:15,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 451.51694 ± 354.323
2025-09-12 10:39:15,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [567.309, 164.48137, 678.27606, 37.58192, 1295.6866, 596.56903, 37.64333, 332.67477, 483.0998, 321.8477]
2025-09-12 10:39:15,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [291.0, 88.0, 1000.0, 25.0, 625.0, 1000.0, 33.0, 151.0, 1000.0, 140.0]
2025-09-12 10:39:15,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 54/100 (estimated time remaining: 11 hours, 16 minutes, 16 seconds)
2025-09-12 10:50:48,440 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:50:48,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:53:20,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 898.65106 ± 635.230
2025-09-12 10:53:20,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [1530.6846, 703.0795, 1152.755, 586.761, 41.27207, 305.0233, 2224.96, 202.17183, 1174.8486, 1064.9545]
2025-09-12 10:53:20,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [709.0, 1000.0, 513.0, 1000.0, 24.0, 150.0, 1000.0, 105.0, 483.0, 517.0]
2025-09-12 10:53:20,795 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 55/100 (estimated time remaining: 11 hours, 9 minutes, 49 seconds)
2025-09-12 11:05:28,304 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:05:28,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:06:57,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 716.99243 ± 553.688
2025-09-12 11:06:57,894 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [582.69183, 854.08844, 516.28326, 909.62317, 488.06668, 2242.7092, 69.71917, 547.46533, 553.42084, 405.85577]
2025-09-12 11:06:57,894 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [237.0, 444.0, 226.0, 412.0, 232.0, 925.0, 56.0, 248.0, 285.0, 181.0]
2025-09-12 11:06:57,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 56/100 (estimated time remaining: 10 hours, 45 minutes, 53 seconds)
2025-09-12 11:18:55,746 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:18:55,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:20:54,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 546.85913 ± 606.324
2025-09-12 11:20:54,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [135.12178, 1003.8302, 70.93581, 720.56335, 349.90738, 2126.6948, 66.966156, 67.08915, 590.80725, 336.67517]
2025-09-12 11:20:54,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [59.0, 471.0, 39.0, 298.0, 1000.0, 1000.0, 35.0, 43.0, 238.0, 1000.0]
2025-09-12 11:20:54,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 57/100 (estimated time remaining: 10 hours, 28 minutes, 29 seconds)
2025-09-12 11:31:59,685 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:31:59,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:34:14,024 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1075.23022 ± 971.227
2025-09-12 11:34:14,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [517.90564, 304.9119, 61.784916, 31.321949, 1994.8535, 375.0316, 2462.0962, 502.3254, 2423.325, 2078.7458]
2025-09-12 11:34:14,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [250.0, 121.0, 34.0, 26.0, 938.0, 177.0, 1000.0, 195.0, 1000.0, 1000.0]
2025-09-12 11:34:14,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 58/100 (estimated time remaining: 10 hours, 6 seconds)
2025-09-12 11:46:22,528 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:46:22,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:48:24,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 641.76251 ± 591.889
2025-09-12 11:48:24,136 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [366.63187, 168.03355, 24.604387, 661.8156, 1307.9497, 683.5499, 750.05054, 40.110737, 2032.8724, 382.00623]
2025-09-12 11:48:24,136 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [229.0, 68.0, 36.0, 304.0, 1000.0, 349.0, 399.0, 34.0, 1000.0, 1000.0]
2025-09-12 11:48:24,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 59/100 (estimated time remaining: 9 hours, 40 minutes, 48 seconds)
2025-09-12 12:00:34,980 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:00:34,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:03:24,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1038.24744 ± 843.737
2025-09-12 12:03:24,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [2184.9197, 2199.276, 69.15716, 1715.7897, 1247.3817, 599.3689, 1807.57, 369.93475, 64.07473, 125.00241]
2025-09-12 12:03:24,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 44.0, 768.0, 1000.0, 1000.0, 921.0, 190.0, 34.0, 79.0]
2025-09-12 12:03:24,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 60/100 (estimated time remaining: 9 hours, 34 minutes, 29 seconds)
2025-09-12 12:14:51,796 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:14:51,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:15:48,211 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 448.91766 ± 320.717
2025-09-12 12:15:48,211 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [69.479546, 453.21225, 941.73975, 174.68153, 305.75763, 976.4748, 23.492908, 293.1902, 620.016, 631.13245]
2025-09-12 12:15:48,211 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [42.0, 167.0, 394.0, 94.0, 151.0, 480.0, 36.0, 158.0, 243.0, 274.0]
2025-09-12 12:15:48,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 61/100 (estimated time remaining: 9 hours, 10 minutes, 42 seconds)
2025-09-12 12:27:56,477 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:27:56,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:29:43,670 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 877.68048 ± 675.604
2025-09-12 12:29:43,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [892.0494, 1410.5917, 437.17892, 45.26606, 2235.2517, 445.14862, 899.02155, 371.09836, 290.79102, 1750.4067]
2025-09-12 12:29:43,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [346.0, 606.0, 235.0, 25.0, 1000.0, 213.0, 380.0, 138.0, 124.0, 774.0]
2025-09-12 12:29:43,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 62/100 (estimated time remaining: 8 hours, 56 minutes, 47 seconds)
2025-09-12 12:40:46,602 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:40:46,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:41:54,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 562.08704 ± 487.194
2025-09-12 12:41:54,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [999.8421, 168.31892, 336.31027, 632.58307, 116.31446, 222.86235, 1810.9564, 462.01343, 269.09937, 602.5708]
2025-09-12 12:41:54,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [392.0, 89.0, 175.0, 278.0, 61.0, 145.0, 759.0, 182.0, 109.0, 268.0]
2025-09-12 12:41:54,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 63/100 (estimated time remaining: 8 hours, 34 minutes, 17 seconds)
2025-09-12 12:53:40,090 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:53:40,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:56:10,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 718.20837 ± 692.636
2025-09-12 12:56:10,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [168.82153, 222.9086, 99.6179, 417.4355, 2274.5808, 672.4199, 1237.8188, 487.40985, 87.805534, 1513.2648]
2025-09-12 12:56:10,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [101.0, 1000.0, 66.0, 207.0, 1000.0, 1000.0, 1000.0, 258.0, 53.0, 637.0]
2025-09-12 12:56:10,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 64/100 (estimated time remaining: 8 hours, 21 minutes, 27 seconds)
2025-09-12 13:08:13,100 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:08:13,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:09:55,488 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 643.12024 ± 493.544
2025-09-12 13:09:55,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [259.29337, 104.288155, 1517.2346, 901.18945, 660.58826, 1064.344, 1254.8866, 247.44664, 4.26692, 417.66388]
2025-09-12 13:09:55,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [146.0, 64.0, 653.0, 410.0, 311.0, 433.0, 579.0, 114.0, 11.0, 1000.0]
2025-09-12 13:09:55,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 65/100 (estimated time remaining: 7 hours, 58 minutes, 56 seconds)
2025-09-12 13:22:03,736 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:22:03,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:25:50,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1402.58081 ± 619.245
2025-09-12 13:25:50,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [1719.4375, 1064.5818, 1977.2554, 1874.7441, 1216.9006, 958.605, 94.53882, 2120.5498, 2042.7123, 956.4826]
2025-09-12 13:25:50,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 492.0, 1000.0, 1000.0, 509.0, 1000.0, 55.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:25:50,608 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 66/100 (estimated time remaining: 8 hours, 10 minutes, 16 seconds)
2025-09-12 13:37:49,319 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:37:49,329 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:40:14,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1141.53564 ± 776.426
2025-09-12 13:40:14,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [1728.1571, 2177.518, 266.3536, 1700.1094, 825.9293, 178.24661, 2406.3843, 417.3548, 1189.3497, 525.954]
2025-09-12 13:40:14,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 123.0, 709.0, 337.0, 64.0, 1000.0, 229.0, 508.0, 302.0]
2025-09-12 13:40:15,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 67/100 (estimated time remaining: 7 hours, 59 minutes, 33 seconds)
2025-09-12 13:51:52,042 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:51:52,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:54:43,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1051.73462 ± 677.845
2025-09-12 13:54:43,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [1932.6145, 698.9161, 983.34454, 1283.0532, 2488.7832, 847.81415, 162.69568, 1201.5803, 544.603, 373.94104]
2025-09-12 13:54:43,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 298.0, 1000.0, 602.0, 1000.0, 377.0, 84.0, 445.0, 273.0, 1000.0]
2025-09-12 13:54:43,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 68/100 (estimated time remaining: 8 hours, 34 seconds)
2025-09-12 14:06:29,161 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:06:29,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:08:52,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 911.72522 ± 680.721
2025-09-12 14:08:52,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [1571.9031, 321.51093, 644.3386, 968.05927, 97.965034, 1210.2819, 383.88763, 396.41626, 2480.9614, 1041.9281]
2025-09-12 14:08:52,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [627.0, 147.0, 273.0, 394.0, 49.0, 1000.0, 185.0, 1000.0, 1000.0, 414.0]
2025-09-12 14:08:52,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 69/100 (estimated time remaining: 7 hours, 45 minutes, 20 seconds)
2025-09-12 14:20:36,358 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:20:36,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:22:48,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 996.59503 ± 548.485
2025-09-12 14:22:48,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [618.1146, 431.05304, 764.84503, 1456.103, 1220.0072, 634.56915, 1585.15, 1924.0679, 85.384125, 1246.6562]
2025-09-12 14:22:48,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [257.0, 312.0, 330.0, 598.0, 515.0, 281.0, 1000.0, 816.0, 67.0, 524.0]
2025-09-12 14:22:48,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 70/100 (estimated time remaining: 7 hours, 31 minutes, 54 seconds)
2025-09-12 14:34:56,507 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:34:56,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:37:09,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1036.18347 ± 998.349
2025-09-12 14:37:09,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [2555.4766, 87.322014, 1124.5524, 687.054, 2444.481, 523.66785, 44.51453, 2449.5063, 420.26523, 24.993681]
2025-09-12 14:37:09,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 43.0, 1000.0, 314.0, 1000.0, 244.0, 58.0, 916.0, 192.0, 27.0]
2025-09-12 14:37:09,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 71/100 (estimated time remaining: 7 hours, 7 minutes, 52 seconds)
2025-09-12 14:48:24,966 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:48:24,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:51:55,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1456.59082 ± 742.717
2025-09-12 14:51:55,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [1973.5249, 2232.3337, 1285.9362, 1864.3082, 969.6114, 529.7168, 1152.6951, 2110.802, 55.870182, 2391.1108]
2025-09-12 14:51:55,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [838.0, 1000.0, 523.0, 822.0, 1000.0, 224.0, 1000.0, 1000.0, 24.0, 1000.0]
2025-09-12 14:51:55,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 72/100 (estimated time remaining: 6 hours, 55 minutes, 44 seconds)
2025-09-12 15:04:07,897 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:04:07,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:05:53,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 572.79059 ± 393.170
2025-09-12 15:05:53,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [175.46677, 255.77899, 383.37732, 194.64021, 402.67053, 1240.7009, 1190.6382, 240.9432, 727.9609, 915.7287]
2025-09-12 15:05:53,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [87.0, 118.0, 145.0, 98.0, 166.0, 1000.0, 517.0, 1000.0, 316.0, 383.0]
2025-09-12 15:05:53,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 73/100 (estimated time remaining: 6 hours, 38 minutes, 34 seconds)
2025-09-12 15:17:38,336 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:17:38,339 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:20:18,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1135.65210 ± 741.525
2025-09-12 15:20:18,895 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [628.68835, 284.51276, 1065.9237, 1550.4523, 2058.6873, 2433.6199, 1540.6333, 28.602125, 504.57053, 1260.8314]
2025-09-12 15:20:18,895 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [287.0, 138.0, 1000.0, 554.0, 873.0, 1000.0, 1000.0, 25.0, 259.0, 568.0]
2025-09-12 15:20:18,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 74/100 (estimated time remaining: 6 hours, 25 minutes, 45 seconds)
2025-09-12 15:32:52,277 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:32:52,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:35:05,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 924.33020 ± 559.110
2025-09-12 15:35:06,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [1816.215, 1423.8148, 591.90906, 350.7155, 1770.5508, 1199.0276, 871.21814, 574.0347, 345.9686, 299.84784]
2025-09-12 15:35:06,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [758.0, 1000.0, 1000.0, 133.0, 672.0, 462.0, 356.0, 228.0, 135.0, 129.0]
2025-09-12 15:35:06,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 75/100 (estimated time remaining: 6 hours, 15 minutes, 53 seconds)
2025-09-12 15:46:31,131 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:46:31,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:48:37,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 734.97693 ± 644.997
2025-09-12 15:48:37,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [265.37576, 313.82382, 192.57318, 494.18774, 2259.3523, 1415.166, 1077.6667, 158.95615, 841.60516, 331.06323]
2025-09-12 15:48:37,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [105.0, 1000.0, 77.0, 197.0, 947.0, 574.0, 1000.0, 70.0, 367.0, 162.0]
2025-09-12 15:48:37,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 76/100 (estimated time remaining: 5 hours, 57 minutes, 19 seconds)
2025-09-12 16:00:16,547 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:00:16,552 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:02:25,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 828.00378 ± 545.612
2025-09-12 16:02:25,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [1564.576, 1327.0303, 262.6903, 231.18015, 1135.8635, 1730.9987, 434.3879, 370.4413, 345.63196, 877.2383]
2025-09-12 16:02:25,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 511.0, 121.0, 92.0, 488.0, 639.0, 1000.0, 188.0, 137.0, 390.0]
2025-09-12 16:02:25,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 77/100 (estimated time remaining: 5 hours, 38 minutes, 21 seconds)
2025-09-12 16:13:51,966 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:13:51,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:16:20,618 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 885.61072 ± 608.443
2025-09-12 16:16:20,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [294.5097, 948.931, 1928.064, 898.0573, 524.3861, 2048.3167, 296.55493, 564.2849, 1024.1295, 328.87286]
2025-09-12 16:16:20,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [125.0, 371.0, 1000.0, 343.0, 1000.0, 881.0, 112.0, 271.0, 1000.0, 164.0]
2025-09-12 16:16:20,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 78/100 (estimated time remaining: 5 hours, 24 minutes, 3 seconds)
2025-09-12 16:28:21,336 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:28:21,339 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:30:55,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1149.78931 ± 682.273
2025-09-12 16:30:55,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [1147.0236, 1115.0874, 969.272, 873.00134, 1103.0671, 1323.6514, 2345.1445, 204.06345, 164.30396, 2253.279]
2025-09-12 16:30:55,002 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [495.0, 1000.0, 403.0, 393.0, 424.0, 521.0, 1000.0, 89.0, 110.0, 1000.0]
2025-09-12 16:30:55,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 79/100 (estimated time remaining: 5 hours, 10 minutes, 38 seconds)
2025-09-12 16:43:00,059 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:43:00,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:46:59,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1941.67603 ± 567.394
2025-09-12 16:46:59,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [829.6991, 2497.5925, 2210.8425, 2330.4578, 1785.682, 1448.2946, 1197.7178, 2571.2285, 2190.1614, 2355.0845]
2025-09-12 16:46:59,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [327.0, 1000.0, 1000.0, 1000.0, 876.0, 651.0, 536.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:46:59,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1226 [INFO]: New best (1941.68) for latency MM1Queue_a033_s075
2025-09-12 16:46:59,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 80/100 (estimated time remaining: 5 hours, 1 minute, 58 seconds)
2025-09-12 16:58:45,767 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:58:45,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:02:30,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1398.38354 ± 660.864
2025-09-12 17:02:30,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [1021.1952, 1386.6063, 749.3844, 1786.4028, 1375.8715, 2108.1692, 364.75867, 680.3729, 2474.1707, 2036.9031]
2025-09-12 17:02:30,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 551.0, 287.0, 686.0, 495.0, 1000.0, 1000.0, 1000.0, 1000.0, 898.0]
2025-09-12 17:02:30,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 81/100 (estimated time remaining: 4 hours, 55 minutes, 34 seconds)
2025-09-12 17:14:01,483 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:14:01,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:17:48,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1922.81482 ± 596.606
2025-09-12 17:17:48,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [842.34235, 2027.5271, 1375.9497, 1262.1766, 1618.8531, 1941.1128, 2656.6067, 2488.102, 2435.2842, 2580.1943]
2025-09-12 17:17:48,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [364.0, 745.0, 501.0, 490.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 983.0]
2025-09-12 17:17:48,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 82/100 (estimated time remaining: 4 hours, 46 minutes, 28 seconds)
2025-09-12 17:30:03,518 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:30:03,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:31:38,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 841.13538 ± 665.504
2025-09-12 17:31:38,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [1388.8771, 949.5247, 473.8293, 704.45874, 363.4583, 609.9172, 81.43778, 516.2341, 752.89575, 2570.7205]
2025-09-12 17:31:38,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [551.0, 432.0, 222.0, 278.0, 144.0, 236.0, 44.0, 216.0, 316.0, 1000.0]
2025-09-12 17:31:38,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 31 minutes, 5 seconds)
2025-09-12 17:44:00,053 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:44:00,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:46:44,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1322.68481 ± 809.679
2025-09-12 17:46:44,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [809.3574, 2484.6438, 45.434727, 1634.3718, 2151.4548, 242.00084, 2262.7058, 1330.6444, 684.05505, 1582.1808]
2025-09-12 17:46:44,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [383.0, 1000.0, 26.0, 674.0, 1000.0, 110.0, 1000.0, 518.0, 272.0, 1000.0]
2025-09-12 17:46:44,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 17 minutes, 49 seconds)
2025-09-12 17:57:48,800 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:57:48,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:01:14,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1177.34253 ± 771.383
2025-09-12 18:01:14,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [997.35333, 2467.893, 1977.9852, 2020.896, 954.614, 562.5559, 304.9828, 261.15747, 1770.0829, 455.90527]
2025-09-12 18:01:14,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [376.0, 910.0, 739.0, 1000.0, 403.0, 1000.0, 1000.0, 1000.0, 694.0, 216.0]
2025-09-12 18:01:14,490 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 57 minutes, 34 seconds)
2025-09-12 18:13:21,809 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:13:21,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:15:30,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1165.49146 ± 812.031
2025-09-12 18:15:30,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [1380.7502, 2423.865, 2523.605, 1774.0988, 985.9199, 989.20746, 769.5445, 169.4618, 90.37303, 548.0889]
2025-09-12 18:15:30,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [484.0, 967.0, 1000.0, 793.0, 397.0, 385.0, 276.0, 80.0, 47.0, 249.0]
2025-09-12 18:15:30,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 39 minutes)
2025-09-12 18:26:38,119 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:26:38,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:29:24,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 989.98279 ± 768.448
2025-09-12 18:29:24,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [144.201, 495.6402, 501.8517, 1063.6652, 2545.7358, 180.20888, 592.949, 2171.336, 948.45715, 1255.7832]
2025-09-12 18:29:24,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [81.0, 180.0, 230.0, 427.0, 1000.0, 1000.0, 1000.0, 785.0, 390.0, 1000.0]
2025-09-12 18:29:24,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 20 minutes, 28 seconds)
2025-09-12 18:41:54,391 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:41:54,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:44:47,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1443.91846 ± 997.130
2025-09-12 18:44:47,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [225.00795, 1913.3579, 2711.7117, 2227.7961, 2453.902, 2588.8357, 316.8371, 327.87994, 372.61664, 1301.2394]
2025-09-12 18:44:47,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [138.0, 740.0, 1000.0, 913.0, 1000.0, 1000.0, 151.0, 149.0, 167.0, 1000.0]
2025-09-12 18:44:47,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 10 minutes, 9 seconds)
2025-09-12 18:56:26,871 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:56:26,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:59:59,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1425.42029 ± 893.953
2025-09-12 18:59:59,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [2475.598, 120.760345, 1080.2013, 170.2401, 1979.3503, 538.6501, 2387.6838, 1076.9242, 1975.1256, 2449.6685]
2025-09-12 18:59:59,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 51.0, 1000.0, 74.0, 1000.0, 1000.0, 1000.0, 451.0, 1000.0, 1000.0]
2025-09-12 18:59:59,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 55 minutes, 47 seconds)
2025-09-12 19:11:14,754 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:11:14,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:13:49,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1149.26147 ± 744.042
2025-09-12 19:13:49,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [912.50226, 395.74307, 302.95993, 1650.8021, 1698.3453, 116.74985, 845.98444, 2504.6873, 1953.1444, 1111.6963]
2025-09-12 19:13:49,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [376.0, 168.0, 97.0, 634.0, 760.0, 1000.0, 314.0, 883.0, 809.0, 422.0]
2025-09-12 19:13:49,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 39 minutes, 40 seconds)
2025-09-12 19:26:06,670 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:26:06,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:27:15,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 628.49890 ± 570.834
2025-09-12 19:27:15,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [39.359344, 297.39978, 746.711, 69.1416, 1762.7815, 1083.1595, 87.25937, 1376.1321, 278.13065, 544.91345]
2025-09-12 19:27:15,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [26.0, 137.0, 274.0, 46.0, 610.0, 385.0, 42.0, 651.0, 96.0, 237.0]
2025-09-12 19:27:16,035 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 23 minutes, 30 seconds)
2025-09-12 19:38:43,040 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:38:43,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:40:15,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 848.41388 ± 694.242
2025-09-12 19:40:15,462 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [807.3266, 182.98895, 226.38376, 275.1123, 220.87874, 1294.133, 1023.0203, 2423.1213, 1516.1309, 515.0432]
2025-09-12 19:40:15,462 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [296.0, 77.0, 86.0, 110.0, 107.0, 539.0, 440.0, 975.0, 554.0, 208.0]
2025-09-12 19:40:15,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 7 minutes, 31 seconds)
2025-09-12 19:52:16,729 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:52:16,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:54:42,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1240.16406 ± 830.090
2025-09-12 19:54:42,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [2044.3491, 2365.6536, 390.03525, 2604.198, 945.2872, 606.6274, 682.4913, 289.75058, 648.78326, 1824.4647]
2025-09-12 19:54:42,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 158.0, 1000.0, 385.0, 203.0, 285.0, 114.0, 274.0, 727.0]
2025-09-12 19:54:42,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 51 minutes, 52 seconds)
2025-09-12 20:06:40,126 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:06:40,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:09:30,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1219.65454 ± 915.038
2025-09-12 20:09:30,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [2545.149, 944.90247, 572.2511, 1648.0491, 2492.5078, 124.55459, 245.87923, 33.317482, 1570.4023, 2019.5323]
2025-09-12 20:09:30,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 377.0, 286.0, 599.0, 1000.0, 72.0, 1000.0, 23.0, 659.0, 888.0]
2025-09-12 20:09:30,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 37 minutes, 19 seconds)
2025-09-12 20:21:59,086 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:21:59,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:23:49,894 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 925.56604 ± 676.326
2025-09-12 20:23:49,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [983.39386, 1503.0054, 631.40497, 439.24197, 2064.6978, 247.15067, 1876.2778, 76.58852, 245.343, 1188.5569]
2025-09-12 20:23:49,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [410.0, 1000.0, 260.0, 195.0, 724.0, 116.0, 663.0, 39.0, 117.0, 422.0]
2025-09-12 20:23:49,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 24 minutes, 1 second)
2025-09-12 20:36:11,472 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:36:11,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:38:44,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1322.49817 ± 757.146
2025-09-12 20:38:44,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [898.33875, 2100.3962, 961.09344, 2362.567, 476.26227, 1000.79974, 2780.4238, 1142.8109, 980.2564, 522.0322]
2025-09-12 20:38:44,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [371.0, 817.0, 441.0, 1000.0, 189.0, 398.0, 1000.0, 464.0, 459.0, 238.0]
2025-09-12 20:38:44,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 11 minutes, 28 seconds)
2025-09-12 20:50:18,127 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:50:18,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:52:51,940 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1205.92395 ± 1054.590
2025-09-12 20:52:51,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [2870.6184, 2755.562, 463.14542, 510.96027, 491.59494, 161.33281, 301.14264, 1298.1954, 561.7436, 2644.9443]
2025-09-12 20:52:51,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 968.0, 1000.0, 234.0, 195.0, 84.0, 115.0, 484.0, 246.0, 1000.0]
2025-09-12 20:52:51,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 97/100 (estimated time remaining: 58 minutes, 5 seconds)
2025-09-12 21:04:54,986 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:04:55,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:06:30,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 837.28241 ± 736.268
2025-09-12 21:06:30,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [961.85803, 254.41864, 468.8061, 714.38574, 17.448355, 2783.714, 532.97076, 1348.5051, 760.3064, 530.4114]
2025-09-12 21:06:30,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [403.0, 133.0, 194.0, 279.0, 21.0, 1000.0, 208.0, 475.0, 330.0, 262.0]
2025-09-12 21:06:30,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 98/100 (estimated time remaining: 43 minutes, 4 seconds)
2025-09-12 21:19:03,213 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:19:03,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:21:10,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 938.14825 ± 914.358
2025-09-12 21:21:10,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [1134.6956, 350.1341, 2689.702, 91.59898, 262.5733, 1248.0623, 292.00003, 408.91394, 353.3587, 2550.444]
2025-09-12 21:21:10,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [379.0, 1000.0, 1000.0, 53.0, 112.0, 508.0, 109.0, 162.0, 107.0, 1000.0]
2025-09-12 21:21:10,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 99/100 (estimated time remaining: 28 minutes, 39 seconds)
2025-09-12 21:33:05,466 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:33:05,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:36:07,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1212.55591 ± 842.102
2025-09-12 21:36:07,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [2126.624, 764.39276, 838.4515, 2606.2988, 1435.1226, 716.75653, 119.90483, 300.92578, 2426.6492, 790.4332]
2025-09-12 21:36:07,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 322.0, 327.0, 1000.0, 538.0, 1000.0, 68.0, 120.0, 1000.0, 1000.0]
2025-09-12 21:36:07,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1199 [INFO]: Iteration 100/100 (estimated time remaining: 14 minutes, 27 seconds)
2025-09-12 21:47:34,536 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:47:34,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:49:54,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1221 [DEBUG]: Total Reward: 964.07782 ± 857.706
2025-09-12 21:49:54,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1222 [DEBUG]: All rewards: [479.35007, 2419.2422, 19.432262, 616.20856, 1452.6703, 482.14417, 62.550697, 1192.5736, 2499.0483, 417.55737]
2025-09-12 21:49:54,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [219.0, 1000.0, 18.0, 294.0, 689.0, 248.0, 44.0, 456.0, 1000.0, 1000.0]
2025-09-12 21:49:54,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-ant):1251 [DEBUG]: Training session finished
