2025-05-13 09:06:30,921 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-bpql-mda-highdim-mem16
2025-05-13 09:06:30,921 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-bpql-mda-highdim-mem16
2025-05-13 09:06:30,921 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x1490a0b1e350>}
2025-05-13 09:06:30,921 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1111 [DEBUG]: using device: cuda
2025-05-13 09:06:30,927 baseline-bpql-mda-noisy-ant:91 [WARNING]: args.assumed_delay != args.horizon: 16 != 24
2025-05-13 09:06:30,928 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1133 [INFO]: Creating new trainer
2025-05-13 09:06:30,955 baseline-bpql-mda-noisy-ant:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-05-13 09:06:30,955 baseline-bpql-mda-noisy-ant:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-13 09:06:30,962 baseline-bpql-mda-noisy-ant:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=27, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(8, 512, batch_first=True)
)
2025-05-13 09:06:31,829 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1194 [DEBUG]: Starting training session...
2025-05-13 09:06:31,829 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 1/100
2025-05-13 09:10:28,055 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:10:46,004 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -379.86905 ± 52.285
2025-05-13 09:10:46,005 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-406.6628, -399.5908, -416.78854, -399.3944, -414.01898, -407.89523, -330.50403, -389.06546, -239.26704, -395.50308]
2025-05-13 09:10:46,005 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:10:46,005 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (-379.87) for latency MM1Queue_a033_s075
2025-05-13 09:10:46,012 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 2/100 (estimated time remaining: 6 hours, 59 minutes, 24 seconds)
2025-05-13 09:14:46,738 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:15:04,800 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 762.67188 ± 2.482
2025-05-13 09:15:04,800 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [761.70984, 762.55817, 765.3699, 759.49084, 761.9714, 760.8724, 763.2891, 764.6474, 759.27124, 767.5382]
2025-05-13 09:15:04,800 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:15:04,800 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (762.67) for latency MM1Queue_a033_s075
2025-05-13 09:15:04,806 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 3/100 (estimated time remaining: 6 hours, 58 minutes, 55 seconds)
2025-05-13 09:19:04,930 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:19:23,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 827.93018 ± 11.232
2025-05-13 09:19:23,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [837.4099, 844.6602, 812.366, 817.99976, 815.9183, 844.89795, 830.9075, 821.141, 821.42633, 832.57465]
2025-05-13 09:19:23,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:19:23,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (827.93) for latency MM1Queue_a033_s075
2025-05-13 09:19:23,198 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 4/100 (estimated time remaining: 6 hours, 55 minutes, 40 seconds)
2025-05-13 09:23:22,732 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:23:40,491 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 814.14221 ± 9.234
2025-05-13 09:23:40,492 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [808.2746, 811.28107, 800.9479, 810.90497, 806.3483, 821.2248, 836.31213, 814.21844, 818.8246, 813.08527]
2025-05-13 09:23:40,492 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:23:40,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 5/100 (estimated time remaining: 6 hours, 51 minutes, 28 seconds)
2025-05-13 09:27:39,511 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:27:57,382 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 839.15833 ± 4.323
2025-05-13 09:27:57,382 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [834.4483, 842.69904, 839.8551, 838.44727, 838.0435, 832.8791, 841.1449, 844.99744, 833.52655, 845.54205]
2025-05-13 09:27:57,383 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:27:57,383 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (839.16) for latency MM1Queue_a033_s075
2025-05-13 09:27:57,390 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 6/100 (estimated time remaining: 6 hours, 47 minutes, 5 seconds)
2025-05-13 09:31:56,608 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:32:14,777 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 819.38086 ± 4.231
2025-05-13 09:32:14,778 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [818.6088, 822.69543, 824.9614, 820.14844, 819.9996, 822.03, 815.5243, 808.87537, 819.93774, 821.0272]
2025-05-13 09:32:14,778 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:32:14,786 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 7/100 (estimated time remaining: 6 hours, 43 minutes, 48 seconds)
2025-05-13 09:36:13,997 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:36:32,141 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 842.37714 ± 7.320
2025-05-13 09:36:32,141 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [831.6634, 846.6232, 837.8307, 835.1366, 841.5445, 855.1084, 854.79034, 840.66095, 840.7807, 839.63257]
2025-05-13 09:36:32,141 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:36:32,141 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (842.38) for latency MM1Queue_a033_s075
2025-05-13 09:36:32,147 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 8/100 (estimated time remaining: 6 hours, 39 minutes, 4 seconds)
2025-05-13 09:40:31,929 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:40:49,713 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 568.69226 ± 763.606
2025-05-13 09:40:49,713 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [835.507, 843.24744, -1718.9272, 834.9883, 703.5776, 830.8582, 850.0629, 836.1797, 833.559, 837.8699]
2025-05-13 09:40:49,713 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:40:49,722 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 9/100 (estimated time remaining: 6 hours, 34 minutes, 32 seconds)
2025-05-13 09:44:48,634 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:45:06,400 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 834.51819 ± 5.248
2025-05-13 09:45:06,400 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [838.3576, 843.95135, 837.7548, 832.1402, 839.80133, 833.82764, 827.9089, 825.74097, 833.5216, 832.17737]
2025-05-13 09:45:06,400 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:45:06,405 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 10/100 (estimated time remaining: 6 hours, 30 minutes, 3 seconds)
2025-05-13 09:49:04,769 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:49:22,472 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 820.12952 ± 62.806
2025-05-13 09:49:22,472 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [839.8242, 843.94745, 829.74146, 846.60486, 820.7892, 846.1595, 839.6244, 633.5989, 850.7508, 850.2542]
2025-05-13 09:49:22,472 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:49:22,478 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 11/100 (estimated time remaining: 6 hours, 25 minutes, 31 seconds)
2025-05-13 09:53:20,936 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:53:36,836 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 369.15918 ± 524.938
2025-05-13 09:53:36,836 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-411.18372, -125.445724, 811.5525, 785.38196, 777.1745, 11.429764, 810.303, 780.8615, 740.5891, -489.0712]
2025-05-13 09:53:36,836 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 22.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:53:36,845 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 12/100 (estimated time remaining: 6 hours, 20 minutes, 20 seconds)
2025-05-13 09:57:53,896 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:58:11,656 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 735.84558 ± 29.497
2025-05-13 09:58:11,656 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [727.57794, 788.269, 704.3493, 789.11633, 730.0574, 728.9643, 720.53973, 742.23474, 694.76013, 732.5871]
2025-05-13 09:58:11,656 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:58:11,663 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 13/100 (estimated time remaining: 6 hours, 21 minutes, 11 seconds)
2025-05-13 10:01:46,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:02:04,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 738.93951 ± 37.930
2025-05-13 10:02:04,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [680.2076, 781.30334, 798.7558, 721.33813, 705.5352, 694.71564, 760.2821, 724.9908, 744.57495, 777.6908]
2025-05-13 10:02:04,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:02:04,076 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 14/100 (estimated time remaining: 6 hours, 9 minutes, 33 seconds)
2025-05-13 10:06:02,476 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:06:20,031 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 809.26642 ± 12.337
2025-05-13 10:06:20,031 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [818.9155, 810.56195, 785.7538, 792.9566, 821.6563, 802.6508, 814.6595, 805.3972, 811.8663, 828.2463]
2025-05-13 10:06:20,031 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:06:20,040 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 15/100 (estimated time remaining: 6 hours, 5 minutes, 6 seconds)
2025-05-13 10:10:18,526 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:10:36,568 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 857.02429 ± 6.924
2025-05-13 10:10:36,568 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [861.88873, 858.44855, 859.67755, 869.1667, 860.16266, 851.71, 860.62177, 846.1192, 856.7892, 845.6589]
2025-05-13 10:10:36,568 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:10:36,568 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (857.02) for latency MM1Queue_a033_s075
2025-05-13 10:10:36,575 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 16/100 (estimated time remaining: 6 hours, 59 seconds)
2025-05-13 10:14:35,055 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:14:53,109 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 840.32501 ± 16.734
2025-05-13 10:14:53,109 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [844.2384, 824.10583, 861.54694, 844.2146, 841.8528, 865.09753, 848.8897, 844.29913, 817.4566, 811.5482]
2025-05-13 10:14:53,109 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:14:53,118 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 17/100 (estimated time remaining: 5 hours, 57 minutes, 21 seconds)
2025-05-13 10:18:51,737 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:19:09,387 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 651.98529 ± 548.801
2025-05-13 10:19:09,387 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [838.9633, 842.86096, -993.4974, 846.08954, 804.3481, 848.0404, 802.3405, 842.1102, 863.8206, 824.7769]
2025-05-13 10:19:09,387 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:19:09,397 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 18/100 (estimated time remaining: 5 hours, 47 minutes, 58 seconds)
2025-05-13 10:23:07,917 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:23:25,720 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 858.06299 ± 7.401
2025-05-13 10:23:25,720 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [859.406, 863.8937, 853.08795, 871.88824, 847.21216, 846.1558, 861.43054, 855.414, 861.234, 860.90784]
2025-05-13 10:23:25,720 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:23:25,720 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (858.06) for latency MM1Queue_a033_s075
2025-05-13 10:23:25,726 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 19/100 (estimated time remaining: 5 hours, 50 minutes, 19 seconds)
2025-05-13 10:27:24,318 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:27:40,501 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 785.41681 ± 252.419
2025-05-13 10:27:40,501 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [872.36505, 886.1306, 872.56256, 855.7812, 872.6899, 875.3402, 873.2076, 28.628355, 856.0667, 861.3962]
2025-05-13 10:27:40,502 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 24.0, 1000.0, 1000.0]
2025-05-13 10:27:40,509 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 20/100 (estimated time remaining: 5 hours, 45 minutes, 43 seconds)
2025-05-13 10:31:38,953 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:31:56,655 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 867.69708 ± 10.207
2025-05-13 10:31:56,655 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [865.3727, 841.8276, 866.85657, 868.92194, 877.3324, 877.1401, 874.4223, 863.97003, 863.2315, 877.89526]
2025-05-13 10:31:56,655 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:31:56,655 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (867.70) for latency MM1Queue_a033_s075
2025-05-13 10:31:56,667 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 21/100 (estimated time remaining: 5 hours, 41 minutes, 21 seconds)
2025-05-13 10:35:55,302 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:36:13,095 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 850.98157 ± 79.516
2025-05-13 10:36:13,095 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [882.4955, 878.63745, 873.22986, 874.561, 882.2792, 872.84235, 867.1877, 613.14417, 875.0293, 890.40875]
2025-05-13 10:36:13,095 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:36:13,105 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 22/100 (estimated time remaining: 5 hours, 37 minutes, 3 seconds)
2025-05-13 10:40:11,538 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:40:29,544 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 869.37579 ± 10.096
2025-05-13 10:40:29,544 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [895.91003, 870.3112, 864.0973, 867.86084, 873.05365, 864.4931, 862.07916, 873.0291, 856.2207, 866.70276]
2025-05-13 10:40:29,545 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:40:29,545 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (869.38) for latency MM1Queue_a033_s075
2025-05-13 10:40:29,551 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 23/100 (estimated time remaining: 5 hours, 32 minutes, 50 seconds)
2025-05-13 10:44:27,839 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:44:45,605 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 862.85388 ± 79.918
2025-05-13 10:44:45,605 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [623.7282, 889.2533, 886.2924, 890.12067, 885.2218, 875.559, 893.99615, 895.1842, 896.3899, 892.79425]
2025-05-13 10:44:45,605 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:44:45,615 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 24/100 (estimated time remaining: 5 hours, 28 minutes, 30 seconds)
2025-05-13 10:48:43,885 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:49:01,804 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 869.02911 ± 7.135
2025-05-13 10:49:01,804 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [860.11096, 856.1269, 868.0559, 868.9316, 873.44275, 872.48755, 867.53394, 868.37915, 883.96216, 871.26105]
2025-05-13 10:49:01,805 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:49:01,811 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 25/100 (estimated time remaining: 5 hours, 24 minutes, 35 seconds)
2025-05-13 10:53:00,159 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:53:17,912 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 881.53888 ± 5.300
2025-05-13 10:53:17,912 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [875.88873, 880.1042, 892.98724, 876.0434, 889.0129, 881.01953, 878.08966, 883.9567, 878.7367, 879.5498]
2025-05-13 10:53:17,912 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:53:17,912 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (881.54) for latency MM1Queue_a033_s075
2025-05-13 10:53:17,921 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 26/100 (estimated time remaining: 5 hours, 20 minutes, 18 seconds)
2025-05-13 10:57:16,239 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:57:34,185 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 870.53137 ± 39.663
2025-05-13 10:57:34,185 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [873.56006, 873.6893, 878.27686, 889.6052, 890.32275, 881.9651, 896.8316, 885.2097, 882.44183, 753.41077]
2025-05-13 10:57:34,185 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:57:34,193 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 27/100 (estimated time remaining: 5 hours, 16 minutes)
2025-05-13 11:01:32,689 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:01:48,946 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 793.94855 ± 259.204
2025-05-13 11:01:48,946 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [878.2455, 892.32153, 887.93945, 888.3116, 17.347776, 883.5663, 881.6256, 874.5393, 891.4119, 844.17645]
2025-05-13 11:01:48,946 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 26.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:01:48,954 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 28/100 (estimated time remaining: 5 hours, 11 minutes, 19 seconds)
2025-05-13 11:05:47,600 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:06:05,682 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 868.79700 ± 8.750
2025-05-13 11:06:05,683 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [857.59424, 852.96124, 880.72186, 869.76855, 869.5241, 861.13544, 870.7437, 871.75903, 872.2038, 881.55853]
2025-05-13 11:06:05,683 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:06:05,692 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 29/100 (estimated time remaining: 5 hours, 7 minutes, 13 seconds)
2025-05-13 11:10:04,116 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:10:22,077 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 876.69531 ± 51.239
2025-05-13 11:10:22,077 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [802.81604, 901.70044, 905.986, 905.30255, 895.1272, 900.2617, 891.1591, 910.3706, 751.8158, 902.4134]
2025-05-13 11:10:22,077 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:10:22,088 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 30/100 (estimated time remaining: 5 hours, 2 minutes, 59 seconds)
2025-05-13 11:14:20,185 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:14:38,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 904.75116 ± 9.203
2025-05-13 11:14:38,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [895.5675, 899.68066, 907.5707, 906.38464, 891.9707, 922.2821, 918.7574, 899.22424, 899.6434, 906.43024]
2025-05-13 11:14:38,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:14:38,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (904.75) for latency MM1Queue_a033_s075
2025-05-13 11:14:38,202 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 31/100 (estimated time remaining: 4 hours, 58 minutes, 43 seconds)
2025-05-13 11:18:36,599 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:18:54,658 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 898.58466 ± 5.930
2025-05-13 11:18:54,658 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [898.78345, 891.1228, 901.6261, 896.21344, 898.55164, 911.8728, 902.7897, 890.2619, 894.6363, 899.9888]
2025-05-13 11:18:54,658 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:18:54,666 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 32/100 (estimated time remaining: 4 hours, 54 minutes, 30 seconds)
2025-05-13 11:22:53,014 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:23:10,980 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 895.57568 ± 11.208
2025-05-13 11:23:10,980 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [901.5004, 884.98224, 889.50305, 905.5148, 885.28174, 875.64984, 903.5215, 909.20807, 909.6923, 890.90295]
2025-05-13 11:23:10,980 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:23:10,988 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 33/100 (estimated time remaining: 4 hours, 50 minutes, 35 seconds)
2025-05-13 11:27:04,769 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:27:22,923 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 900.83801 ± 5.917
2025-05-13 11:27:22,924 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [896.9692, 910.38245, 904.9056, 907.51117, 905.9097, 893.33295, 897.6357, 902.9786, 895.5314, 893.2236]
2025-05-13 11:27:22,924 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:27:22,938 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 34/100 (estimated time remaining: 4 hours, 45 minutes, 15 seconds)
2025-05-13 11:31:32,461 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:31:50,138 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 906.25378 ± 30.715
2025-05-13 11:31:50,138 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [819.1476, 913.6017, 894.4289, 930.77155, 921.8026, 925.3934, 906.2149, 910.89056, 924.41595, 915.8715]
2025-05-13 11:31:50,138 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:31:50,138 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (906.25) for latency MM1Queue_a033_s075
2025-05-13 11:31:50,150 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 35/100 (estimated time remaining: 4 hours, 43 minutes, 22 seconds)
2025-05-13 11:35:48,657 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:36:06,373 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 898.78955 ± 11.584
2025-05-13 11:36:06,373 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [912.24097, 900.46515, 907.81476, 884.4967, 896.5016, 896.5557, 894.5548, 904.96, 914.8977, 875.40845]
2025-05-13 11:36:06,373 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:36:06,382 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 36/100 (estimated time remaining: 4 hours, 39 minutes, 6 seconds)
2025-05-13 11:40:04,879 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:40:22,795 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 913.38165 ± 11.004
2025-05-13 11:40:22,795 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [899.5875, 924.1177, 914.4146, 893.26227, 916.6457, 919.86957, 924.00055, 928.86145, 906.9622, 906.0946]
2025-05-13 11:40:22,795 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:40:22,795 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (913.38) for latency MM1Queue_a033_s075
2025-05-13 11:40:22,809 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 37/100 (estimated time remaining: 4 hours, 34 minutes, 48 seconds)
2025-05-13 11:44:21,094 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:44:38,775 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 926.27307 ± 12.982
2025-05-13 11:44:38,775 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [929.7652, 937.07385, 933.0203, 899.45044, 939.33167, 927.47925, 934.18695, 931.59607, 903.31885, 927.5081]
2025-05-13 11:44:38,775 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:44:38,775 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (926.27) for latency MM1Queue_a033_s075
2025-05-13 11:44:38,785 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 38/100 (estimated time remaining: 4 hours, 30 minutes, 26 seconds)
2025-05-13 11:48:37,025 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:48:54,757 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 936.36169 ± 8.670
2025-05-13 11:48:54,757 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [938.66034, 933.4683, 935.3866, 944.96814, 930.7179, 940.83417, 949.7254, 937.72565, 915.60706, 936.5231]
2025-05-13 11:48:54,757 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:48:54,757 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (936.36) for latency MM1Queue_a033_s075
2025-05-13 11:48:54,767 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 39/100 (estimated time remaining: 4 hours, 26 minutes, 58 seconds)
2025-05-13 11:52:53,093 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:53:10,654 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 963.02454 ± 41.448
2025-05-13 11:53:10,655 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [925.39374, 974.35504, 939.88416, 981.4514, 992.19196, 1005.2424, 951.97003, 986.61945, 865.6894, 1007.4474]
2025-05-13 11:53:10,655 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:53:10,655 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (963.02) for latency MM1Queue_a033_s075
2025-05-13 11:53:10,662 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 40/100 (estimated time remaining: 4 hours, 20 minutes, 22 seconds)
2025-05-13 11:57:08,737 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:57:26,150 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 942.99170 ± 110.530
2025-05-13 11:57:26,150 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [857.73944, 1085.28, 1020.9748, 980.74176, 778.1624, 819.8731, 846.48474, 907.58264, 1109.704, 1023.3738]
2025-05-13 11:57:26,150 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:57:26,161 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 41/100 (estimated time remaining: 4 hours, 15 minutes, 57 seconds)
2025-05-13 12:01:32,846 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:01:50,043 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1237.87036 ± 53.503
2025-05-13 12:01:50,043 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1258.2875, 1194.5604, 1246.65, 1301.9248, 1172.798, 1261.2773, 1275.8563, 1122.9, 1285.4471, 1259.002]
2025-05-13 12:01:50,044 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:01:50,044 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1237.87) for latency MM1Queue_a033_s075
2025-05-13 12:01:50,054 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 42/100 (estimated time remaining: 4 hours, 13 minutes, 9 seconds)
2025-05-13 12:05:47,732 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:06:05,022 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1332.06934 ± 59.636
2025-05-13 12:06:05,022 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1304.9944, 1178.9194, 1333.5339, 1376.0481, 1335.2421, 1317.5332, 1418.5245, 1363.069, 1360.0521, 1332.7762]
2025-05-13 12:06:05,022 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:06:05,022 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1332.07) for latency MM1Queue_a033_s075
2025-05-13 12:06:05,031 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 43/100 (estimated time remaining: 4 hours, 8 minutes, 40 seconds)
2025-05-13 12:09:49,557 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:10:06,762 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1198.60767 ± 107.717
2025-05-13 12:10:06,762 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1370.0234, 968.6889, 1187.022, 1306.3037, 1120.3622, 1241.7893, 1162.0162, 1292.9633, 1200.1122, 1136.7955]
2025-05-13 12:10:06,763 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:10:06,772 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 44/100 (estimated time remaining: 4 hours, 1 minute, 40 seconds)
2025-05-13 12:13:59,266 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:14:16,394 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1330.11560 ± 44.140
2025-05-13 12:14:16,394 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1386.0471, 1280.3658, 1317.8414, 1372.5688, 1359.6036, 1259.9336, 1296.421, 1396.5355, 1325.4589, 1306.3804]
2025-05-13 12:14:16,394 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:14:16,408 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 56 minutes, 16 seconds)
2025-05-13 12:18:14,384 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:18:31,820 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1234.25610 ± 496.787
2025-05-13 12:18:31,820 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1600.127, 1534.7299, 1565.929, 1535.5504, 808.8832, 293.19766, 446.83334, 1653.5148, 1659.2112, 1244.5844]
2025-05-13 12:18:31,820 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:18:31,829 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 52 minutes, 2 seconds)
2025-05-13 12:22:24,683 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:22:41,893 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1141.82874 ± 410.023
2025-05-13 12:22:41,893 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1222.8453, 503.37863, 1700.1879, 1190.0219, 594.03296, 1178.3799, 639.0019, 1221.3398, 1571.4027, 1597.6968]
2025-05-13 12:22:41,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:22:41,902 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 45 minutes, 19 seconds)
2025-05-13 12:26:39,899 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:26:57,125 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1715.67969 ± 165.708
2025-05-13 12:26:57,125 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1698.7445, 1677.9755, 1809.2202, 1260.425, 1673.0148, 1794.8247, 1717.6292, 1869.5391, 1825.1377, 1830.2859]
2025-05-13 12:26:57,125 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:26:57,125 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1715.68) for latency MM1Queue_a033_s075
2025-05-13 12:26:57,133 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 41 minutes, 12 seconds)
2025-05-13 12:30:55,191 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:31:12,624 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1614.29773 ± 45.177
2025-05-13 12:31:12,624 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1671.2037, 1675.0809, 1554.6768, 1535.6328, 1619.0419, 1616.7003, 1669.1687, 1599.8745, 1608.6141, 1592.9841]
2025-05-13 12:31:12,624 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:31:12,634 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 39 minutes, 24 seconds)
2025-05-13 12:35:10,738 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:35:28,072 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1738.26599 ± 266.343
2025-05-13 12:35:28,072 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1208.142, 1863.018, 1868.38, 1819.9945, 1743.6014, 2041.0492, 1914.0538, 1244.6326, 1806.4082, 1873.3806]
2025-05-13 12:35:28,072 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:35:28,072 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1738.27) for latency MM1Queue_a033_s075
2025-05-13 12:35:28,083 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 36 minutes, 11 seconds)
2025-05-13 12:39:26,017 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:39:42,255 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1323.24573 ± 801.806
2025-05-13 12:39:42,255 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [798.79364, 1950.549, 1922.363, 1330.8121, 1814.3718, 1781.6866, 1693.8365, 1974.3152, -647.8217, 613.5494]
2025-05-13 12:39:42,255 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 371.0]
2025-05-13 12:39:42,267 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 31 minutes, 44 seconds)
2025-05-13 12:43:40,232 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:43:57,574 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1943.84790 ± 40.606
2025-05-13 12:43:57,574 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2046.7004, 1902.5753, 1911.8436, 1954.9392, 1976.0386, 1950.1675, 1909.358, 1928.1077, 1921.4133, 1937.3335]
2025-05-13 12:43:57,574 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:43:57,575 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1943.85) for latency MM1Queue_a033_s075
2025-05-13 12:43:57,584 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 28 minutes, 21 seconds)
2025-05-13 12:47:55,824 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:48:12,879 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1817.39221 ± 58.956
2025-05-13 12:48:12,879 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1846.5688, 1926.0554, 1779.6991, 1760.1224, 1739.2776, 1865.0742, 1767.8512, 1892.995, 1805.595, 1790.6823]
2025-05-13 12:48:12,879 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:48:12,890 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 53/100 (estimated time remaining: 3 hours, 24 minutes, 7 seconds)
2025-05-13 12:52:11,074 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:52:28,140 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 2038.67798 ± 45.135
2025-05-13 12:52:28,140 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2062.5215, 2004.4863, 1997.3156, 2044.4282, 2055.0322, 2094.4014, 2074.9275, 1933.6531, 2071.4954, 2048.5173]
2025-05-13 12:52:28,140 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:52:28,140 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (2038.68) for latency MM1Queue_a033_s075
2025-05-13 12:52:28,150 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 54/100 (estimated time remaining: 3 hours, 19 minutes, 49 seconds)
2025-05-13 12:56:26,244 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:56:43,559 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1996.40173 ± 54.384
2025-05-13 12:56:43,559 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1993.4567, 2029.1847, 2057.5283, 1998.8385, 2022.6447, 1996.1361, 1880.4502, 1976.3926, 1933.7046, 2075.6812]
2025-05-13 12:56:43,560 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:56:43,569 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 55/100 (estimated time remaining: 3 hours, 15 minutes, 34 seconds)
2025-05-13 13:00:41,731 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:00:58,832 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1970.15356 ± 80.842
2025-05-13 13:00:58,832 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2130.0742, 2030.2991, 1954.8032, 1990.3495, 1968.7466, 2034.5105, 1873.8484, 1982.8107, 1891.3795, 1844.7156]
2025-05-13 13:00:58,832 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:00:58,841 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 56/100 (estimated time remaining: 3 hours, 11 minutes, 29 seconds)
2025-05-13 13:04:57,251 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:05:14,347 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1888.90112 ± 409.482
2025-05-13 13:05:14,347 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1997.215, 697.1941, 1963.5306, 2197.0334, 2010.949, 2161.6946, 2077.8765, 1990.1903, 1829.8094, 1963.52]
2025-05-13 13:05:14,347 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:05:14,357 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 57/100 (estimated time remaining: 3 hours, 7 minutes, 15 seconds)
2025-05-13 13:09:12,624 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:09:29,783 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1938.19312 ± 244.944
2025-05-13 13:09:29,783 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2042.5583, 1217.8038, 2036.1431, 2007.0778, 2029.4901, 1982.6104, 1985.468, 1945.0768, 1998.6799, 2137.0234]
2025-05-13 13:09:29,783 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:09:29,792 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 58/100 (estimated time remaining: 3 hours, 3 minutes, 1 second)
2025-05-13 13:13:28,005 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:13:45,106 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 2026.22729 ± 43.336
2025-05-13 13:13:45,106 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2030.9604, 2074.614, 2025.8699, 1931.4952, 2051.654, 2101.677, 2022.171, 2006.2153, 2015.1095, 2002.5083]
2025-05-13 13:13:45,106 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:13:45,117 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 58 minutes, 46 seconds)
2025-05-13 13:17:43,686 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:18:00,732 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 2121.55615 ± 69.272
2025-05-13 13:18:00,732 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2130.9094, 2033.2692, 2157.8723, 2235.2415, 2020.9275, 2183.126, 2124.6584, 2066.9968, 2197.4954, 2065.0635]
2025-05-13 13:18:00,732 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:18:00,732 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (2121.56) for latency MM1Queue_a033_s075
2025-05-13 13:18:00,742 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 54 minutes, 32 seconds)
2025-05-13 13:21:59,298 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:22:16,352 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 2112.39380 ± 83.117
2025-05-13 13:22:16,352 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2160.5925, 2016.5623, 2182.0063, 2266.5625, 2016.9636, 2176.7869, 2048.0178, 2030.3129, 2063.932, 2162.201]
2025-05-13 13:22:16,352 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:22:16,362 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 50 minutes, 20 seconds)
2025-05-13 13:26:14,676 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:26:32,109 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 2130.69141 ± 65.364
2025-05-13 13:26:32,109 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2112.1323, 2181.602, 2091.3298, 2157.3306, 2102.571, 2127.597, 2096.809, 2038.2504, 2293.258, 2106.0352]
2025-05-13 13:26:32,109 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:26:32,109 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (2130.69) for latency MM1Queue_a033_s075
2025-05-13 13:26:32,122 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 46 minutes, 6 seconds)
2025-05-13 13:30:30,297 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:30:47,328 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 2150.39819 ± 71.974
2025-05-13 13:30:47,328 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2100.644, 2078.346, 2074.4592, 2247.5667, 2079.0464, 2260.1814, 2132.1943, 2244.2246, 2179.4202, 2107.9001]
2025-05-13 13:30:47,328 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:30:47,328 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (2150.40) for latency MM1Queue_a033_s075
2025-05-13 13:30:47,338 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 41 minutes, 49 seconds)
2025-05-13 13:34:45,798 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:35:03,173 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 2084.20776 ± 127.470
2025-05-13 13:35:03,173 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1789.3015, 2103.3877, 2048.1619, 2264.2698, 2143.7463, 2085.4053, 2170.3254, 2062.7126, 1964.946, 2209.8225]
2025-05-13 13:35:03,173 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:35:03,186 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 37 minutes, 37 seconds)
2025-05-13 13:39:01,604 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:39:18,866 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 2182.73145 ± 100.453
2025-05-13 13:39:18,866 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2134.9307, 2039.349, 2174.406, 2345.5886, 2343.0623, 2145.8127, 2287.6216, 2111.8022, 2099.0928, 2145.6472]
2025-05-13 13:39:18,866 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:39:18,866 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (2182.73) for latency MM1Queue_a033_s075
2025-05-13 13:39:18,875 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 33 minutes, 22 seconds)
2025-05-13 13:43:17,180 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:43:34,180 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 2157.79224 ± 58.568
2025-05-13 13:43:34,180 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2175.3062, 2159.2942, 2181.6458, 2230.8987, 2089.6384, 2178.661, 2141.9282, 2202.1174, 2198.8352, 2019.5967]
2025-05-13 13:43:34,180 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:43:34,191 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 29 minutes, 4 seconds)
2025-05-13 13:47:32,634 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:47:49,988 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 2015.18384 ± 275.054
2025-05-13 13:47:49,988 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1925.1019, 1834.8185, 2130.943, 2144.3296, 2248.137, 2185.678, 2079.733, 2209.4592, 1273.8151, 2119.8225]
2025-05-13 13:47:49,988 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:47:50,001 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 24 minutes, 49 seconds)
2025-05-13 13:51:48,565 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:52:05,697 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 2149.72314 ± 82.710
2025-05-13 13:52:05,697 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2293.2966, 2110.5896, 2105.154, 2170.2585, 2040.7637, 2176.719, 2129.2979, 2245.5095, 2210.1323, 2015.5078]
2025-05-13 13:52:05,697 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:52:05,707 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 20 minutes, 37 seconds)
2025-05-13 13:56:04,368 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:56:21,495 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 2027.05542 ± 434.909
2025-05-13 13:56:21,495 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2200.3457, 2136.4634, 2188.7954, 742.81586, 2358.8308, 2172.9312, 2166.48, 2070.0464, 2065.783, 2168.0632]
2025-05-13 13:56:21,495 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:56:21,504 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 69/100 (estimated time remaining: 2 hours, 16 minutes, 21 seconds)
2025-05-13 14:00:20,078 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:00:37,061 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 2077.27588 ± 90.555
2025-05-13 14:00:37,061 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2305.0745, 2038.9401, 2073.1692, 2003.2336, 2134.593, 2059.084, 1962.8729, 2103.1536, 2002.4418, 2090.196]
2025-05-13 14:00:37,061 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:00:37,078 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 70/100 (estimated time remaining: 2 hours, 12 minutes, 4 seconds)
2025-05-13 14:04:35,350 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:04:52,477 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 2181.69653 ± 67.278
2025-05-13 14:04:52,477 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2109.6777, 2313.9856, 2090.3235, 2236.6694, 2178.9055, 2155.788, 2166.7417, 2105.9487, 2230.1238, 2228.8032]
2025-05-13 14:04:52,477 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:04:52,489 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 71/100 (estimated time remaining: 2 hours, 7 minutes, 49 seconds)
2025-05-13 14:08:59,899 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:09:17,353 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 2134.01367 ± 375.227
2025-05-13 14:09:17,353 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2220.5007, 2256.29, 2365.0852, 2301.9697, 2340.013, 2199.3574, 1021.17194, 2202.2603, 2201.8162, 2231.6729]
2025-05-13 14:09:17,353 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:09:17,365 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 72/100 (estimated time remaining: 2 hours, 4 minutes, 26 seconds)
2025-05-13 14:13:15,932 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:13:33,116 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 2141.43311 ± 58.639
2025-05-13 14:13:33,116 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2087.346, 2078.4717, 2195.2957, 2101.9238, 2144.927, 2095.6084, 2214.976, 2094.8247, 2144.0566, 2256.9033]
2025-05-13 14:13:33,116 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:13:33,127 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 73/100 (estimated time remaining: 2 hours, 9 seconds)
2025-05-13 14:17:31,421 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:17:48,440 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 2123.17114 ± 390.782
2025-05-13 14:17:48,440 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2280.0554, 2242.7634, 2219.601, 2188.7168, 2360.9841, 967.6967, 2188.2114, 2144.442, 2296.711, 2342.53]
2025-05-13 14:17:48,440 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:17:48,453 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 55 minutes, 49 seconds)
2025-05-13 14:21:46,866 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:22:04,176 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 2241.13794 ± 58.775
2025-05-13 14:22:04,176 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2379.729, 2295.4202, 2157.5366, 2220.7007, 2228.8608, 2272.0989, 2232.8389, 2219.823, 2189.5906, 2214.7805]
2025-05-13 14:22:04,176 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:22:04,176 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (2241.14) for latency MM1Queue_a033_s075
2025-05-13 14:22:04,190 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 51 minutes, 32 seconds)
2025-05-13 14:26:02,543 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:26:19,662 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 2239.71387 ± 101.744
2025-05-13 14:26:19,662 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2335.1492, 2263.8796, 2304.3113, 2265.1372, 2174.203, 2365.157, 2193.2031, 2151.7458, 2330.219, 2014.1344]
2025-05-13 14:26:19,662 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:26:19,672 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 47 minutes, 15 seconds)
2025-05-13 14:30:18,013 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:30:35,325 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1880.62659 ± 674.851
2025-05-13 14:30:35,325 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [528.244, 2246.9653, 2235.1147, 2204.6204, 2250.7214, 2096.0266, 2260.7751, 2301.0784, 2139.5386, 543.1799]
2025-05-13 14:30:35,325 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:30:35,335 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 42 minutes, 14 seconds)
2025-05-13 14:34:31,297 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:34:48,535 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 2227.96680 ± 76.049
2025-05-13 14:34:48,535 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2237.2053, 2053.2988, 2262.1423, 2239.1624, 2179.2354, 2357.3557, 2199.1587, 2258.825, 2293.282, 2200.0032]
2025-05-13 14:34:48,535 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:34:48,548 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 37 minutes, 46 seconds)
2025-05-13 14:38:46,823 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:39:04,151 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 2218.89526 ± 83.011
2025-05-13 14:39:04,151 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2319.0679, 2261.9927, 2215.374, 2207.2407, 2043.0421, 2346.2227, 2225.5374, 2196.4612, 2127.2156, 2246.7986]
2025-05-13 14:39:04,151 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:39:04,164 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 33 minutes, 33 seconds)
2025-05-13 14:43:02,608 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:43:19,903 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 2123.14941 ± 90.849
2025-05-13 14:43:19,903 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2151.0698, 2126.9172, 2122.1147, 2096.269, 2104.0525, 2317.7786, 1924.2141, 2163.83, 2086.9216, 2138.3262]
2025-05-13 14:43:19,904 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:43:19,919 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 29 minutes, 18 seconds)
2025-05-13 14:47:13,124 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:47:30,007 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 2206.36841 ± 81.095
2025-05-13 14:47:30,007 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2184.2063, 2216.259, 2309.6758, 2336.1753, 2173.666, 2237.9712, 2116.6592, 2205.9373, 2043.7446, 2239.3892]
2025-05-13 14:47:30,007 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:47:30,021 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 24 minutes, 41 seconds)
2025-05-13 14:51:24,062 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:51:41,020 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 2173.14111 ± 404.490
2025-05-13 14:51:41,020 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2383.9404, 2147.9612, 2435.0493, 2207.253, 985.2021, 2378.3022, 2348.09, 2257.5354, 2316.854, 2271.2258]
2025-05-13 14:51:41,020 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:51:41,035 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 20 minutes, 9 seconds)
2025-05-13 14:55:32,789 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:55:50,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 2193.42725 ± 239.182
2025-05-13 14:55:50,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2393.768, 2237.3442, 2263.421, 2103.929, 2294.0737, 2363.2446, 1513.1162, 2240.6814, 2220.014, 2304.6812]
2025-05-13 14:55:50,071 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:55:50,085 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 15 minutes, 41 seconds)
2025-05-13 14:59:45,144 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:00:02,390 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 2246.28906 ± 39.093
2025-05-13 15:00:02,390 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2208.2961, 2217.2773, 2233.7275, 2217.1707, 2308.9438, 2276.0525, 2282.7324, 2265.3618, 2178.0955, 2275.233]
2025-05-13 15:00:02,390 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:00:02,390 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (2246.29) for latency MM1Queue_a033_s075
2025-05-13 15:00:02,402 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 11 minutes, 18 seconds)
2025-05-13 15:03:57,571 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:04:14,817 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 2231.69092 ± 99.226
2025-05-13 15:04:14,817 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2385.17, 2159.6685, 2318.6777, 2271.7031, 2263.3213, 2266.429, 2326.2156, 2132.9592, 2057.3894, 2135.3748]
2025-05-13 15:04:14,817 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:04:14,831 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 6 minutes, 55 seconds)
2025-05-13 15:08:09,556 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:08:26,565 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 2238.44531 ± 74.262
2025-05-13 15:08:26,566 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2247.2861, 2191.676, 2290.7236, 2235.2822, 2235.517, 2238.7117, 2397.4219, 2128.8103, 2134.7988, 2284.2236]
2025-05-13 15:08:26,566 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:08:26,580 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 86/100 (estimated time remaining: 1 hour, 2 minutes, 49 seconds)
2025-05-13 15:12:19,323 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:12:35,019 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1955.22327 ± 737.092
2025-05-13 15:12:35,019 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2510.7085, 2289.887, 2261.5786, 119.45077, 2255.3286, 2328.1533, 954.92487, 2304.3657, 2320.2524, 2207.582]
2025-05-13 15:12:35,019 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 197.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:12:35,033 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 87/100 (estimated time remaining: 58 minutes, 31 seconds)
2025-05-13 15:16:30,757 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:16:48,040 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 2227.46875 ± 83.779
2025-05-13 15:16:48,040 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2333.9233, 2183.8223, 2370.1792, 2235.8398, 2228.2808, 2095.8955, 2256.706, 2283.9243, 2121.7122, 2164.4045]
2025-05-13 15:16:48,040 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:16:48,053 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 88/100 (estimated time remaining: 54 minutes, 30 seconds)
2025-05-13 15:20:43,341 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:21:00,149 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 2254.97119 ± 87.333
2025-05-13 15:21:00,149 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2350.5173, 2257.384, 2218.1633, 2211.4426, 2253.3796, 2081.0464, 2182.7534, 2349.029, 2394.8152, 2251.1833]
2025-05-13 15:21:00,149 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:21:00,149 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (2254.97) for latency MM1Queue_a033_s075
2025-05-13 15:21:00,163 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 89/100 (estimated time remaining: 50 minutes, 18 seconds)
2025-05-13 15:24:52,599 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:25:09,673 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1999.30725 ± 658.262
2025-05-13 15:25:09,673 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2162.7341, 2271.9658, 2111.5664, 2348.612, 34.729523, 2243.767, 2136.2244, 2208.0364, 2270.1318, 2205.3035]
2025-05-13 15:25:09,673 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:25:09,687 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 90/100 (estimated time remaining: 46 minutes)
2025-05-13 15:29:02,520 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:29:19,397 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 2041.87830 ± 682.723
2025-05-13 15:29:19,398 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2328.786, 2197.7922, 2335.363, 2247.9126, 2266.5522, 2259.4858, 2342.9817, 2158.714, 2280.5864, 0.6098609]
2025-05-13 15:29:19,398 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:29:19,410 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 91/100 (estimated time remaining: 41 minutes, 45 seconds)
2025-05-13 15:33:11,182 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:33:28,018 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 2149.76782 ± 127.199
2025-05-13 15:33:28,018 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2109.504, 1983.8298, 1859.4398, 2212.2722, 2306.5938, 2176.0051, 2163.4907, 2232.245, 2217.3154, 2236.9824]
2025-05-13 15:33:28,018 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:33:28,032 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 92/100 (estimated time remaining: 37 minutes, 35 seconds)
2025-05-13 15:37:19,713 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:37:36,630 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 2124.21094 ± 358.518
2025-05-13 15:37:36,631 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1064.7356, 2386.0698, 2242.3232, 2174.5747, 2169.6143, 2201.865, 2298.1316, 2220.2974, 2276.654, 2207.8428]
2025-05-13 15:37:36,631 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:37:36,642 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 93/100 (estimated time remaining: 33 minutes, 17 seconds)
2025-05-13 15:41:28,231 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:41:45,339 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 2308.17529 ± 75.127
2025-05-13 15:41:45,339 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2277.6577, 2342.8625, 2446.5107, 2218.0422, 2334.8162, 2280.905, 2274.278, 2405.6865, 2314.845, 2186.1501]
2025-05-13 15:41:45,340 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:41:45,340 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (2308.18) for latency MM1Queue_a033_s075
2025-05-13 15:41:45,354 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 94/100 (estimated time remaining: 29 minutes, 3 seconds)
2025-05-13 15:45:37,157 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:45:53,996 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 2258.99536 ± 76.819
2025-05-13 15:45:53,997 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2229.1743, 2227.6064, 2367.8865, 2187.6965, 2402.0598, 2237.109, 2220.7808, 2344.4482, 2185.0981, 2188.0938]
2025-05-13 15:45:53,997 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:45:54,011 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 95/100 (estimated time remaining: 24 minutes, 53 seconds)
2025-05-13 15:49:46,004 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:50:02,882 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 2280.66846 ± 72.255
2025-05-13 15:50:02,882 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2315.7053, 2411.6125, 2271.969, 2234.9133, 2303.3145, 2232.296, 2246.1284, 2277.7083, 2371.5044, 2141.5322]
2025-05-13 15:50:02,882 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:50:02,899 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 96/100 (estimated time remaining: 20 minutes, 43 seconds)
2025-05-13 15:53:54,741 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:54:11,827 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 2257.98706 ± 50.097
2025-05-13 15:54:11,827 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2157.1787, 2316.2, 2245.1462, 2207.8206, 2299.5688, 2279.9248, 2258.6885, 2311.3918, 2295.159, 2208.792]
2025-05-13 15:54:11,827 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:54:11,841 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 97/100 (estimated time remaining: 16 minutes, 35 seconds)
2025-05-13 15:58:03,533 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:58:20,475 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 2253.44849 ± 103.085
2025-05-13 15:58:20,475 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2311.805, 2167.5745, 2056.6304, 2345.7578, 2212.4023, 2287.135, 2244.2234, 2159.8962, 2314.329, 2434.731]
2025-05-13 15:58:20,475 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:58:20,490 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 98/100 (estimated time remaining: 12 minutes, 26 seconds)
2025-05-13 16:02:12,071 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 16:02:28,898 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 2253.78369 ± 88.274
2025-05-13 16:02:28,898 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2264.2214, 2382.3328, 2199.8381, 2275.5022, 2296.4746, 2276.103, 2149.0654, 2323.8882, 2308.2915, 2062.1199]
2025-05-13 16:02:28,898 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 16:02:28,909 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 99/100 (estimated time remaining: 8 minutes, 17 seconds)
2025-05-13 16:06:20,631 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 16:06:37,764 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 2207.13599 ± 64.745
2025-05-13 16:06:37,765 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2236.3906, 2173.7637, 2202.46, 2263.5269, 2250.5151, 2103.0967, 2297.039, 2085.2354, 2223.1472, 2236.1846]
2025-05-13 16:06:37,765 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 16:06:37,776 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 100/100 (estimated time remaining: 4 minutes, 8 seconds)
2025-05-13 16:10:30,922 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 16:10:48,083 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 2231.80908 ± 78.309
2025-05-13 16:10:48,084 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2205.7686, 2383.872, 2119.2983, 2153.1042, 2173.8335, 2210.4893, 2257.1018, 2225.0833, 2350.6292, 2238.912]
2025-05-13 16:10:48,084 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 16:10:48,097 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1251 [DEBUG]: Training session finished
