2025-05-13 09:06:36,965 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-bpql-mda-highdim-mem24
2025-05-13 09:06:36,965 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-bpql-mda-highdim-mem24
2025-05-13 09:06:36,965 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x15003850e1d0>}
2025-05-13 09:06:36,966 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1111 [DEBUG]: using device: cuda
2025-05-13 09:06:36,970 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1133 [INFO]: Creating new trainer
2025-05-13 09:06:36,986 baseline-bpql-mda-noisy-ant:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-05-13 09:06:36,986 baseline-bpql-mda-noisy-ant:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-13 09:06:36,993 baseline-bpql-mda-noisy-ant:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=27, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(8, 512, batch_first=True)
)
2025-05-13 09:06:37,660 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1194 [DEBUG]: Starting training session...
2025-05-13 09:06:37,661 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 1/100
2025-05-13 09:10:46,652 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:11:02,554 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -386.07898 ± 206.551
2025-05-13 09:11:02,555 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-499.57126, 26.264421, -452.76227, -494.08655, -518.73157, 24.725962, -465.32065, -500.2463, -482.71448, -498.3474]
2025-05-13 09:11:02,555 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 29.0, 1000.0, 1000.0, 1000.0, 30.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:11:02,555 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (-386.08) for latency ExtremeClogL1U23
2025-05-13 09:11:02,562 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 2/100 (estimated time remaining: 7 hours, 17 minutes, 5 seconds)
2025-05-13 09:15:11,528 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:15:31,614 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 716.13275 ± 27.787
2025-05-13 09:15:31,614 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [749.5131, 727.3936, 700.85657, 710.00024, 736.8447, 645.67993, 733.4927, 734.93225, 703.7259, 718.8882]
2025-05-13 09:15:31,614 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:15:31,614 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (716.13) for latency ExtremeClogL1U23
2025-05-13 09:15:31,619 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 3/100 (estimated time remaining: 7 hours, 16 minutes, 3 seconds)
2025-05-13 09:19:41,461 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:20:01,509 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 781.29376 ± 13.595
2025-05-13 09:20:01,509 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [777.75934, 780.4085, 776.31104, 780.04126, 787.20154, 781.43414, 786.3385, 812.5407, 753.837, 777.0653]
2025-05-13 09:20:01,509 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:20:01,509 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (781.29) for latency ExtremeClogL1U23
2025-05-13 09:20:01,512 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 4/100 (estimated time remaining: 7 hours, 13 minutes, 11 seconds)
2025-05-13 09:24:10,747 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:24:30,685 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 807.95978 ± 19.564
2025-05-13 09:24:30,685 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [758.21, 812.1419, 831.527, 824.8382, 815.2658, 805.1457, 805.75757, 824.2969, 795.27136, 807.1429]
2025-05-13 09:24:30,685 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:24:30,685 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (807.96) for latency ExtremeClogL1U23
2025-05-13 09:24:30,692 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 5/100 (estimated time remaining: 7 hours, 9 minutes, 12 seconds)
2025-05-13 09:28:40,053 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:29:00,119 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 838.99280 ± 7.689
2025-05-13 09:29:00,119 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [852.5358, 838.45105, 844.1839, 835.8936, 838.79205, 833.9716, 824.0411, 843.64734, 846.291, 832.12024]
2025-05-13 09:29:00,119 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:29:00,119 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (838.99) for latency ExtremeClogL1U23
2025-05-13 09:29:00,127 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 6/100 (estimated time remaining: 7 hours, 5 minutes, 6 seconds)
2025-05-13 09:33:09,829 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:33:29,650 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 820.25183 ± 12.531
2025-05-13 09:33:29,650 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [836.6033, 813.9101, 789.47046, 827.3859, 814.55066, 826.17224, 829.6397, 813.3133, 826.17645, 825.2966]
2025-05-13 09:33:29,650 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:33:29,656 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 7/100 (estimated time remaining: 7 hours, 2 minutes, 5 seconds)
2025-05-13 09:37:38,932 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:37:59,165 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 815.05609 ± 3.980
2025-05-13 09:37:59,165 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [811.59937, 810.3412, 819.13025, 813.2972, 814.9199, 814.4383, 824.9579, 812.8873, 815.1224, 813.867]
2025-05-13 09:37:59,165 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:37:59,172 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 8/100 (estimated time remaining: 6 hours, 57 minutes, 44 seconds)
2025-05-13 09:42:08,706 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:42:28,921 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 821.69012 ± 13.236
2025-05-13 09:42:28,921 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [829.3291, 835.56213, 794.5926, 829.2543, 831.6026, 807.15295, 805.92206, 832.0712, 828.0934, 823.3204]
2025-05-13 09:42:28,921 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:42:28,926 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 9/100 (estimated time remaining: 6 hours, 53 minutes, 12 seconds)
2025-05-13 09:46:38,645 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:46:58,433 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 821.42053 ± 18.852
2025-05-13 09:46:58,433 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [803.67285, 785.92285, 832.31573, 833.40894, 838.63495, 793.3501, 820.0472, 833.7154, 833.38824, 839.7487]
2025-05-13 09:46:58,433 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:46:58,439 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 10/100 (estimated time remaining: 6 hours, 48 minutes, 48 seconds)
2025-05-13 09:51:07,874 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:51:27,992 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 804.43445 ± 4.822
2025-05-13 09:51:27,992 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [799.2959, 802.4295, 802.0792, 803.08417, 804.60754, 804.2043, 804.8781, 814.29395, 797.7693, 811.7027]
2025-05-13 09:51:27,992 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:51:27,998 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 11/100 (estimated time remaining: 6 hours, 44 minutes, 21 seconds)
2025-05-13 09:55:37,932 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:55:57,891 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 835.77588 ± 13.125
2025-05-13 09:55:57,891 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [824.86194, 844.18884, 813.7188, 846.025, 813.6556, 854.3572, 844.3699, 840.52484, 837.8504, 838.2061]
2025-05-13 09:55:57,891 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:55:57,900 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 12/100 (estimated time remaining: 6 hours, 39 minutes, 58 seconds)
2025-05-13 10:00:07,600 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:00:27,331 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 859.37140 ± 5.957
2025-05-13 10:00:27,331 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [865.39264, 851.95593, 862.0733, 855.34827, 863.186, 855.2319, 855.08905, 868.5766, 865.6273, 851.23254]
2025-05-13 10:00:27,331 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:00:27,331 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (859.37) for latency ExtremeClogL1U23
2025-05-13 10:00:27,339 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 13/100 (estimated time remaining: 6 hours, 35 minutes, 27 seconds)
2025-05-13 10:04:36,353 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:04:56,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 864.34979 ± 5.168
2025-05-13 10:04:56,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [864.0223, 863.86285, 866.1873, 854.4305, 861.58044, 868.26746, 859.69745, 866.60205, 875.06854, 863.77905]
2025-05-13 10:04:56,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:04:56,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (864.35) for latency ExtremeClogL1U23
2025-05-13 10:04:56,137 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 14/100 (estimated time remaining: 6 hours, 30 minutes, 41 seconds)
2025-05-13 10:09:04,829 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:09:24,417 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 858.12146 ± 7.652
2025-05-13 10:09:24,417 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [860.7173, 861.4912, 859.55133, 838.0819, 856.8672, 859.5508, 869.45685, 860.663, 860.89984, 853.93506]
2025-05-13 10:09:24,417 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:09:24,426 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 15/100 (estimated time remaining: 6 hours, 25 minutes, 50 seconds)
2025-05-13 10:13:32,791 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:13:52,608 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 858.45667 ± 4.908
2025-05-13 10:13:52,608 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [861.5591, 862.52185, 851.73236, 855.62537, 854.6276, 864.0793, 861.3489, 861.3536, 862.37854, 849.33984]
2025-05-13 10:13:52,608 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:13:52,616 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 16/100 (estimated time remaining: 6 hours, 20 minutes, 58 seconds)
2025-05-13 10:18:00,650 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:18:20,193 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 859.24915 ± 13.098
2025-05-13 10:18:20,193 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [875.79877, 868.2452, 846.5694, 869.7328, 847.5866, 851.3125, 868.16833, 849.1088, 839.2033, 876.76556]
2025-05-13 10:18:20,193 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:18:20,199 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 17/100 (estimated time remaining: 6 hours, 15 minutes, 50 seconds)
2025-05-13 10:22:28,141 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:22:47,717 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 868.00574 ± 4.911
2025-05-13 10:22:47,717 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [875.68396, 861.6104, 869.83624, 873.7234, 862.4255, 868.26294, 871.9035, 869.9445, 860.9425, 865.72485]
2025-05-13 10:22:47,717 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:22:47,717 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (868.01) for latency ExtremeClogL1U23
2025-05-13 10:22:47,723 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 18/100 (estimated time remaining: 6 hours, 10 minutes, 50 seconds)
2025-05-13 10:26:55,552 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:27:15,441 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 873.30988 ± 4.814
2025-05-13 10:27:15,441 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [869.79, 880.091, 872.3985, 871.47723, 871.1978, 876.63464, 882.9616, 872.9006, 867.5694, 868.07806]
2025-05-13 10:27:15,441 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:27:15,441 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (873.31) for latency ExtremeClogL1U23
2025-05-13 10:27:15,446 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 19/100 (estimated time remaining: 6 hours, 6 minutes, 4 seconds)
2025-05-13 10:31:23,160 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:31:42,842 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 869.20178 ± 6.539
2025-05-13 10:31:42,842 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [873.58563, 878.42725, 871.2466, 865.6103, 873.7649, 875.6987, 869.4152, 863.232, 865.84204, 855.19543]
2025-05-13 10:31:42,842 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:31:42,847 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 20/100 (estimated time remaining: 6 hours, 1 minute, 22 seconds)
2025-05-13 10:35:50,511 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:36:10,029 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 860.36902 ± 7.190
2025-05-13 10:36:10,029 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [874.22363, 862.2371, 862.3745, 845.27576, 859.0157, 861.98065, 858.4482, 854.6968, 858.341, 867.09784]
2025-05-13 10:36:10,029 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:36:10,039 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 21/100 (estimated time remaining: 5 hours, 56 minutes, 38 seconds)
2025-05-13 10:40:17,706 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:40:37,465 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 885.54755 ± 3.341
2025-05-13 10:40:37,466 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [882.7368, 881.48706, 890.9956, 882.59216, 881.7285, 885.3158, 885.8816, 885.7476, 890.6847, 888.3061]
2025-05-13 10:40:37,466 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:40:37,466 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (885.55) for latency ExtremeClogL1U23
2025-05-13 10:40:37,472 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 22/100 (estimated time remaining: 5 hours, 52 minutes, 8 seconds)
2025-05-13 10:44:45,233 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:45:04,645 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 879.20392 ± 5.534
2025-05-13 10:45:04,645 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [887.65393, 882.101, 880.6328, 869.35925, 883.49036, 884.0621, 871.27905, 878.54004, 874.6131, 880.3072]
2025-05-13 10:45:04,645 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:45:04,652 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 23/100 (estimated time remaining: 5 hours, 47 minutes, 36 seconds)
2025-05-13 10:49:12,733 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:49:32,587 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 868.29657 ± 28.865
2025-05-13 10:49:32,587 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [887.9481, 896.81934, 793.17096, 898.6847, 878.4091, 880.6586, 857.00977, 869.67255, 865.40717, 855.1854]
2025-05-13 10:49:32,587 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:49:32,594 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 24/100 (estimated time remaining: 5 hours, 43 minutes, 12 seconds)
2025-05-13 10:53:40,362 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:54:00,127 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 860.27625 ± 6.261
2025-05-13 10:54:00,127 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [845.52966, 856.96545, 855.6343, 861.79144, 866.25305, 858.7056, 859.693, 866.7724, 865.75903, 865.6589]
2025-05-13 10:54:00,128 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:54:00,134 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 25/100 (estimated time remaining: 5 hours, 38 minutes, 46 seconds)
2025-05-13 10:57:56,186 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:58:15,638 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 875.12793 ± 9.589
2025-05-13 10:58:15,638 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [873.8209, 874.6332, 875.10236, 867.9061, 875.21515, 870.9104, 881.8667, 855.4334, 894.2278, 882.1635]
2025-05-13 10:58:15,638 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:58:15,644 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 26/100 (estimated time remaining: 5 hours, 31 minutes, 24 seconds)
2025-05-13 11:02:22,948 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:02:42,413 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 881.02490 ± 21.619
2025-05-13 11:02:42,413 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [819.73584, 890.7393, 881.77716, 880.96533, 886.70276, 885.9949, 896.1567, 904.07166, 880.7504, 883.3547]
2025-05-13 11:02:42,413 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:02:42,419 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 27/100 (estimated time remaining: 5 hours, 26 minutes, 49 seconds)
2025-05-13 11:06:50,218 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:07:09,999 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 893.73132 ± 4.230
2025-05-13 11:07:09,999 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [902.3457, 891.8736, 894.3284, 890.2586, 897.4074, 889.101, 889.1027, 895.5574, 897.48694, 889.85187]
2025-05-13 11:07:10,000 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:07:10,000 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (893.73) for latency ExtremeClogL1U23
2025-05-13 11:07:10,006 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 28/100 (estimated time remaining: 5 hours, 22 minutes, 30 seconds)
2025-05-13 11:11:17,892 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:11:37,682 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 900.05212 ± 5.185
2025-05-13 11:11:37,682 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [901.25684, 889.0296, 908.7281, 902.06573, 896.9725, 899.39343, 901.4232, 894.8028, 904.9251, 901.92395]
2025-05-13 11:11:37,682 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:11:37,682 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (900.05) for latency ExtremeClogL1U23
2025-05-13 11:11:37,688 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 29/100 (estimated time remaining: 5 hours, 18 minutes, 1 second)
2025-05-13 11:15:45,079 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:16:04,529 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 906.22217 ± 8.198
2025-05-13 11:16:04,529 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [912.74835, 899.65955, 910.03357, 899.74445, 907.18396, 910.28876, 890.80945, 902.7719, 922.4342, 906.54706]
2025-05-13 11:16:04,529 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:16:04,529 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (906.22) for latency ExtremeClogL1U23
2025-05-13 11:16:04,534 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 30/100 (estimated time remaining: 5 hours, 13 minutes, 26 seconds)
2025-05-13 11:20:12,373 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:20:32,074 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 901.20911 ± 7.390
2025-05-13 11:20:32,074 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [908.72614, 900.5789, 897.0591, 902.0557, 898.4593, 906.77655, 882.1663, 906.7595, 907.41, 902.09906]
2025-05-13 11:20:32,074 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:20:32,081 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 31/100 (estimated time remaining: 5 hours, 11 minutes, 50 seconds)
2025-05-13 11:24:39,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:24:59,135 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 883.00018 ± 38.189
2025-05-13 11:24:59,135 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [903.6937, 896.24115, 896.47327, 890.84015, 901.73444, 909.34125, 889.94037, 874.7265, 771.5508, 895.45984]
2025-05-13 11:24:59,135 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:24:59,143 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 32/100 (estimated time remaining: 5 hours, 7 minutes, 26 seconds)
2025-05-13 11:29:06,766 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:29:26,500 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 871.53333 ± 43.844
2025-05-13 11:29:26,500 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [887.3394, 878.1362, 863.2446, 893.229, 877.67206, 896.2056, 892.33453, 895.6614, 888.2342, 743.276]
2025-05-13 11:29:26,500 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:29:26,511 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 33/100 (estimated time remaining: 5 hours, 2 minutes, 56 seconds)
2025-05-13 11:33:34,146 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:33:51,998 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 827.19226 ± 252.784
2025-05-13 11:33:51,998 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [907.19995, 902.60876, 909.20715, 911.71063, 915.5135, 913.1583, 909.819, 912.06757, 68.97706, 921.6611]
2025-05-13 11:33:51,998 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 133.0, 1000.0]
2025-05-13 11:33:52,006 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 34/100 (estimated time remaining: 4 hours, 57 minutes, 59 seconds)
2025-05-13 11:38:07,450 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:38:27,007 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 928.50586 ± 9.859
2025-05-13 11:38:27,007 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [943.27014, 918.2107, 933.0744, 919.3934, 922.7403, 912.4034, 934.28516, 940.2838, 924.78705, 936.61035]
2025-05-13 11:38:27,007 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:38:27,007 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (928.51) for latency ExtremeClogL1U23
2025-05-13 11:38:27,014 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 35/100 (estimated time remaining: 4 hours, 55 minutes, 20 seconds)
2025-05-13 11:42:34,638 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:42:54,207 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 981.68323 ± 19.435
2025-05-13 11:42:54,208 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [957.6437, 976.0913, 961.70886, 993.85425, 967.29865, 1002.0394, 965.2396, 1022.7242, 981.8527, 988.37897]
2025-05-13 11:42:54,208 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:42:54,208 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (981.68) for latency ExtremeClogL1U23
2025-05-13 11:42:54,213 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 36/100 (estimated time remaining: 4 hours, 50 minutes, 47 seconds)
2025-05-13 11:47:01,726 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:47:21,490 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1001.92712 ± 55.529
2025-05-13 11:47:21,491 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [970.82733, 954.8903, 1045.9401, 1050.1154, 1014.9703, 965.3184, 902.3375, 965.2877, 1073.0219, 1076.563]
2025-05-13 11:47:21,491 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:47:21,491 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1001.93) for latency ExtremeClogL1U23
2025-05-13 11:47:21,497 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 37/100 (estimated time remaining: 4 hours, 46 minutes, 22 seconds)
2025-05-13 11:51:21,101 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:51:41,010 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 845.08240 ± 15.467
2025-05-13 11:51:41,010 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [854.2724, 840.4819, 847.7826, 845.3999, 839.5476, 839.959, 836.2942, 871.8858, 811.95984, 863.2417]
2025-05-13 11:51:41,010 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:51:41,018 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 38/100 (estimated time remaining: 4 hours, 40 minutes, 14 seconds)
2025-05-13 11:55:48,920 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:56:05,178 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 880.08899 ± 348.773
2025-05-13 11:56:05,178 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1067.4297, 1076.2926, 221.21182, 1146.4078, 1077.9629, 1217.7717, 331.91672, 1063.3518, 1061.4397, 537.1051]
2025-05-13 11:56:05,178 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 295.0, 1000.0, 1000.0, 1000.0, 520.0, 1000.0, 1000.0, 506.0]
2025-05-13 11:56:05,186 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 39/100 (estimated time remaining: 4 hours, 35 minutes, 31 seconds)
2025-05-13 12:00:23,203 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:00:42,517 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1250.79321 ± 235.186
2025-05-13 12:00:42,517 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1284.5883, 1401.7423, 1355.4865, 1359.2859, 1091.3058, 601.26807, 1421.9954, 1347.268, 1250.9197, 1394.0725]
2025-05-13 12:00:42,517 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:00:42,517 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1250.79) for latency ExtremeClogL1U23
2025-05-13 12:00:42,524 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 40/100 (estimated time remaining: 4 hours, 31 minutes, 33 seconds)
2025-05-13 12:04:49,852 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:05:09,156 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1182.47522 ± 167.544
2025-05-13 12:05:09,157 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1399.4893, 1284.9425, 994.5285, 1251.95, 1229.352, 1192.2252, 1345.346, 807.30115, 1238.4298, 1081.1875]
2025-05-13 12:05:09,157 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:05:09,165 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 41/100 (estimated time remaining: 4 hours, 26 minutes, 59 seconds)
2025-05-13 12:09:16,829 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:09:35,974 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1476.50427 ± 51.528
2025-05-13 12:09:35,974 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1545.4952, 1378.7537, 1511.5028, 1452.6831, 1456.8033, 1528.8423, 1432.9205, 1543.0828, 1446.0724, 1468.8867]
2025-05-13 12:09:35,974 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:09:35,974 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1476.50) for latency ExtremeClogL1U23
2025-05-13 12:09:35,984 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 42/100 (estimated time remaining: 4 hours, 22 minutes, 26 seconds)
2025-05-13 12:13:43,135 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:14:02,319 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1413.14050 ± 123.438
2025-05-13 12:14:02,319 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1481.3616, 1424.1145, 1308.3855, 1470.6577, 1081.135, 1478.626, 1447.737, 1517.7913, 1428.3534, 1493.2433]
2025-05-13 12:14:02,319 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:14:02,330 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 43/100 (estimated time remaining: 4 hours, 19 minutes, 19 seconds)
2025-05-13 12:18:09,748 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:18:29,308 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1550.34167 ± 35.801
2025-05-13 12:18:29,308 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1500.0973, 1530.3645, 1516.1862, 1539.4402, 1583.9856, 1598.5216, 1571.3159, 1602.8921, 1506.323, 1554.29]
2025-05-13 12:18:29,308 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:18:29,308 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1550.34) for latency ExtremeClogL1U23
2025-05-13 12:18:29,314 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 44/100 (estimated time remaining: 4 hours, 15 minutes, 23 seconds)
2025-05-13 12:22:36,709 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:22:56,341 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1462.54993 ± 41.708
2025-05-13 12:22:56,341 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1393.3181, 1447.3751, 1518.7631, 1473.3381, 1465.755, 1392.7863, 1449.6752, 1477.6349, 1487.8809, 1518.9713]
2025-05-13 12:22:56,341 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:22:56,352 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 45/100 (estimated time remaining: 4 hours, 8 minutes, 58 seconds)
2025-05-13 12:27:03,783 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:27:23,321 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1576.93872 ± 51.844
2025-05-13 12:27:23,321 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1679.8877, 1522.4458, 1529.484, 1623.8419, 1572.2145, 1598.665, 1589.1837, 1551.2932, 1495.3737, 1606.9974]
2025-05-13 12:27:23,322 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:27:23,322 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1576.94) for latency ExtremeClogL1U23
2025-05-13 12:27:23,333 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 46/100 (estimated time remaining: 4 hours, 4 minutes, 35 seconds)
2025-05-13 12:31:35,824 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:31:55,370 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1586.16992 ± 38.959
2025-05-13 12:31:55,370 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1584.4937, 1511.8927, 1589.9646, 1579.942, 1559.6355, 1636.306, 1646.9412, 1576.4576, 1624.0374, 1552.0295]
2025-05-13 12:31:55,370 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:31:55,370 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1586.17) for latency ExtremeClogL1U23
2025-05-13 12:31:55,376 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 47/100 (estimated time remaining: 4 hours, 1 minute, 5 seconds)
2025-05-13 12:36:02,580 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:36:21,741 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1595.60327 ± 36.792
2025-05-13 12:36:21,742 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1632.6443, 1627.4685, 1630.6377, 1618.6201, 1611.9287, 1557.7123, 1548.1985, 1594.3817, 1523.4622, 1610.9795]
2025-05-13 12:36:21,742 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:36:21,742 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1595.60) for latency ExtremeClogL1U23
2025-05-13 12:36:21,750 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 56 minutes, 37 seconds)
2025-05-13 12:40:27,488 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:40:46,655 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1597.62915 ± 52.554
2025-05-13 12:40:46,655 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1626.1677, 1644.7926, 1628.454, 1536.6893, 1491.1091, 1561.5471, 1654.0381, 1653.1171, 1569.0204, 1611.355]
2025-05-13 12:40:46,655 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:40:46,655 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1597.63) for latency ExtremeClogL1U23
2025-05-13 12:40:46,671 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 51 minutes, 48 seconds)
2025-05-13 12:44:54,140 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:45:13,528 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1363.78284 ± 618.191
2025-05-13 12:45:13,529 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1548.8871, 1620.9165, 1652.0803, 1692.353, 889.09454, 1623.0752, 1682.3793, -361.7507, 1644.5166, 1646.2765]
2025-05-13 12:45:13,529 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:45:13,535 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 47 minutes, 19 seconds)
2025-05-13 12:49:20,923 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:49:40,083 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1563.29797 ± 237.687
2025-05-13 12:49:40,083 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1674.9926, 1666.6078, 863.6939, 1651.0576, 1543.4388, 1637.0562, 1688.8499, 1679.8014, 1660.6179, 1566.8629]
2025-05-13 12:49:40,083 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:49:40,089 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 42 minutes, 47 seconds)
2025-05-13 12:53:47,372 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:54:06,912 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1618.24365 ± 50.951
2025-05-13 12:54:06,912 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1609.5316, 1672.0366, 1567.5543, 1591.8774, 1510.0082, 1605.2592, 1673.9589, 1620.4922, 1655.8861, 1675.832]
2025-05-13 12:54:06,913 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:54:06,913 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1618.24) for latency ExtremeClogL1U23
2025-05-13 12:54:06,919 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 37 minutes, 29 seconds)
2025-05-13 12:58:14,249 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:58:33,694 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1674.19006 ± 64.769
2025-05-13 12:58:33,695 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1739.7632, 1627.2548, 1733.5509, 1715.4397, 1553.0051, 1737.8837, 1607.1239, 1639.3336, 1644.4692, 1744.0759]
2025-05-13 12:58:33,695 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:58:33,695 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1674.19) for latency ExtremeClogL1U23
2025-05-13 12:58:33,702 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 53/100 (estimated time remaining: 3 hours, 33 minutes, 6 seconds)
2025-05-13 13:02:41,127 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:03:00,542 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1673.56775 ± 70.282
2025-05-13 13:03:00,542 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1636.394, 1719.4628, 1751.8364, 1650.2074, 1591.0183, 1750.0925, 1621.935, 1574.4432, 1789.9124, 1650.3749]
2025-05-13 13:03:00,542 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:03:00,549 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 54/100 (estimated time remaining: 3 hours, 28 minutes, 58 seconds)
2025-05-13 13:07:07,896 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:07:27,140 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1519.27747 ± 267.150
2025-05-13 13:07:27,140 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1525.4082, 1681.5635, 1577.7291, 1503.0079, 1621.7428, 1638.8962, 1594.0212, 735.43884, 1639.6619, 1675.3037]
2025-05-13 13:07:27,140 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:07:27,148 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 55/100 (estimated time remaining: 3 hours, 24 minutes, 29 seconds)
2025-05-13 13:11:34,825 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:11:54,291 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1719.23987 ± 42.248
2025-05-13 13:11:54,291 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1770.9901, 1704.0492, 1703.021, 1695.605, 1690.584, 1690.697, 1707.741, 1705.4524, 1827.2498, 1697.0078]
2025-05-13 13:11:54,291 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:11:54,291 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1719.24) for latency ExtremeClogL1U23
2025-05-13 13:11:54,307 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 56/100 (estimated time remaining: 3 hours, 20 minutes, 7 seconds)
2025-05-13 13:16:01,780 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:16:21,235 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1667.77759 ± 75.318
2025-05-13 13:16:21,235 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1656.4739, 1729.2377, 1685.2504, 1511.2301, 1643.2805, 1554.8071, 1752.111, 1693.0731, 1719.8301, 1732.4829]
2025-05-13 13:16:21,235 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:16:21,244 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 57/100 (estimated time remaining: 3 hours, 15 minutes, 42 seconds)
2025-05-13 13:20:04,093 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:20:23,618 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1629.73120 ± 41.188
2025-05-13 13:20:23,618 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1589.8168, 1649.2302, 1621.9635, 1581.7805, 1703.3523, 1568.9863, 1604.5498, 1669.3461, 1662.0833, 1646.2035]
2025-05-13 13:20:23,618 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:20:23,625 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 58/100 (estimated time remaining: 3 hours, 7 minutes, 45 seconds)
2025-05-13 13:24:31,163 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:24:50,732 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1726.03687 ± 67.844
2025-05-13 13:24:50,732 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1658.1091, 1752.2262, 1596.5492, 1748.4573, 1809.0986, 1701.7751, 1749.9336, 1659.7266, 1825.6716, 1758.8213]
2025-05-13 13:24:50,732 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:24:50,732 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1726.04) for latency ExtremeClogL1U23
2025-05-13 13:24:50,741 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 59/100 (estimated time remaining: 3 hours, 3 minutes, 25 seconds)
2025-05-13 13:28:58,116 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:29:17,450 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1668.09839 ± 46.223
2025-05-13 13:29:17,450 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1670.6821, 1708.2074, 1655.1825, 1657.7285, 1668.7927, 1720.791, 1543.3646, 1674.5339, 1696.49, 1685.2112]
2025-05-13 13:29:17,450 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:29:17,460 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 59 minutes, 4 seconds)
2025-05-13 13:33:25,040 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:33:44,591 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1643.63159 ± 50.506
2025-05-13 13:33:44,591 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1721.4491, 1620.8444, 1643.3993, 1574.1992, 1606.9961, 1606.0269, 1670.5039, 1610.3777, 1641.2993, 1741.2206]
2025-05-13 13:33:44,591 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:33:44,598 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 54 minutes, 42 seconds)
2025-05-13 13:37:52,373 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:38:11,579 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1706.06604 ± 41.029
2025-05-13 13:38:11,579 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1656.6783, 1721.5055, 1647.6393, 1710.6079, 1737.1344, 1790.3118, 1718.4985, 1709.0492, 1712.6881, 1656.5469]
2025-05-13 13:38:11,579 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:38:11,586 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 50 minutes, 20 seconds)
2025-05-13 13:42:30,652 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:42:49,821 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1592.56909 ± 305.192
2025-05-13 13:42:49,821 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [697.30475, 1617.3259, 1728.3192, 1777.355, 1740.3379, 1634.1542, 1562.2711, 1751.1747, 1701.2009, 1716.2477]
2025-05-13 13:42:49,821 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:42:49,835 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 50 minutes, 31 seconds)
2025-05-13 13:46:57,149 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:47:16,320 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1713.89905 ± 71.129
2025-05-13 13:47:16,320 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1847.7275, 1689.1869, 1730.928, 1708.3983, 1798.2683, 1599.5492, 1772.149, 1683.3438, 1644.0573, 1665.3816]
2025-05-13 13:47:16,320 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:47:16,333 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 45 minutes, 57 seconds)
2025-05-13 13:51:34,868 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:51:54,362 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1755.00745 ± 57.748
2025-05-13 13:51:54,362 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1729.2869, 1690.9567, 1754.5471, 1755.7485, 1808.2045, 1782.5093, 1830.1718, 1835.9906, 1712.4156, 1650.2433]
2025-05-13 13:51:54,362 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:51:54,362 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1755.01) for latency ExtremeClogL1U23
2025-05-13 13:51:54,373 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 42 minutes, 49 seconds)
2025-05-13 13:56:01,917 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:56:20,180 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1647.44080 ± 239.390
2025-05-13 13:56:20,180 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1798.2593, 949.61993, 1729.8447, 1689.8057, 1698.709, 1643.0874, 1784.2462, 1631.2085, 1793.7113, 1755.9177]
2025-05-13 13:56:20,180 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 568.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:56:20,189 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 38 minutes, 9 seconds)
2025-05-13 14:00:03,116 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:00:22,290 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1775.98669 ± 33.510
2025-05-13 14:00:22,291 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1770.0823, 1723.551, 1818.3065, 1831.1139, 1753.4393, 1809.7297, 1787.9287, 1768.9749, 1737.1763, 1759.5642]
2025-05-13 14:00:22,291 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:00:22,291 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1775.99) for latency ExtremeClogL1U23
2025-05-13 14:00:22,299 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 30 minutes, 48 seconds)
2025-05-13 14:04:30,367 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:04:49,462 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1571.75281 ± 501.730
2025-05-13 14:04:49,462 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1657.0648, 1750.5725, 1692.0598, 1793.2365, 1699.4325, 1698.2125, 1747.8905, 1803.8922, 1801.5864, 73.5809]
2025-05-13 14:04:49,462 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:04:49,471 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 25 minutes, 9 seconds)
2025-05-13 14:08:57,253 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:09:16,450 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1744.90918 ± 37.708
2025-05-13 14:09:16,450 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1710.2083, 1757.6201, 1756.4153, 1693.0068, 1748.9536, 1833.1053, 1736.2108, 1767.4528, 1704.4368, 1741.682]
2025-05-13 14:09:16,450 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:09:16,459 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 69/100 (estimated time remaining: 2 hours, 20 minutes, 48 seconds)
2025-05-13 14:13:23,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:13:43,041 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1738.91370 ± 55.104
2025-05-13 14:13:43,041 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1690.8689, 1736.5383, 1676.1345, 1840.6562, 1670.218, 1702.94, 1801.7498, 1713.2062, 1775.6757, 1781.149]
2025-05-13 14:13:43,041 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:13:43,056 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 70/100 (estimated time remaining: 2 hours, 15 minutes, 13 seconds)
2025-05-13 14:18:08,601 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:18:28,131 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1751.40942 ± 62.131
2025-05-13 14:18:28,131 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1757.8558, 1804.72, 1781.562, 1672.1035, 1776.076, 1603.5688, 1817.4075, 1746.6213, 1790.4703, 1763.7087]
2025-05-13 14:18:28,131 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:18:28,141 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 71/100 (estimated time remaining: 2 hours, 12 minutes, 47 seconds)
2025-05-13 14:22:36,063 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:22:55,287 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1673.08228 ± 172.981
2025-05-13 14:22:55,287 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1803.3403, 1736.2935, 1759.8822, 1787.2488, 1753.163, 1355.3315, 1767.7175, 1693.1798, 1765.4768, 1309.1903]
2025-05-13 14:22:55,287 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 788.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:22:55,295 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 72/100 (estimated time remaining: 2 hours, 10 minutes, 47 seconds)
2025-05-13 14:27:02,929 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:27:22,148 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1747.86450 ± 70.049
2025-05-13 14:27:22,148 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1714.5381, 1856.2473, 1756.8807, 1777.3961, 1816.6119, 1700.6453, 1766.5897, 1788.4103, 1589.7902, 1711.5343]
2025-05-13 14:27:22,148 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:27:22,159 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 73/100 (estimated time remaining: 2 hours, 6 minutes, 15 seconds)
2025-05-13 14:31:29,713 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:31:47,699 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1615.98865 ± 423.892
2025-05-13 14:31:47,699 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1689.7471, 1768.6343, 350.68973, 1722.4446, 1760.2917, 1848.1079, 1762.76, 1782.858, 1771.2527, 1703.1011]
2025-05-13 14:31:47,699 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 218.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:31:47,708 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 74/100 (estimated time remaining: 2 hours, 1 minute, 36 seconds)
2025-05-13 14:35:55,371 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:36:14,833 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1736.85388 ± 42.759
2025-05-13 14:36:14,833 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1791.9725, 1678.4207, 1729.5131, 1758.9462, 1672.4218, 1783.6233, 1760.857, 1783.9406, 1699.2719, 1709.5726]
2025-05-13 14:36:14,833 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:36:14,843 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 57 minutes, 9 seconds)
2025-05-13 14:40:22,684 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:40:41,865 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1727.74976 ± 43.184
2025-05-13 14:40:41,865 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1724.9403, 1779.9272, 1757.6235, 1691.6946, 1675.5554, 1802.1194, 1671.3004, 1762.7651, 1698.9689, 1712.6036]
2025-05-13 14:40:41,865 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:40:41,875 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 51 minutes, 8 seconds)
2025-05-13 14:44:49,422 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:45:08,840 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1660.38501 ± 28.567
2025-05-13 14:45:08,840 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1632.9468, 1640.1023, 1692.2917, 1626.5581, 1705.7994, 1618.4769, 1671.6343, 1686.386, 1674.1449, 1655.5099]
2025-05-13 14:45:08,840 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:45:08,848 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 46 minutes, 41 seconds)
2025-05-13 14:49:14,189 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:49:33,111 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1743.02808 ± 76.055
2025-05-13 14:49:33,112 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1787.5244, 1561.6857, 1710.2386, 1733.6473, 1706.0422, 1790.4608, 1766.8114, 1707.5238, 1825.4554, 1840.8916]
2025-05-13 14:49:33,112 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:49:33,122 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 42 minutes, 2 seconds)
2025-05-13 14:53:31,279 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:53:50,337 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1781.95337 ± 57.095
2025-05-13 14:53:50,337 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1847.3142, 1821.3005, 1710.7367, 1709.5419, 1741.2389, 1803.3733, 1815.4923, 1835.0302, 1697.0372, 1838.4683]
2025-05-13 14:53:50,337 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:53:50,337 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1781.95) for latency ExtremeClogL1U23
2025-05-13 14:53:50,345 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 36 minutes, 59 seconds)
2025-05-13 14:57:48,441 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:58:07,822 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1740.95996 ± 50.161
2025-05-13 14:58:07,822 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1761.3507, 1705.4629, 1658.8085, 1788.6708, 1680.9313, 1822.1213, 1752.4514, 1712.9302, 1729.2167, 1797.6555]
2025-05-13 14:58:07,822 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:58:07,831 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 31 minutes, 54 seconds)
2025-05-13 15:02:05,963 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:02:25,317 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1754.57300 ± 59.828
2025-05-13 15:02:25,317 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1751.7876, 1723.6794, 1714.4537, 1697.4207, 1773.6472, 1648.4946, 1816.7303, 1871.2009, 1781.4414, 1766.8738]
2025-05-13 15:02:25,317 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:02:25,326 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 26 minutes, 53 seconds)
2025-05-13 15:06:25,057 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:06:43,314 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1682.90942 ± 275.355
2025-05-13 15:06:43,314 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1756.7852, 860.42615, 1787.7273, 1728.2388, 1755.1807, 1804.4429, 1757.7308, 1802.35, 1763.2787, 1812.9335]
2025-05-13 15:06:43,314 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 518.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:06:43,326 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 21 minutes, 59 seconds)
2025-05-13 15:10:41,544 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:11:00,651 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1766.62537 ± 38.113
2025-05-13 15:11:00,651 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1742.2413, 1707.9838, 1784.3743, 1802.0809, 1755.1753, 1804.5597, 1765.4374, 1829.0562, 1707.883, 1767.463]
2025-05-13 15:11:00,651 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:11:00,661 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 17 minutes, 15 seconds)
2025-05-13 15:14:59,004 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:15:18,229 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1709.08691 ± 77.965
2025-05-13 15:15:18,229 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1796.7307, 1733.8015, 1705.5232, 1492.6975, 1700.9946, 1723.7301, 1739.8997, 1702.0175, 1721.387, 1774.0887]
2025-05-13 15:15:18,229 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:15:18,238 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 12 minutes, 58 seconds)
2025-05-13 15:19:16,481 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:19:35,618 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1725.54224 ± 62.681
2025-05-13 15:19:35,618 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1738.9755, 1668.088, 1619.6182, 1742.1515, 1799.247, 1662.1226, 1737.0298, 1806.1312, 1803.2814, 1678.7776]
2025-05-13 15:19:35,618 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:19:35,629 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 8 minutes, 40 seconds)
2025-05-13 15:23:34,031 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:23:53,492 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1750.72144 ± 34.071
2025-05-13 15:23:53,492 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1809.0137, 1717.512, 1736.8812, 1727.2092, 1691.8269, 1763.533, 1802.7852, 1754.2169, 1753.4725, 1750.764]
2025-05-13 15:23:53,492 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:23:53,503 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 86/100 (estimated time remaining: 1 hour, 4 minutes, 24 seconds)
2025-05-13 15:27:51,870 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:28:10,479 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1705.55103 ± 109.111
2025-05-13 15:28:10,479 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1779.3752, 1702.0876, 1760.2775, 1699.5786, 1409.4221, 1808.4971, 1644.9241, 1727.6436, 1791.5582, 1732.1462]
2025-05-13 15:28:10,479 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 799.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:28:10,488 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 87/100 (estimated time remaining: 1 hour, 4 seconds)
2025-05-13 15:32:15,013 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:32:34,780 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1741.53259 ± 69.764
2025-05-13 15:32:34,780 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1717.0654, 1780.0361, 1772.7562, 1810.2185, 1789.08, 1617.5054, 1841.4309, 1735.4343, 1723.9677, 1627.8313]
2025-05-13 15:32:34,780 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:32:34,794 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 88/100 (estimated time remaining: 56 minutes, 4 seconds)
2025-05-13 15:37:03,360 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:37:25,601 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1755.83179 ± 42.271
2025-05-13 15:37:25,602 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1775.5748, 1724.6396, 1679.0675, 1738.6116, 1752.9861, 1829.5719, 1738.7994, 1790.0295, 1804.8909, 1724.1469]
2025-05-13 15:37:25,602 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:37:25,624 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 89/100 (estimated time remaining: 53 minutes, 5 seconds)
2025-05-13 15:41:52,728 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:42:11,719 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1735.11951 ± 128.152
2025-05-13 15:42:11,719 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1754.5731, 1854.6967, 1749.0421, 1786.4282, 1752.4135, 1362.9142, 1745.7738, 1753.954, 1802.1094, 1789.2899]
2025-05-13 15:42:11,719 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:42:11,729 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 90/100 (estimated time remaining: 49 minutes, 43 seconds)
2025-05-13 15:45:58,148 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:46:15,658 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1540.91064 ± 495.720
2025-05-13 15:46:15,658 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1808.5431, 348.5902, 1803.5378, 1808.8346, 1711.2013, 1783.856, 1779.9711, 1775.0219, 1795.8889, 793.6621]
2025-05-13 15:46:15,658 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 250.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:46:15,668 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 91/100 (estimated time remaining: 44 minutes, 44 seconds)
2025-05-13 15:50:14,899 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:50:34,393 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1795.29272 ± 23.776
2025-05-13 15:50:34,393 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1768.699, 1828.8572, 1777.0195, 1812.3424, 1786.2333, 1763.5896, 1824.7805, 1825.5177, 1779.8286, 1786.0604]
2025-05-13 15:50:34,393 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:50:34,393 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1795.29) for latency ExtremeClogL1U23
2025-05-13 15:50:34,401 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 92/100 (estimated time remaining: 40 minutes, 19 seconds)
2025-05-13 15:55:19,513 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:55:39,286 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1557.22803 ± 513.320
2025-05-13 15:55:39,286 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1762.5408, 1818.3971, 1655.1418, 1797.6559, 1718.1713, 1742.2568, 28.12447, 1600.0017, 1704.9747, 1745.0143]
2025-05-13 15:55:39,286 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 37.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:55:39,295 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 93/100 (estimated time remaining: 36 minutes, 55 seconds)
2025-05-13 16:00:18,601 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 16:00:39,095 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1678.42651 ± 278.292
2025-05-13 16:00:39,095 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1809.4607, 1759.0729, 1720.1553, 1857.2532, 1748.7312, 853.21014, 1730.9883, 1731.0583, 1818.5138, 1755.8221]
2025-05-13 16:00:39,095 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 16:00:39,106 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 94/100 (estimated time remaining: 32 minutes, 30 seconds)
2025-05-13 16:04:37,052 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 16:04:56,355 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1724.31116 ± 78.103
2025-05-13 16:04:56,355 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1734.7941, 1805.2891, 1688.0632, 1681.338, 1740.9567, 1861.3431, 1702.7389, 1548.1622, 1732.5042, 1747.9213]
2025-05-13 16:04:56,355 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 16:04:56,363 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 95/100 (estimated time remaining: 27 minutes, 17 seconds)
2025-05-13 16:08:53,892 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 16:09:13,233 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1776.10327 ± 47.131
2025-05-13 16:09:13,234 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1809.2712, 1751.9043, 1725.9679, 1840.0264, 1676.453, 1774.5675, 1813.4017, 1825.995, 1777.8984, 1765.548]
2025-05-13 16:09:13,234 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 16:09:13,243 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 96/100 (estimated time remaining: 22 minutes, 57 seconds)
2025-05-13 16:13:10,762 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 16:13:29,842 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1773.65076 ± 39.945
2025-05-13 16:13:29,842 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1757.5234, 1755.5487, 1739.188, 1805.4734, 1768.7867, 1759.957, 1765.8745, 1753.9784, 1747.7274, 1882.4495]
2025-05-13 16:13:29,842 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 16:13:29,852 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 97/100 (estimated time remaining: 18 minutes, 20 seconds)
2025-05-13 16:17:27,289 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 16:17:46,188 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1732.71057 ± 44.209
2025-05-13 16:17:46,188 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1818.5686, 1744.223, 1765.1371, 1767.1321, 1670.2385, 1718.5898, 1725.7426, 1670.63, 1694.9009, 1751.9443]
2025-05-13 16:17:46,188 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 16:17:46,199 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 98/100 (estimated time remaining: 13 minutes, 16 seconds)
2025-05-13 16:21:43,741 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 16:22:02,859 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1752.01782 ± 40.848
2025-05-13 16:22:02,859 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1791.7335, 1756.4437, 1697.0343, 1783.3838, 1703.8956, 1711.9797, 1728.4006, 1732.8722, 1798.6991, 1815.7335]
2025-05-13 16:22:02,859 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 16:22:02,869 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 99/100 (estimated time remaining: 8 minutes, 33 seconds)
2025-05-13 16:26:23,447 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 16:26:42,520 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1772.74219 ± 49.418
2025-05-13 16:26:42,520 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1747.4224, 1735.2686, 1819.1681, 1779.7467, 1828.5472, 1775.7766, 1771.7882, 1672.6075, 1851.8807, 1745.2166]
2025-05-13 16:26:42,520 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 16:26:42,531 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 100/100 (estimated time remaining: 4 minutes, 21 seconds)
2025-05-13 16:31:02,795 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 16:31:22,424 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1741.26367 ± 107.277
2025-05-13 16:31:22,424 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1779.7197, 1825.7579, 1795.1749, 1856.504, 1582.7449, 1772.5819, 1791.6482, 1494.0481, 1755.6455, 1758.8125]
2025-05-13 16:31:22,424 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 16:31:22,434 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1251 [DEBUG]: Training session finished
