2025-05-13 09:03:31,853 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-bpql-mda-highdim-mem4
2025-05-13 09:03:31,853 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-bpql-mda-highdim-mem4
2025-05-13 09:03:31,853 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x14f32efe6290>}
2025-05-13 09:03:31,853 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1111 [DEBUG]: using device: cuda
2025-05-13 09:03:31,857 baseline-bpql-mda-noisy-ant:91 [WARNING]: args.assumed_delay != args.horizon: 4 != 32
2025-05-13 09:03:31,857 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1133 [INFO]: Creating new trainer
2025-05-13 09:03:31,881 baseline-bpql-mda-noisy-ant:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-05-13 09:03:31,881 baseline-bpql-mda-noisy-ant:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-13 09:03:31,888 baseline-bpql-mda-noisy-ant:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=27, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(8, 512, batch_first=True)
)
2025-05-13 09:03:32,496 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1194 [DEBUG]: Starting training session...
2025-05-13 09:03:32,496 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 1/100
2025-05-13 09:07:17,414 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:07:31,737 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -412.93179 ± 83.578
2025-05-13 09:07:31,737 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-397.5725, -400.49036, -255.64261, -407.4937, -409.12558, -387.18506, -621.00946, -437.4807, -410.98282, -402.3348]
2025-05-13 09:07:31,737 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:07:31,737 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (-412.93) for latency ExtremeSparseL4U32
2025-05-13 09:07:31,742 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 2/100 (estimated time remaining: 6 hours, 34 minutes, 45 seconds)
2025-05-13 09:11:22,085 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:11:36,361 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 551.61682 ± 28.981
2025-05-13 09:11:36,361 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [552.9356, 550.3589, 593.34784, 575.3428, 537.6768, 588.76605, 523.7584, 543.0938, 491.54752, 559.34094]
2025-05-13 09:11:36,361 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:11:36,361 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (551.62) for latency ExtremeSparseL4U32
2025-05-13 09:11:36,370 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 3/100 (estimated time remaining: 6 hours, 35 minutes, 9 seconds)
2025-05-13 09:15:25,838 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:15:40,146 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 703.00378 ± 145.937
2025-05-13 09:15:40,146 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [788.48157, 410.72086, 415.37314, 787.2056, 773.2537, 792.85077, 775.22046, 783.71893, 729.72565, 773.4874]
2025-05-13 09:15:40,146 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:15:40,146 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (703.00) for latency ExtremeSparseL4U32
2025-05-13 09:15:40,152 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 4/100 (estimated time remaining: 6 hours, 32 minutes, 7 seconds)
2025-05-13 09:19:29,465 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:19:43,455 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 707.53998 ± 37.494
2025-05-13 09:19:43,456 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [655.8317, 708.46454, 616.1031, 726.964, 726.3685, 728.49603, 729.3001, 733.62524, 731.42084, 718.8258]
2025-05-13 09:19:43,456 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:19:43,456 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (707.54) for latency ExtremeSparseL4U32
2025-05-13 09:19:43,461 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 5/100 (estimated time remaining: 6 hours, 28 minutes, 23 seconds)
2025-05-13 09:23:32,687 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:23:46,987 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 784.61407 ± 13.605
2025-05-13 09:23:46,987 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [792.28815, 789.791, 781.3078, 805.92194, 764.4478, 793.67944, 793.50146, 763.4368, 769.5673, 792.1991]
2025-05-13 09:23:46,987 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:23:46,987 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (784.61) for latency ExtremeSparseL4U32
2025-05-13 09:23:46,991 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 6/100 (estimated time remaining: 6 hours, 24 minutes, 35 seconds)
2025-05-13 09:27:36,004 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:27:50,217 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 714.20630 ± 157.413
2025-05-13 09:27:50,217 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [755.4099, 279.73843, 782.50397, 790.5752, 776.88196, 790.846, 581.9932, 797.5809, 800.234, 786.2995]
2025-05-13 09:27:50,217 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:27:50,223 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 7/100 (estimated time remaining: 6 hours, 21 minutes, 47 seconds)
2025-05-13 09:31:39,141 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:31:53,598 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 662.17310 ± 286.949
2025-05-13 09:31:53,598 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [814.35034, 766.4323, 738.09326, -121.33508, 813.39667, 799.5227, 804.0738, 809.63196, 796.3582, 401.20657]
2025-05-13 09:31:53,598 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:31:53,602 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 8/100 (estimated time remaining: 6 hours, 17 minutes, 20 seconds)
2025-05-13 09:35:42,529 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:35:56,513 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 570.60919 ± 416.447
2025-05-13 09:35:56,514 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [673.6645, 705.1231, 783.3041, 778.0432, 759.1436, 614.93164, -668.9495, 699.2354, 715.8189, 645.7766]
2025-05-13 09:35:56,514 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:35:56,518 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 9/100 (estimated time remaining: 6 hours, 13 minutes, 1 second)
2025-05-13 09:39:45,250 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:39:59,417 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 792.05835 ± 15.391
2025-05-13 09:39:59,417 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [777.9392, 800.0746, 769.5389, 766.75543, 794.44226, 810.99695, 788.08514, 806.14734, 811.253, 795.35114]
2025-05-13 09:39:59,417 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:39:59,417 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (792.06) for latency ExtremeSparseL4U32
2025-05-13 09:39:59,421 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 10/100 (estimated time remaining: 6 hours, 8 minutes, 50 seconds)
2025-05-13 09:43:47,664 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:44:01,754 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 787.93585 ± 86.408
2025-05-13 09:44:01,754 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [805.63416, 838.2633, 822.54626, 824.4328, 532.3563, 800.1319, 818.9029, 786.1997, 830.04364, 820.8476]
2025-05-13 09:44:01,754 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:44:01,760 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 11/100 (estimated time remaining: 6 hours, 4 minutes, 25 seconds)
2025-05-13 09:47:50,208 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:48:04,747 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 783.26849 ± 13.026
2025-05-13 09:48:04,747 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [794.1426, 781.55225, 786.94574, 784.1886, 776.6121, 802.2272, 802.0175, 768.29205, 760.0814, 776.6254]
2025-05-13 09:48:04,747 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:48:04,756 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 12/100 (estimated time remaining: 6 hours, 18 seconds)
2025-05-13 09:51:50,734 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:52:04,943 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 708.40417 ± 181.867
2025-05-13 09:52:04,943 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [681.69366, 795.7705, 759.82043, 177.96779, 807.77734, 808.8345, 775.73987, 802.5801, 778.21924, 695.6388]
2025-05-13 09:52:04,943 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:52:04,947 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 13/100 (estimated time remaining: 5 hours, 55 minutes, 19 seconds)
2025-05-13 09:55:51,160 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:56:05,318 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 734.01691 ± 98.192
2025-05-13 09:56:05,318 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [569.73303, 801.69476, 787.0702, 783.09735, 765.6055, 763.24396, 790.89386, 512.5029, 765.3001, 801.0276]
2025-05-13 09:56:05,318 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:56:05,325 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 14/100 (estimated time remaining: 5 hours, 50 minutes, 33 seconds)
2025-05-13 09:59:53,127 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:00:07,417 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 817.97021 ± 21.297
2025-05-13 10:00:07,417 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [822.3499, 803.6092, 800.92267, 830.83685, 828.1154, 822.5322, 831.4531, 836.051, 765.1319, 838.6995]
2025-05-13 10:00:07,417 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:00:07,417 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (817.97) for latency ExtremeSparseL4U32
2025-05-13 10:00:07,427 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 15/100 (estimated time remaining: 5 hours, 46 minutes, 17 seconds)
2025-05-13 10:04:10,213 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:04:24,218 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 806.05713 ± 9.675
2025-05-13 10:04:24,218 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [802.8388, 809.051, 782.7499, 818.9623, 799.53534, 811.3655, 808.51514, 808.7831, 816.47974, 802.2903]
2025-05-13 10:04:24,218 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:04:24,223 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 16/100 (estimated time remaining: 5 hours, 46 minutes, 21 seconds)
2025-05-13 10:08:11,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:08:25,097 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 683.99902 ± 142.044
2025-05-13 10:08:25,097 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [749.11804, 734.582, 314.27106, 761.054, 516.3929, 762.03973, 763.3282, 739.836, 757.0347, 742.33417]
2025-05-13 10:08:25,097 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:08:25,101 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 17/100 (estimated time remaining: 5 hours, 41 minutes, 41 seconds)
2025-05-13 10:12:11,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:12:25,585 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 543.91797 ± 142.644
2025-05-13 10:12:25,586 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [411.5038, 823.3958, 442.2876, 492.0951, 802.0365, 528.6777, 427.69418, 431.1909, 510.86206, 569.43646]
2025-05-13 10:12:25,586 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:12:25,591 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 18/100 (estimated time remaining: 5 hours, 37 minutes, 42 seconds)
2025-05-13 10:16:13,867 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:16:27,850 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 725.58734 ± 73.231
2025-05-13 10:16:27,850 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [735.88617, 826.33527, 649.2773, 669.3583, 830.1954, 642.5135, 734.71155, 657.013, 825.97266, 684.6104]
2025-05-13 10:16:27,850 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:16:27,854 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 19/100 (estimated time remaining: 5 hours, 34 minutes, 9 seconds)
2025-05-13 10:20:03,153 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:20:17,450 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 809.31769 ± 11.587
2025-05-13 10:20:17,450 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [809.1541, 823.39014, 817.95807, 779.49603, 811.5584, 807.0428, 816.05646, 812.55884, 815.4269, 800.5352]
2025-05-13 10:20:17,450 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:20:17,454 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 20/100 (estimated time remaining: 5 hours, 26 minutes, 42 seconds)
2025-05-13 10:24:11,442 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:24:22,540 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -179.88594 ± 381.273
2025-05-13 10:24:22,541 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-12.758443, 63.557583, 248.46112, -756.353, -732.1557, -301.6283, 413.21375, -258.25482, 58.04988, -520.9916]
2025-05-13 10:24:22,541 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [170.0, 1000.0, 681.0, 1000.0, 1000.0, 1000.0, 1000.0, 949.0, 164.0, 1000.0]
2025-05-13 10:24:22,548 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 21/100 (estimated time remaining: 5 hours, 19 minutes, 33 seconds)
2025-05-13 10:28:13,097 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:28:25,962 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 627.43384 ± 210.768
2025-05-13 10:28:25,962 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [722.8105, 657.21564, 710.9156, 728.1315, 623.6816, 735.2984, 693.48804, 735.3497, 4.3700233, 663.07745]
2025-05-13 10:28:25,962 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 43.0, 1000.0]
2025-05-13 10:28:25,967 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 22/100 (estimated time remaining: 5 hours, 16 minutes, 13 seconds)
2025-05-13 10:32:16,284 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:32:30,343 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 665.82379 ± 44.951
2025-05-13 10:32:30,344 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [650.05133, 669.3973, 610.59973, 597.1681, 683.65375, 696.474, 735.2266, 637.33215, 735.02356, 643.31134]
2025-05-13 10:32:30,344 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:32:30,349 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 23/100 (estimated time remaining: 5 hours, 13 minutes, 14 seconds)
2025-05-13 10:36:29,113 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:36:40,560 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -72.32325 ± 369.337
2025-05-13 10:36:40,560 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-965.9595, 591.1974, -187.16927, -176.64517, -83.583694, -83.762344, 55.331745, -75.777794, -8.294412, 211.4306]
2025-05-13 10:36:40,560 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 179.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 44.0, 1000.0]
2025-05-13 10:36:40,566 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 24/100 (estimated time remaining: 5 hours, 11 minutes, 15 seconds)
2025-05-13 10:40:04,845 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:40:15,822 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -531.99890 ± 314.428
2025-05-13 10:40:15,822 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-556.5213, -74.2656, -437.78485, -330.295, -513.6077, -624.08734, -48.442028, -741.6164, -1069.2004, -924.16895]
2025-05-13 10:40:15,822 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 171.0, 1000.0, 341.0, 1000.0, 1000.0, 424.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:40:15,826 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 25/100 (estimated time remaining: 5 hours, 3 minutes, 35 seconds)
2025-05-13 10:44:04,924 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:44:18,849 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 486.93979 ± 431.663
2025-05-13 10:44:18,849 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [675.0694, 660.2461, 652.92, 519.7872, 702.23914, 633.5511, 679.596, -796.2052, 614.0214, 528.17267]
2025-05-13 10:44:18,849 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:44:18,853 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 59 minutes, 4 seconds)
2025-05-13 10:48:11,835 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:48:25,843 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 125.57993 ± 298.010
2025-05-13 10:48:25,843 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [221.11311, -678.51843, 291.6085, 251.9498, 211.59409, 73.3549, 417.46478, -87.62346, 246.06213, 308.7939]
2025-05-13 10:48:25,843 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:48:25,850 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 27/100 (estimated time remaining: 4 hours, 55 minutes, 58 seconds)
2025-05-13 10:52:11,810 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:52:24,419 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -103.76630 ± 874.038
2025-05-13 10:52:24,420 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-1092.0294, -1033.964, -925.6242, 658.26514, -1480.9314, 743.6818, 708.1264, 691.7714, 8.125308, 684.916]
2025-05-13 10:52:24,420 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 18.0, 1000.0]
2025-05-13 10:52:24,425 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 28/100 (estimated time remaining: 4 hours, 50 minutes, 33 seconds)
2025-05-13 10:56:10,769 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:56:21,243 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -242.49150 ± 771.169
2025-05-13 10:56:21,243 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-827.1282, 390.32236, 471.1031, -1.9381232, -53.410046, -8.69044, 508.62247, -2208.7954, -129.91092, -565.0897]
2025-05-13 10:56:21,243 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 115.0, 18.0, 1000.0, 1000.0, 376.0, 1000.0]
2025-05-13 10:56:21,249 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 29/100 (estimated time remaining: 4 hours, 43 minutes, 21 seconds)
2025-05-13 11:00:01,592 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:00:14,399 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -275.01654 ± 766.164
2025-05-13 11:00:14,399 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-1500.7617, 462.39444, -137.57555, -1470.4373, 558.0367, -1099.3889, 473.8031, 376.15887, -269.9109, -142.48387]
2025-05-13 11:00:14,399 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 196.0]
2025-05-13 11:00:14,407 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 43 minutes, 39 seconds)
2025-05-13 11:04:16,286 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:04:27,012 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -22.44882 ± 403.716
2025-05-13 11:04:27,012 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-288.01212, -172.38527, 675.07416, -427.93057, 681.03424, -13.443423, -608.28143, 80.91244, -200.98506, 49.52888]
2025-05-13 11:04:27,012 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 40.0, 1000.0, 120.0, 1000.0, 74.0]
2025-05-13 11:04:27,016 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 31/100 (estimated time remaining: 4 hours, 41 minutes, 54 seconds)
2025-05-13 11:08:20,042 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:08:30,113 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -169.22391 ± 132.604
2025-05-13 11:08:30,113 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-97.60786, -356.36694, -9.639218, -392.34595, -294.46738, -5.2042103, -172.42992, -200.9376, -54.38315, -108.856865]
2025-05-13 11:08:30,113 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 63.0, 1000.0, 1000.0, 14.0, 1000.0, 1000.0, 48.0, 1000.0]
2025-05-13 11:08:30,117 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 32/100 (estimated time remaining: 4 hours, 36 minutes, 58 seconds)
2025-05-13 11:12:25,254 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:12:35,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: -256.38251 ± 312.132
2025-05-13 11:12:35,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-1027.4392, 27.965553, 11.944164, -4.8697977, -222.20996, -127.39005, -602.0575, -227.55434, -100.632256, -291.58163]
2025-05-13 11:12:35,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 49.0, 25.0, 24.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:12:35,247 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 33/100 (estimated time remaining: 4 hours, 34 minutes, 27 seconds)
2025-05-13 11:16:06,622 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:16:20,618 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 615.43976 ± 55.576
2025-05-13 11:16:20,618 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [616.52106, 538.2721, 663.8355, 670.60864, 646.26263, 497.13385, 594.5465, 648.6917, 608.2682, 670.2575]
2025-05-13 11:16:20,618 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:16:20,622 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 34/100 (estimated time remaining: 4 hours, 27 minutes, 51 seconds)
2025-05-13 11:20:12,059 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:20:26,262 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 757.65344 ± 25.777
2025-05-13 11:20:26,262 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [757.0154, 783.05865, 779.90894, 769.6604, 769.9354, 732.03284, 735.1824, 776.4854, 774.0987, 699.1569]
2025-05-13 11:20:26,262 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:20:26,267 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 35/100 (estimated time remaining: 4 hours, 26 minutes, 36 seconds)
2025-05-13 11:24:17,437 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:24:31,621 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 811.17078 ± 73.043
2025-05-13 11:24:31,621 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [845.3586, 854.03894, 835.727, 654.3075, 842.63806, 834.9077, 678.2779, 857.4632, 858.0825, 850.9063]
2025-05-13 11:24:31,621 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:24:31,625 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 36/100 (estimated time remaining: 4 hours, 20 minutes, 59 seconds)
2025-05-13 11:28:22,977 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:28:37,295 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 814.17053 ± 79.689
2025-05-13 11:28:37,295 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [875.0436, 872.1959, 885.8171, 729.7136, 751.6023, 683.5826, 878.141, 709.54517, 878.8214, 877.24255]
2025-05-13 11:28:37,295 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:28:37,302 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 37/100 (estimated time remaining: 4 hours, 17 minutes, 31 seconds)
2025-05-13 11:32:24,925 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:32:38,976 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 805.01306 ± 61.323
2025-05-13 11:32:38,976 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [722.24854, 847.9405, 871.38464, 719.5754, 844.36414, 868.38464, 741.40326, 857.9561, 742.36847, 834.5052]
2025-05-13 11:32:38,976 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:32:38,986 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 38/100 (estimated time remaining: 4 hours, 12 minutes, 47 seconds)
2025-05-13 11:36:26,599 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:36:40,883 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 825.22284 ± 70.878
2025-05-13 11:36:40,883 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [870.65784, 859.6103, 633.0837, 851.5213, 869.71454, 870.79224, 878.7888, 787.71313, 794.59033, 835.7566]
2025-05-13 11:36:40,883 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:36:40,883 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (825.22) for latency ExtremeSparseL4U32
2025-05-13 11:36:40,891 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 39/100 (estimated time remaining: 4 hours, 12 minutes, 11 seconds)
2025-05-13 11:40:30,321 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:40:44,456 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 874.89520 ± 26.214
2025-05-13 11:40:44,456 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [885.2365, 893.31836, 882.88904, 871.7792, 884.42926, 800.142, 885.5715, 894.3577, 885.0816, 866.1468]
2025-05-13 11:40:44,456 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:40:44,456 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (874.90) for latency ExtremeSparseL4U32
2025-05-13 11:40:44,462 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 40/100 (estimated time remaining: 4 hours, 7 minutes, 41 seconds)
2025-05-13 11:44:33,752 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:44:48,174 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 879.99725 ± 6.946
2025-05-13 11:44:48,174 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [882.01855, 884.14575, 894.92487, 874.7508, 868.89276, 873.19415, 878.1716, 877.5366, 880.7483, 885.58984]
2025-05-13 11:44:48,174 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:44:48,174 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (880.00) for latency ExtremeSparseL4U32
2025-05-13 11:44:48,184 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 41/100 (estimated time remaining: 4 hours, 3 minutes, 18 seconds)
2025-05-13 11:48:37,340 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:48:51,504 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 875.91296 ± 5.346
2025-05-13 11:48:51,504 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [877.5314, 872.7655, 876.7199, 877.2942, 888.2032, 873.2656, 875.5829, 878.9683, 866.37006, 872.4291]
2025-05-13 11:48:51,505 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:48:51,513 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 58 minutes, 47 seconds)
2025-05-13 11:52:38,541 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:52:52,731 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 879.95001 ± 12.336
2025-05-13 11:52:52,731 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [873.43243, 883.59796, 882.6589, 886.4231, 896.71265, 882.70667, 877.4107, 885.169, 884.24274, 847.14594]
2025-05-13 11:52:52,731 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:52:52,744 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 54 minutes, 39 seconds)
2025-05-13 11:56:40,052 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:56:54,286 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 878.69080 ± 11.675
2025-05-13 11:56:54,286 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [891.79785, 854.2413, 885.546, 867.096, 889.9399, 886.3099, 882.42737, 880.545, 865.14685, 883.85834]
2025-05-13 11:56:54,286 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:56:54,301 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 50 minutes, 32 seconds)
2025-05-13 12:00:41,511 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:00:55,789 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 892.68146 ± 6.483
2025-05-13 12:00:55,789 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [899.83563, 891.7201, 904.2378, 892.336, 890.7815, 889.553, 895.6878, 894.8945, 878.7636, 889.005]
2025-05-13 12:00:55,789 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:00:55,790 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (892.68) for latency ExtremeSparseL4U32
2025-05-13 12:00:55,799 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 46 minutes, 6 seconds)
2025-05-13 12:04:43,767 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:04:57,980 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 895.86993 ± 6.921
2025-05-13 12:04:57,981 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [898.53625, 883.74286, 900.71985, 893.7646, 890.0177, 886.5735, 896.4406, 902.5909, 899.6604, 906.65326]
2025-05-13 12:04:57,981 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:04:57,981 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (895.87) for latency ExtremeSparseL4U32
2025-05-13 12:04:57,988 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 41 minutes, 47 seconds)
2025-05-13 12:08:46,153 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:09:00,532 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 891.26550 ± 7.805
2025-05-13 12:09:00,532 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [894.9604, 893.94415, 891.59595, 898.8159, 897.1761, 884.75775, 891.30475, 870.63684, 896.2197, 893.2435]
2025-05-13 12:09:00,532 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:09:00,543 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 37 minutes, 37 seconds)
2025-05-13 12:12:48,067 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:13:02,232 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 902.09973 ± 4.250
2025-05-13 12:13:02,232 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [894.9259, 899.6606, 898.1603, 908.97345, 906.32996, 897.9227, 902.3676, 901.4451, 905.5028, 905.7083]
2025-05-13 12:13:02,232 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:13:02,232 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (902.10) for latency ExtremeSparseL4U32
2025-05-13 12:13:02,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 33 minutes, 40 seconds)
2025-05-13 12:16:49,565 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:17:03,706 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 891.37274 ± 4.862
2025-05-13 12:17:03,706 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [890.30585, 881.09485, 892.744, 899.3971, 894.72656, 892.276, 889.18805, 894.7456, 885.813, 893.43726]
2025-05-13 12:17:03,706 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:17:03,713 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 29 minutes, 37 seconds)
2025-05-13 12:20:51,169 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:21:05,285 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 897.45087 ± 3.887
2025-05-13 12:21:05,286 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [902.8705, 896.8386, 898.2248, 900.40295, 895.9061, 903.34863, 896.0796, 896.46765, 895.0603, 889.3099]
2025-05-13 12:21:05,286 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:21:05,293 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 25 minutes, 36 seconds)
2025-05-13 12:24:52,891 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:25:06,852 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 898.75555 ± 6.021
2025-05-13 12:25:06,852 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [898.6228, 898.04266, 893.0794, 887.3937, 898.5024, 899.2266, 909.643, 907.60095, 896.86554, 898.57874]
2025-05-13 12:25:06,852 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:25:06,861 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 21 minutes, 28 seconds)
2025-05-13 12:28:54,154 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:29:08,486 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 902.15710 ± 6.787
2025-05-13 12:29:08,486 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [896.9489, 896.92615, 908.14685, 896.0303, 902.8962, 909.0982, 913.8828, 890.6118, 901.3329, 905.6975]
2025-05-13 12:29:08,486 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:29:08,486 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (902.16) for latency ExtremeSparseL4U32
2025-05-13 12:29:08,497 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 17 minutes, 17 seconds)
2025-05-13 12:32:55,480 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:33:09,602 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 899.56708 ± 3.846
2025-05-13 12:33:09,602 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [893.8191, 898.66235, 899.2713, 898.66907, 906.0412, 896.3008, 904.3, 897.5725, 904.54126, 896.4933]
2025-05-13 12:33:09,602 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:33:09,608 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 53/100 (estimated time remaining: 3 hours, 13 minutes, 10 seconds)
2025-05-13 12:36:55,951 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:37:10,370 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 924.64227 ± 9.836
2025-05-13 12:37:10,370 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [935.279, 928.2855, 906.24133, 914.244, 936.6721, 935.2749, 931.93304, 920.0394, 917.8747, 920.57904]
2025-05-13 12:37:10,370 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:37:10,370 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (924.64) for latency ExtremeSparseL4U32
2025-05-13 12:37:10,376 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 54/100 (estimated time remaining: 3 hours, 9 minutes, 2 seconds)
2025-05-13 12:40:57,111 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:41:11,223 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 891.23767 ± 6.412
2025-05-13 12:41:11,223 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [887.52716, 895.5602, 898.4598, 876.7805, 899.79156, 890.8147, 888.84564, 890.07745, 896.44324, 888.07684]
2025-05-13 12:41:11,223 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:41:11,231 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 55/100 (estimated time remaining: 3 hours, 4 minutes, 54 seconds)
2025-05-13 12:44:57,717 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:45:11,960 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 906.65735 ± 7.218
2025-05-13 12:45:11,960 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [894.50415, 909.6019, 911.2982, 903.0904, 895.8493, 918.3822, 909.0363, 910.4479, 901.9115, 912.45166]
2025-05-13 12:45:11,960 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:45:11,970 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 56/100 (estimated time remaining: 3 hours, 45 seconds)
2025-05-13 12:48:58,955 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:49:13,430 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 892.34442 ± 6.852
2025-05-13 12:49:13,430 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [889.3044, 886.2644, 904.6497, 879.66956, 896.3816, 893.5638, 891.78986, 901.15845, 888.4102, 892.2522]
2025-05-13 12:49:13,430 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:49:13,440 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 56 minutes, 43 seconds)
2025-05-13 12:53:00,090 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:53:11,972 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 612.99579 ± 326.192
2025-05-13 12:53:11,973 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [109.13966, 303.89597, 885.5851, 52.203102, 656.9197, 897.36676, 897.2189, 895.86957, 541.183, 890.5761]
2025-05-13 12:53:11,973 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 357.0, 1000.0, 67.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:53:11,981 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 52 minutes, 20 seconds)
2025-05-13 12:57:01,922 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:57:16,166 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 905.19336 ± 9.391
2025-05-13 12:57:16,166 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [916.24054, 913.90497, 910.472, 906.1748, 910.9376, 899.70374, 886.42773, 890.95654, 905.86975, 911.2464]
2025-05-13 12:57:16,166 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:57:16,174 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 48 minutes, 48 seconds)
2025-05-13 13:01:03,209 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:01:17,551 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 920.63104 ± 9.383
2025-05-13 13:01:17,552 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [907.9392, 920.8405, 923.1549, 927.85876, 938.8121, 910.7638, 920.743, 930.8902, 913.05255, 912.2562]
2025-05-13 13:01:17,552 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:01:17,562 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 44 minutes, 51 seconds)
2025-05-13 13:05:04,322 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:05:18,741 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 897.53680 ± 17.982
2025-05-13 13:05:18,741 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [907.06036, 892.7598, 903.4601, 910.36896, 908.22876, 893.81836, 893.0373, 914.62756, 903.86444, 848.1426]
2025-05-13 13:05:18,741 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:05:18,754 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 40 minutes, 54 seconds)
2025-05-13 13:09:05,411 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:09:19,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 926.15125 ± 9.194
2025-05-13 13:09:19,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [933.9766, 924.6195, 939.69775, 923.3361, 920.8639, 927.8782, 917.43085, 938.5486, 927.1359, 908.025]
2025-05-13 13:09:19,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:09:19,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (926.15) for latency ExtremeSparseL4U32
2025-05-13 13:09:19,716 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 36 minutes, 48 seconds)
2025-05-13 13:13:06,661 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:13:20,994 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 908.82556 ± 9.303
2025-05-13 13:13:20,994 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [909.3549, 921.8247, 920.76965, 898.40094, 896.4946, 898.2393, 911.3534, 908.9615, 902.461, 920.3952]
2025-05-13 13:13:20,994 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:13:21,004 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 33 minutes, 8 seconds)
2025-05-13 13:17:07,764 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:17:22,277 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 917.31348 ± 7.603
2025-05-13 13:17:22,277 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [912.4437, 914.57196, 931.87396, 916.90045, 922.4003, 917.26996, 912.3553, 915.9705, 926.2602, 903.08905]
2025-05-13 13:17:22,277 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:17:22,283 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 28 minutes, 45 seconds)
2025-05-13 13:21:08,833 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:21:22,891 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 938.16956 ± 27.669
2025-05-13 13:21:22,891 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [933.0042, 886.16125, 948.24426, 964.00885, 964.99695, 945.67804, 932.2391, 973.0777, 892.2318, 942.0542]
2025-05-13 13:21:22,891 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:21:22,891 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (938.17) for latency ExtremeSparseL4U32
2025-05-13 13:21:22,903 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 24 minutes, 38 seconds)
2025-05-13 13:25:09,707 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:25:23,893 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 976.78113 ± 23.491
2025-05-13 13:25:23,893 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [980.97766, 951.9757, 985.7085, 994.1636, 995.7183, 950.9218, 977.10156, 933.5743, 980.9933, 1016.677]
2025-05-13 13:25:23,893 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:25:23,893 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (976.78) for latency ExtremeSparseL4U32
2025-05-13 13:25:23,901 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 20 minutes, 36 seconds)
2025-05-13 13:29:11,016 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:29:25,436 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 969.86444 ± 19.384
2025-05-13 13:29:25,437 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [939.9176, 992.81256, 962.57043, 973.502, 951.4532, 985.3568, 957.7212, 998.49774, 987.33856, 949.4733]
2025-05-13 13:29:25,437 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:29:25,446 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 16 minutes, 38 seconds)
2025-05-13 13:33:13,213 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:33:27,218 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 966.87079 ± 14.927
2025-05-13 13:33:27,218 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [997.26245, 967.7647, 949.96014, 966.49854, 953.64655, 947.75085, 961.4355, 978.878, 983.5766, 961.93414]
2025-05-13 13:33:27,218 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:33:27,225 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 12 minutes, 41 seconds)
2025-05-13 13:37:13,773 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:37:27,870 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 990.11829 ± 25.898
2025-05-13 13:37:27,870 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1008.53613, 1020.8829, 1002.3665, 951.0388, 1015.831, 1005.2555, 940.9615, 980.6107, 1000.8314, 974.86774]
2025-05-13 13:37:27,870 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:37:27,870 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (990.12) for latency ExtremeSparseL4U32
2025-05-13 13:37:27,877 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 69/100 (estimated time remaining: 2 hours, 8 minutes, 35 seconds)
2025-05-13 13:41:14,554 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:41:28,578 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 961.23987 ± 40.625
2025-05-13 13:41:28,578 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [988.7903, 976.28595, 978.8652, 979.03406, 911.3725, 1011.67725, 906.7506, 1017.5158, 936.7773, 905.33057]
2025-05-13 13:41:28,578 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:41:28,586 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 70/100 (estimated time remaining: 2 hours, 4 minutes, 35 seconds)
2025-05-13 13:45:15,101 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:45:29,300 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1028.33606 ± 14.453
2025-05-13 13:45:29,300 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1023.19806, 1018.9968, 1029.5139, 1014.7815, 1011.11, 1040.1616, 1062.0573, 1018.0523, 1026.7535, 1038.735]
2025-05-13 13:45:29,300 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:45:29,300 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1028.34) for latency ExtremeSparseL4U32
2025-05-13 13:45:29,310 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 71/100 (estimated time remaining: 2 hours, 32 seconds)
2025-05-13 13:49:15,526 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:49:29,483 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1059.59399 ± 27.981
2025-05-13 13:49:29,483 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1066.0416, 1082.794, 1051.9907, 1063.7908, 994.0784, 1088.0427, 1067.0485, 1024.4998, 1083.5846, 1074.0685]
2025-05-13 13:49:29,483 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:49:29,483 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1059.59) for latency ExtremeSparseL4U32
2025-05-13 13:49:29,494 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 56 minutes, 23 seconds)
2025-05-13 13:53:15,766 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:53:29,831 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1161.74097 ± 24.778
2025-05-13 13:53:29,831 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1131.0945, 1124.5856, 1177.21, 1166.6093, 1154.565, 1159.8834, 1180.0686, 1132.767, 1201.3142, 1189.3119]
2025-05-13 13:53:29,831 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:53:29,831 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1161.74) for latency ExtremeSparseL4U32
2025-05-13 13:53:29,840 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 52 minutes, 14 seconds)
2025-05-13 13:57:17,382 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:57:31,319 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1134.39331 ± 32.952
2025-05-13 13:57:31,319 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1127.329, 1158.9788, 1075.3965, 1120.8446, 1134.3734, 1138.1306, 1141.1694, 1101.7081, 1138.1393, 1207.8632]
2025-05-13 13:57:31,319 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:57:31,328 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 48 minutes, 18 seconds)
2025-05-13 14:01:18,957 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:01:33,136 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1105.40601 ± 330.254
2025-05-13 14:01:33,137 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1246.2699, 1245.6516, 624.4152, 1292.1139, 1263.8955, 300.48703, 1274.0547, 1307.3514, 1228.1012, 1271.7197]
2025-05-13 14:01:33,137 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:01:33,150 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 44 minutes, 23 seconds)
2025-05-13 14:05:18,560 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:05:32,685 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1188.62329 ± 249.098
2025-05-13 14:05:32,686 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [449.9235, 1328.2812, 1198.2396, 1284.6841, 1282.1383, 1266.7346, 1304.8124, 1262.5187, 1212.4698, 1296.4315]
2025-05-13 14:05:32,686 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:05:32,686 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1188.62) for latency ExtremeSparseL4U32
2025-05-13 14:05:32,695 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 40 minutes, 16 seconds)
2025-05-13 14:09:19,934 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:09:33,685 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1273.00171 ± 80.842
2025-05-13 14:09:33,685 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1217.1752, 1394.6072, 1244.0837, 1165.1187, 1276.4199, 1237.7659, 1382.286, 1380.2312, 1172.462, 1259.8684]
2025-05-13 14:09:33,685 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:09:33,685 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1273.00) for latency ExtremeSparseL4U32
2025-05-13 14:09:33,694 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 36 minutes, 20 seconds)
2025-05-13 14:13:20,969 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:13:34,736 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1356.99243 ± 320.686
2025-05-13 14:13:34,737 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1595.4642, 1512.6431, 1478.2128, 1504.5792, 1221.4114, 1355.9993, 443.3611, 1419.2928, 1496.7828, 1542.1777]
2025-05-13 14:13:34,737 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:13:34,737 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1356.99) for latency ExtremeSparseL4U32
2025-05-13 14:13:34,748 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 32 minutes, 22 seconds)
2025-05-13 14:17:22,104 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:17:36,118 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1493.89209 ± 62.618
2025-05-13 14:17:36,118 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1559.6718, 1461.1952, 1438.9441, 1598.789, 1494.9436, 1508.1666, 1524.2554, 1497.526, 1498.0453, 1357.3845]
2025-05-13 14:17:36,118 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:17:36,118 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1493.89) for latency ExtremeSparseL4U32
2025-05-13 14:17:36,127 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 28 minutes, 21 seconds)
2025-05-13 14:21:29,327 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:21:41,914 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1283.39233 ± 349.617
2025-05-13 14:21:41,914 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1039.5507, 1477.6289, 1667.6439, 1361.5729, 1354.7627, 1436.1676, 1538.9125, 1547.0886, 969.61005, 440.98434]
2025-05-13 14:21:41,914 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 772.0, 291.0]
2025-05-13 14:21:41,923 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 24 minutes, 36 seconds)
2025-05-13 14:25:29,343 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:25:43,122 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1611.46265 ± 302.896
2025-05-13 14:25:43,122 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1597.1428, 1601.9337, 1814.5348, 1686.7955, 1895.3271, 1552.0948, 1507.4414, 1764.9685, 1900.4047, 793.9834]
2025-05-13 14:25:43,122 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [917.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:25:43,122 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1611.46) for latency ExtremeSparseL4U32
2025-05-13 14:25:43,133 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 20 minutes, 41 seconds)
2025-05-13 14:29:30,558 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:29:44,321 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1583.40198 ± 120.244
2025-05-13 14:29:44,322 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1438.8059, 1743.0262, 1628.4624, 1642.4537, 1674.8212, 1682.7344, 1571.229, 1511.5883, 1322.7435, 1618.1554]
2025-05-13 14:29:44,322 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:29:44,334 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 16 minutes, 40 seconds)
2025-05-13 14:33:31,603 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:33:45,579 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1477.84351 ± 163.417
2025-05-13 14:33:45,579 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1640.2921, 1442.7435, 1431.0138, 1577.0798, 1155.1919, 1530.7856, 1528.3112, 1272.3335, 1449.196, 1751.4875]
2025-05-13 14:33:45,579 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:33:45,590 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 12 minutes, 39 seconds)
2025-05-13 14:37:33,301 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:37:46,275 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1182.07849 ± 621.886
2025-05-13 14:37:46,275 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1603.4648, 1826.3539, 665.0442, 1916.0856, -30.539248, 1496.9666, 1568.8206, 950.9254, 1436.2379, 387.42438]
2025-05-13 14:37:46,275 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 304.0]
2025-05-13 14:37:46,283 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 8 minutes, 34 seconds)
2025-05-13 14:41:26,478 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:41:39,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1551.38538 ± 297.102
2025-05-13 14:41:39,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1163.0905, 1633.1989, 1655.219, 1673.1722, 2005.2283, 890.09796, 1744.5648, 1506.2057, 1681.2886, 1561.7872]
2025-05-13 14:41:39,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [666.0, 1000.0, 1000.0, 1000.0, 1000.0, 678.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:41:39,447 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 3 minutes, 52 seconds)
2025-05-13 14:45:26,920 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:45:40,818 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1607.29639 ± 179.405
2025-05-13 14:45:40,818 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1745.6023, 1594.1644, 1474.8251, 1753.3112, 1222.6246, 1744.0967, 1847.0625, 1415.3523, 1657.899, 1618.025]
2025-05-13 14:45:40,818 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:45:40,831 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 86/100 (estimated time remaining: 59 minutes, 53 seconds)
2025-05-13 14:49:32,753 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:49:46,792 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1687.71289 ± 333.218
2025-05-13 14:49:46,793 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1615.076, 1944.0146, 1841.2128, 1800.7627, 769.70123, 1919.2872, 1909.5388, 1783.8717, 1796.6073, 1497.0562]
2025-05-13 14:49:46,793 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:49:46,793 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1687.71) for latency ExtremeSparseL4U32
2025-05-13 14:49:46,802 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 87/100 (estimated time remaining: 56 minutes, 6 seconds)
2025-05-13 14:53:34,436 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:53:47,470 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1667.02173 ± 342.367
2025-05-13 14:53:47,470 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [782.842, 1923.9697, 1893.2535, 1782.8395, 2056.013, 1403.8981, 1756.6235, 1809.1295, 1566.1245, 1695.5227]
2025-05-13 14:53:47,470 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [469.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:53:47,479 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 88/100 (estimated time remaining: 52 minutes, 4 seconds)
2025-05-13 14:57:29,943 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:57:43,202 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1630.78296 ± 373.669
2025-05-13 14:57:43,202 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [852.52576, 1891.8092, 1817.9792, 973.48315, 1658.8054, 1989.051, 1755.5673, 1658.8043, 1786.2643, 1923.54]
2025-05-13 14:57:43,202 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [510.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:57:43,210 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 89/100 (estimated time remaining: 47 minutes, 52 seconds)
2025-05-13 15:01:22,861 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:01:36,055 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1652.56799 ± 366.413
2025-05-13 15:01:36,055 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1742.1492, 1724.082, 1923.6774, 1563.9171, 1773.7869, 1928.9474, 773.9581, 2074.601, 1798.7648, 1221.7961]
2025-05-13 15:01:36,055 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 494.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:01:36,065 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 90/100 (estimated time remaining: 43 minutes, 52 seconds)
2025-05-13 15:05:23,473 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:05:37,411 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1757.55664 ± 212.453
2025-05-13 15:05:37,411 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1854.5265, 1932.5842, 1655.5326, 1239.2177, 2059.0752, 1903.152, 1801.5737, 1637.8912, 1703.56, 1788.4525]
2025-05-13 15:05:37,411 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:05:37,411 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1757.56) for latency ExtremeSparseL4U32
2025-05-13 15:05:37,419 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 91/100 (estimated time remaining: 39 minutes, 53 seconds)
2025-05-13 15:09:40,278 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:09:54,066 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1751.41760 ± 351.207
2025-05-13 15:09:54,066 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1859.9349, 2089.984, 1745.4498, 1736.4945, 2107.1614, 1899.0548, 810.5426, 1630.6443, 1659.5043, 1975.4061]
2025-05-13 15:09:54,066 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:09:54,076 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 92/100 (estimated time remaining: 36 minutes, 13 seconds)
2025-05-13 15:13:40,615 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:13:54,563 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1654.98206 ± 459.108
2025-05-13 15:13:54,563 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2088.4192, 1958.349, 608.2359, 1912.2012, 1962.5898, 977.7473, 1625.528, 1744.9099, 1697.4277, 1974.4131]
2025-05-13 15:13:54,563 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:13:54,575 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 93/100 (estimated time remaining: 32 minutes, 11 seconds)
2025-05-13 15:17:41,274 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:17:53,818 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1606.47192 ± 520.330
2025-05-13 15:17:53,818 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1608.756, 2045.7905, 1890.0927, 508.0731, 1639.4619, 1929.317, 1870.3065, 1967.4952, 1910.043, 695.383]
2025-05-13 15:17:53,818 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [801.0, 1000.0, 1000.0, 348.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:17:53,828 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 94/100 (estimated time remaining: 28 minutes, 14 seconds)
2025-05-13 15:21:23,990 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:21:37,919 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1713.91272 ± 605.854
2025-05-13 15:21:37,919 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2004.3168, 1771.0917, 1915.0985, -81.03668, 2070.8462, 1754.8024, 1905.7527, 1945.9563, 1999.8516, 1852.4481]
2025-05-13 15:21:37,919 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:21:37,930 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 95/100 (estimated time remaining: 24 minutes, 2 seconds)
2025-05-13 15:25:24,463 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:25:38,126 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1942.32031 ± 211.183
2025-05-13 15:25:38,126 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2129.9014, 2297.0981, 2106.6565, 1570.6375, 1687.5846, 1890.231, 1956.2911, 1786.6747, 2102.2876, 1895.8423]
2025-05-13 15:25:38,126 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 999.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:25:38,126 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1942.32) for latency ExtremeSparseL4U32
2025-05-13 15:25:38,136 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 96/100 (estimated time remaining: 20 minutes)
2025-05-13 15:29:24,704 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:29:37,443 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1871.49194 ± 517.601
2025-05-13 15:29:37,443 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1742.8068, 1892.5292, 1756.1415, 410.57837, 2215.2468, 2169.6824, 1968.8748, 2170.7866, 2203.8174, 2184.4575]
2025-05-13 15:29:37,443 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 235.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:29:37,453 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 97/100 (estimated time remaining: 15 minutes, 46 seconds)
2025-05-13 15:33:20,850 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:33:34,875 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1680.85779 ± 319.942
2025-05-13 15:33:34,875 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [839.1418, 1951.6523, 1862.2943, 2002.9645, 1423.5745, 1767.1234, 1740.7092, 1685.9711, 1687.3899, 1847.7572]
2025-05-13 15:33:34,876 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:33:34,885 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 98/100 (estimated time remaining: 11 minutes, 48 seconds)
2025-05-13 15:37:22,760 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:37:36,006 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1646.62366 ± 417.722
2025-05-13 15:37:36,006 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2086.6162, 1666.2008, 903.9482, 1169.5875, 1151.2426, 2078.5103, 1959.5885, 1685.7922, 2136.58, 1628.1694]
2025-05-13 15:37:36,006 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 985.0, 1000.0, 1000.0, 1000.0, 691.0]
2025-05-13 15:37:36,015 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 99/100 (estimated time remaining: 7 minutes, 52 seconds)
2025-05-13 15:41:23,036 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:41:36,796 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 2155.46729 ± 84.224
2025-05-13 15:41:36,796 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2095.4592, 1982.9857, 2123.3777, 2204.056, 2181.1365, 2263.8247, 2149.7166, 2268.4636, 2076.9724, 2208.6814]
2025-05-13 15:41:36,796 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:41:36,796 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (2155.47) for latency ExtremeSparseL4U32
2025-05-13 15:41:36,805 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 59 seconds)
2025-05-13 15:45:33,976 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:45:47,766 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1936.86584 ± 601.821
2025-05-13 15:45:47,766 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [2016.9492, 158.65253, 2104.796, 2203.912, 2011.5594, 2090.0513, 2227.298, 1999.6836, 2233.2202, 2322.5376]
2025-05-13 15:45:47,766 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:45:47,777 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1251 [DEBUG]: Training session finished
