2025-05-13 09:06:23,231 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-bpql-mda-highdim-mem32
2025-05-13 09:06:23,231 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-bpql-mda-highdim-mem32
2025-05-13 09:06:23,231 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x154e75982250>}
2025-05-13 09:06:23,231 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1111 [DEBUG]: using device: cuda
2025-05-13 09:06:23,236 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1133 [INFO]: Creating new trainer
2025-05-13 09:06:23,255 baseline-bpql-mda-noisy-ant:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-05-13 09:06:23,255 baseline-bpql-mda-noisy-ant:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-13 09:06:23,262 baseline-bpql-mda-noisy-ant:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=27, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(8, 512, batch_first=True)
)
2025-05-13 09:06:24,047 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1194 [DEBUG]: Starting training session...
2025-05-13 09:06:24,047 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 1/100
2025-05-13 09:10:42,147 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:11:05,254 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 231.86333 ± 26.300
2025-05-13 09:11:05,254 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [199.50235, 273.09134, 202.29611, 209.99446, 245.90497, 274.63727, 232.7558, 208.08112, 245.853, 226.51683]
2025-05-13 09:11:05,254 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:11:05,255 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (231.86) for latency ExtremeSparseL4U32
2025-05-13 09:11:05,261 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 2/100 (estimated time remaining: 7 hours, 44 minutes)
2025-05-13 09:15:30,513 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:15:53,559 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 786.63245 ± 86.518
2025-05-13 09:15:53,560 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [817.1496, 813.1754, 812.51685, 823.4859, 815.5662, 809.61084, 527.6482, 820.5484, 822.73285, 803.89075]
2025-05-13 09:15:53,560 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:15:53,560 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (786.63) for latency ExtremeSparseL4U32
2025-05-13 09:15:53,565 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 3/100 (estimated time remaining: 7 hours, 45 minutes, 6 seconds)
2025-05-13 09:20:07,813 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:20:30,290 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 835.40936 ± 7.255
2025-05-13 09:20:30,290 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [844.3797, 838.18854, 840.0554, 826.09296, 836.1412, 845.4013, 841.19434, 827.90857, 825.09125, 829.6403]
2025-05-13 09:20:30,290 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:20:30,290 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (835.41) for latency ExtremeSparseL4U32
2025-05-13 09:20:30,294 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 4/100 (estimated time remaining: 7 hours, 36 minutes, 1 second)
2025-05-13 09:24:43,267 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:25:06,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 867.48651 ± 5.482
2025-05-13 09:25:06,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [865.0461, 871.514, 869.79846, 875.43427, 865.52295, 871.16693, 874.28296, 858.70337, 860.66876, 862.72736]
2025-05-13 09:25:06,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:25:06,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (867.49) for latency ExtremeSparseL4U32
2025-05-13 09:25:06,247 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 5/100 (estimated time remaining: 7 hours, 28 minutes, 52 seconds)
2025-05-13 09:29:20,306 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:29:43,073 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 852.68036 ± 9.168
2025-05-13 09:29:43,073 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [853.11127, 859.971, 851.5506, 864.46124, 830.46844, 850.0799, 848.56067, 863.642, 850.2644, 854.69415]
2025-05-13 09:29:43,073 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:29:43,079 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 6/100 (estimated time remaining: 7 hours, 23 minutes, 1 second)
2025-05-13 09:33:56,643 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:34:14,939 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 591.87073 ± 284.075
2025-05-13 09:34:14,940 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [707.01624, 746.1229, 600.09784, 758.41516, 35.16688, 776.0103, 704.9286, 811.46344, 747.2251, 32.260506]
2025-05-13 09:34:14,940 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 38.0, 1000.0, 1000.0, 1000.0, 1000.0, 38.0]
2025-05-13 09:34:14,944 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 7/100 (estimated time remaining: 7 hours, 15 minutes, 26 seconds)
2025-05-13 09:38:28,536 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:38:51,398 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 883.23224 ± 4.024
2025-05-13 09:38:51,398 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [885.3152, 881.7023, 880.23865, 888.4023, 886.5296, 888.4607, 881.07, 874.75085, 881.42126, 884.43115]
2025-05-13 09:38:51,398 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:38:51,398 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (883.23) for latency ExtremeSparseL4U32
2025-05-13 09:38:51,403 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 8/100 (estimated time remaining: 7 hours, 7 minutes, 7 seconds)
2025-05-13 09:43:05,180 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:43:27,862 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 876.31232 ± 17.823
2025-05-13 09:43:27,863 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [886.50745, 892.6816, 843.52295, 872.10596, 842.7516, 884.5074, 883.5416, 896.8957, 877.99207, 882.6162]
2025-05-13 09:43:27,863 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:43:27,870 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 9/100 (estimated time remaining: 7 hours, 2 minutes, 27 seconds)
2025-05-13 09:47:41,502 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:48:04,010 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 863.08557 ± 15.491
2025-05-13 09:48:04,011 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [865.5061, 879.0214, 844.71106, 847.96936, 867.0099, 870.0214, 871.6509, 878.6716, 875.4682, 830.8256]
2025-05-13 09:48:04,011 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:48:04,020 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 10/100 (estimated time remaining: 6 hours, 57 minutes, 55 seconds)
2025-05-13 09:52:17,690 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:52:40,436 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 511.55695 ± 163.028
2025-05-13 09:52:40,436 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [339.246, 328.078, 648.3356, 210.96342, 682.47473, 663.0529, 503.56354, 662.12946, 446.74512, 630.98035]
2025-05-13 09:52:40,436 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:52:40,442 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 11/100 (estimated time remaining: 6 hours, 53 minutes, 12 seconds)
2025-05-13 09:56:54,229 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:57:16,838 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 797.05560 ± 50.130
2025-05-13 09:57:16,838 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [823.59686, 649.68994, 819.3232, 814.3732, 799.0183, 797.4611, 802.6186, 823.3383, 814.1498, 826.9864]
2025-05-13 09:57:16,838 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:57:16,844 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 12/100 (estimated time remaining: 6 hours, 49 minutes, 57 seconds)
2025-05-13 10:01:30,340 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:01:53,213 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 874.68964 ± 6.751
2025-05-13 10:01:53,213 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [882.41534, 872.7895, 881.3121, 862.68274, 867.53516, 877.4858, 873.47015, 880.8489, 881.443, 866.914]
2025-05-13 10:01:53,213 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:01:53,218 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 13/100 (estimated time remaining: 6 hours, 45 minutes, 19 seconds)
2025-05-13 10:06:06,700 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:06:29,465 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 863.69043 ± 6.638
2025-05-13 10:06:29,465 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [849.75, 871.0915, 866.02734, 862.393, 862.6829, 864.84625, 871.0572, 855.0668, 870.96655, 863.0226]
2025-05-13 10:06:29,465 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:06:29,472 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 14/100 (estimated time remaining: 6 hours, 40 minutes, 39 seconds)
2025-05-13 10:10:43,003 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:11:05,897 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 893.43567 ± 4.294
2025-05-13 10:11:05,898 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [889.5006, 901.2226, 892.48364, 895.6263, 899.4101, 892.46136, 894.2161, 891.0603, 885.75507, 892.6204]
2025-05-13 10:11:05,898 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:11:05,898 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (893.44) for latency ExtremeSparseL4U32
2025-05-13 10:11:05,903 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 15/100 (estimated time remaining: 6 hours, 36 minutes, 8 seconds)
2025-05-13 10:15:19,121 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:15:41,902 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 904.80927 ± 8.259
2025-05-13 10:15:41,903 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [911.50415, 902.5368, 896.02826, 921.08484, 909.9088, 909.08984, 904.26306, 906.02716, 894.9798, 892.6694]
2025-05-13 10:15:41,903 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:15:41,903 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (904.81) for latency ExtremeSparseL4U32
2025-05-13 10:15:41,909 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 16/100 (estimated time remaining: 6 hours, 31 minutes, 24 seconds)
2025-05-13 10:19:55,083 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:20:17,711 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 886.18176 ± 7.758
2025-05-13 10:20:17,711 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [891.1476, 881.42236, 888.24225, 878.7703, 880.0104, 898.453, 889.0458, 895.4455, 871.6702, 887.6103]
2025-05-13 10:20:17,711 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:20:17,717 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 17/100 (estimated time remaining: 6 hours, 26 minutes, 38 seconds)
2025-05-13 10:24:31,128 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:24:53,690 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 899.22949 ± 6.486
2025-05-13 10:24:53,690 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [894.45026, 903.5275, 885.68866, 898.0297, 899.8841, 899.3777, 901.1211, 894.4482, 910.8708, 904.8968]
2025-05-13 10:24:53,690 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:24:53,694 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 18/100 (estimated time remaining: 6 hours, 21 minutes, 55 seconds)
2025-05-13 10:28:45,060 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:29:07,583 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 880.30243 ± 87.132
2025-05-13 10:29:07,583 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [619.43427, 913.8908, 902.1239, 900.71576, 907.5706, 913.87384, 907.7872, 918.17194, 904.5132, 914.94257]
2025-05-13 10:29:07,583 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:29:07,591 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 19/100 (estimated time remaining: 6 hours, 11 minutes, 13 seconds)
2025-05-13 10:33:21,453 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:33:44,011 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 929.10321 ± 8.637
2025-05-13 10:33:44,011 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [926.54706, 928.2944, 936.9079, 935.5082, 928.11444, 937.6939, 928.7228, 933.4425, 929.9865, 905.8146]
2025-05-13 10:33:44,011 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:33:44,011 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (929.10) for latency ExtremeSparseL4U32
2025-05-13 10:33:44,019 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 20/100 (estimated time remaining: 6 hours, 6 minutes, 41 seconds)
2025-05-13 10:37:57,672 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:38:20,345 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 898.67346 ± 13.790
2025-05-13 10:38:20,345 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [883.40674, 879.70465, 912.60876, 902.5435, 892.74786, 902.25836, 918.7042, 878.3854, 913.5588, 902.816]
2025-05-13 10:38:20,345 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:38:20,352 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 21/100 (estimated time remaining: 6 hours, 2 minutes, 15 seconds)
2025-05-13 10:42:34,216 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:42:56,969 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 891.35223 ± 9.823
2025-05-13 10:42:56,969 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [891.97485, 901.0006, 881.8465, 888.0059, 897.3219, 896.3642, 898.63135, 891.1468, 899.9904, 867.2405]
2025-05-13 10:42:56,969 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:42:56,976 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 22/100 (estimated time remaining: 5 hours, 57 minutes, 56 seconds)
2025-05-13 10:47:11,009 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:47:33,845 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 909.70605 ± 9.999
2025-05-13 10:47:33,845 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [893.611, 903.76874, 903.4652, 914.3377, 922.44275, 908.89075, 911.9386, 915.4008, 926.3939, 896.81177]
2025-05-13 10:47:33,845 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:47:33,852 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 23/100 (estimated time remaining: 5 hours, 53 minutes, 38 seconds)
2025-05-13 10:51:47,671 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:52:10,280 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 725.58429 ± 546.098
2025-05-13 10:52:10,280 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [-912.66376, 908.28986, 909.85693, 907.89813, 910.53406, 911.464, 896.0891, 906.3823, 908.02496, 909.9674]
2025-05-13 10:52:10,280 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:52:10,288 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 24/100 (estimated time remaining: 5 hours, 54 minutes, 53 seconds)
2025-05-13 10:56:24,095 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:56:46,745 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 907.45752 ± 8.185
2025-05-13 10:56:46,745 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [907.9562, 902.02234, 915.1327, 909.3199, 913.089, 916.48956, 891.01355, 896.4548, 915.99536, 907.1014]
2025-05-13 10:56:46,745 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:56:46,751 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 25/100 (estimated time remaining: 5 hours, 50 minutes, 17 seconds)
2025-05-13 11:01:00,556 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:01:23,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 922.16943 ± 7.581
2025-05-13 11:01:23,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [915.7494, 915.4429, 922.047, 914.9653, 940.99445, 918.32935, 929.8951, 923.3948, 920.28394, 920.5916]
2025-05-13 11:01:23,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:01:23,079 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 26/100 (estimated time remaining: 5 hours, 45 minutes, 40 seconds)
2025-05-13 11:05:36,776 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:05:59,559 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 921.85266 ± 24.355
2025-05-13 11:05:59,559 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [930.3483, 880.22687, 944.1679, 945.89703, 907.97845, 946.3104, 949.59155, 887.8629, 922.5794, 903.56305]
2025-05-13 11:05:59,559 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:05:59,565 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 27/100 (estimated time remaining: 5 hours, 41 minutes, 2 seconds)
2025-05-13 11:10:13,063 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:10:35,695 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 935.24255 ± 61.144
2025-05-13 11:10:35,696 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1027.5197, 1002.4824, 878.27423, 919.3721, 860.2961, 911.2717, 922.5605, 1025.4346, 856.58954, 948.6255]
2025-05-13 11:10:35,696 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:10:35,696 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (935.24) for latency ExtremeSparseL4U32
2025-05-13 11:10:35,702 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 28/100 (estimated time remaining: 5 hours, 36 minutes, 15 seconds)
2025-05-13 11:14:49,410 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:15:12,179 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 861.60010 ± 38.430
2025-05-13 11:15:12,179 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [915.83325, 886.8169, 878.2523, 865.1321, 835.7861, 822.8192, 865.81946, 893.83417, 876.4459, 775.26154]
2025-05-13 11:15:12,179 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:15:12,188 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 29/100 (estimated time remaining: 5 hours, 31 minutes, 39 seconds)
2025-05-13 11:19:25,775 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:19:48,609 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 922.41095 ± 36.735
2025-05-13 11:19:48,609 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [902.156, 888.8602, 990.2507, 886.004, 955.15533, 933.4426, 872.92566, 901.88916, 966.5395, 926.8862]
2025-05-13 11:19:48,609 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:19:48,613 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 30/100 (estimated time remaining: 5 hours, 27 minutes, 2 seconds)
2025-05-13 11:24:02,152 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:24:24,890 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1001.79456 ± 97.391
2025-05-13 11:24:24,890 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [932.3076, 1079.3356, 962.5421, 1049.1708, 1088.3889, 1006.9537, 955.7761, 983.7824, 1165.9801, 793.708]
2025-05-13 11:24:24,890 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:24:24,890 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1001.79) for latency ExtremeSparseL4U32
2025-05-13 11:24:24,896 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 31/100 (estimated time remaining: 5 hours, 22 minutes, 25 seconds)
2025-05-13 11:28:38,727 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:29:01,045 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1067.14441 ± 55.444
2025-05-13 11:29:01,045 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1075.1016, 1011.5299, 1098.3177, 949.56824, 1074.4314, 1091.7484, 1123.5963, 1108.9789, 1129.024, 1009.1471]
2025-05-13 11:29:01,045 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:29:01,045 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1067.14) for latency ExtremeSparseL4U32
2025-05-13 11:29:01,053 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 32/100 (estimated time remaining: 5 hours, 17 minutes, 44 seconds)
2025-05-13 11:33:14,835 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:33:37,058 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 998.46204 ± 85.028
2025-05-13 11:33:37,058 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1101.0844, 1070.4491, 908.8765, 1024.9926, 823.941, 1097.1893, 1020.57214, 1033.1769, 981.0678, 923.2709]
2025-05-13 11:33:37,058 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:33:37,065 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 33/100 (estimated time remaining: 5 hours, 13 minutes, 6 seconds)
2025-05-13 11:37:50,749 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:38:13,327 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1085.67175 ± 107.530
2025-05-13 11:38:13,327 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1040.2738, 1128.8529, 1167.3146, 903.5266, 1186.7482, 1055.741, 1111.7994, 887.9606, 1174.2654, 1200.2355]
2025-05-13 11:38:13,327 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:38:13,327 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1085.67) for latency ExtremeSparseL4U32
2025-05-13 11:38:13,334 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 34/100 (estimated time remaining: 5 hours, 8 minutes, 27 seconds)
2025-05-13 11:42:27,050 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:42:49,195 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1157.31616 ± 53.020
2025-05-13 11:42:49,195 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1184.7637, 1138.6251, 1156.9193, 1174.4962, 1118.831, 1182.3495, 1172.5458, 1021.6118, 1217.9701, 1205.0491]
2025-05-13 11:42:49,195 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:42:49,195 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1157.32) for latency ExtremeSparseL4U32
2025-05-13 11:42:49,202 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 35/100 (estimated time remaining: 5 hours, 3 minutes, 43 seconds)
2025-05-13 11:47:14,717 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:47:37,184 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1252.48694 ± 115.005
2025-05-13 11:47:37,184 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1265.9811, 1247.0402, 1341.6028, 1311.7972, 1295.9341, 916.79205, 1280.872, 1313.9945, 1261.8246, 1289.0308]
2025-05-13 11:47:37,184 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:47:37,184 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1252.49) for latency ExtremeSparseL4U32
2025-05-13 11:47:37,195 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 36/100 (estimated time remaining: 5 hours, 1 minute, 39 seconds)
2025-05-13 11:51:50,688 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:52:11,381 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1065.35522 ± 392.723
2025-05-13 11:52:11,381 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1269.3707, 1326.5182, 1291.5132, 1296.2141, 261.3956, 1254.9816, 1079.112, 1259.3452, 320.21722, 1294.884]
2025-05-13 11:52:11,382 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 220.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:52:11,389 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 37/100 (estimated time remaining: 4 hours, 56 minutes, 36 seconds)
2025-05-13 11:56:24,817 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:56:46,131 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1187.67029 ± 241.985
2025-05-13 11:56:46,131 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1266.0166, 1321.786, 1328.2969, 1403.3984, 1019.244, 1342.5204, 1330.858, 817.3431, 1363.7008, 683.5383]
2025-05-13 11:56:46,131 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 564.0]
2025-05-13 11:56:46,139 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 38/100 (estimated time remaining: 4 hours, 51 minutes, 42 seconds)
2025-05-13 12:00:59,675 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:01:21,842 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1303.75513 ± 48.931
2025-05-13 12:01:21,842 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1371.1233, 1309.1405, 1183.9802, 1265.5059, 1322.0935, 1325.7947, 1336.7822, 1296.738, 1339.8937, 1286.4993]
2025-05-13 12:01:21,842 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:01:21,842 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1303.76) for latency ExtremeSparseL4U32
2025-05-13 12:01:21,849 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 39/100 (estimated time remaining: 4 hours, 46 minutes, 57 seconds)
2025-05-13 12:05:34,664 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:05:56,502 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1311.07104 ± 61.630
2025-05-13 12:05:56,502 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1316.2386, 1309.1748, 1446.7957, 1368.9385, 1228.0164, 1299.7395, 1226.9541, 1298.9052, 1338.9056, 1277.0433]
2025-05-13 12:05:56,502 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 984.0, 1000.0, 1000.0, 896.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:05:56,502 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1311.07) for latency ExtremeSparseL4U32
2025-05-13 12:05:56,510 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 40/100 (estimated time remaining: 4 hours, 42 minutes, 5 seconds)
2025-05-13 12:10:10,225 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:10:29,226 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1154.94360 ± 375.273
2025-05-13 12:10:29,226 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1391.919, 1373.2346, 1380.9141, 1229.2688, 1377.7339, 1324.5216, 1291.771, 117.53243, 882.9905, 1179.5515]
2025-05-13 12:10:29,226 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 93.0, 654.0, 835.0]
2025-05-13 12:10:29,233 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 41/100 (estimated time remaining: 4 hours, 34 minutes, 24 seconds)
2025-05-13 12:14:37,217 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:14:59,488 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1329.12170 ± 51.808
2025-05-13 12:14:59,488 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1296.7625, 1428.999, 1281.4128, 1317.6677, 1386.7742, 1245.1737, 1311.3137, 1303.2449, 1368.7026, 1351.1663]
2025-05-13 12:14:59,488 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:14:59,488 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1329.12) for latency ExtremeSparseL4U32
2025-05-13 12:14:59,496 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 42/100 (estimated time remaining: 4 hours, 29 minutes, 3 seconds)
2025-05-13 12:19:11,413 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:19:33,426 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1258.84692 ± 187.749
2025-05-13 12:19:33,426 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1374.5886, 1337.791, 1368.4615, 1001.17694, 1411.0281, 1385.6522, 1333.2051, 1384.07, 1163.72, 828.7773]
2025-05-13 12:19:33,426 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 829.0, 1000.0]
2025-05-13 12:19:33,436 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 43/100 (estimated time remaining: 4 hours, 24 minutes, 20 seconds)
2025-05-13 12:23:59,879 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:24:22,298 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1364.00037 ± 41.931
2025-05-13 12:24:22,298 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1381.2432, 1382.6858, 1381.3464, 1338.372, 1304.4152, 1364.257, 1327.6913, 1465.382, 1335.6189, 1358.9922]
2025-05-13 12:24:22,299 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:24:22,299 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1364.00) for latency ExtremeSparseL4U32
2025-05-13 12:24:22,305 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 44/100 (estimated time remaining: 4 hours, 22 minutes, 17 seconds)
2025-05-13 12:28:22,191 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:28:44,519 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1386.65454 ± 98.095
2025-05-13 12:28:44,519 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1387.0337, 1360.3973, 1427.3136, 1408.589, 1429.8629, 1489.575, 1412.7798, 1107.8217, 1427.6094, 1415.5631]
2025-05-13 12:28:44,519 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:28:44,519 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1386.65) for latency ExtremeSparseL4U32
2025-05-13 12:28:44,529 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 45/100 (estimated time remaining: 4 hours, 15 minutes, 21 seconds)
2025-05-13 12:32:50,718 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:33:13,153 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1347.68896 ± 81.914
2025-05-13 12:33:13,153 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1325.5515, 1408.4209, 1328.573, 1344.7058, 1119.3054, 1385.4336, 1404.408, 1360.4591, 1392.7058, 1407.3262]
2025-05-13 12:33:13,153 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:33:13,158 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 46/100 (estimated time remaining: 4 hours, 10 minutes, 3 seconds)
2025-05-13 12:37:48,286 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:38:10,528 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1360.82898 ± 22.606
2025-05-13 12:38:10,528 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1346.045, 1361.8907, 1386.8455, 1355.9067, 1389.4114, 1338.3053, 1351.9913, 1372.2991, 1388.3412, 1317.254]
2025-05-13 12:38:10,528 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:38:10,534 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 47/100 (estimated time remaining: 4 hours, 10 minutes, 23 seconds)
2025-05-13 12:42:13,702 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:42:35,855 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1431.29565 ± 39.505
2025-05-13 12:42:35,855 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1436.3668, 1474.941, 1479.5596, 1424.8855, 1460.6039, 1386.9841, 1391.6835, 1477.4984, 1363.7175, 1416.7156]
2025-05-13 12:42:35,855 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:42:35,855 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1431.30) for latency ExtremeSparseL4U32
2025-05-13 12:42:35,863 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 48/100 (estimated time remaining: 4 hours, 4 minutes, 13 seconds)
2025-05-13 12:46:36,895 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:46:59,023 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1470.28918 ± 36.329
2025-05-13 12:46:59,024 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1401.8248, 1416.3726, 1489.9153, 1496.8167, 1472.3503, 1435.6052, 1485.7186, 1493.4247, 1511.0798, 1499.7843]
2025-05-13 12:46:59,024 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:46:59,024 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1470.29) for latency ExtremeSparseL4U32
2025-05-13 12:46:59,032 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 55 minutes, 9 seconds)
2025-05-13 12:51:11,516 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:51:31,907 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1280.57239 ± 383.442
2025-05-13 12:51:31,907 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1413.2203, 1414.0267, 1433.9094, 1409.9124, 1475.2222, 1406.9243, 1343.068, 1377.6091, 134.36353, 1397.4675]
2025-05-13 12:51:31,907 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 107.0, 1000.0]
2025-05-13 12:51:31,914 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 52 minutes, 27 seconds)
2025-05-13 12:56:02,308 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:56:24,761 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1431.42603 ± 83.938
2025-05-13 12:56:24,761 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1456.1266, 1453.4838, 1424.1305, 1409.7915, 1487.6139, 1194.3328, 1498.0433, 1450.8228, 1498.9786, 1440.9363]
2025-05-13 12:56:24,761 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:56:24,768 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 51 minutes, 56 seconds)
2025-05-13 13:00:29,933 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:00:52,112 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1441.03967 ± 75.568
2025-05-13 13:00:52,112 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1265.2957, 1466.3176, 1548.7178, 1518.5804, 1472.0305, 1467.5411, 1385.1533, 1437.8079, 1388.2219, 1460.7292]
2025-05-13 13:00:52,112 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:00:52,121 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 42 minutes, 23 seconds)
2025-05-13 13:04:57,306 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:05:19,729 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1424.41614 ± 59.702
2025-05-13 13:05:19,729 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1405.603, 1492.0894, 1420.8351, 1403.9651, 1411.0638, 1484.1633, 1275.5012, 1434.2667, 1489.6158, 1427.0565]
2025-05-13 13:05:19,729 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:05:19,738 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 53/100 (estimated time remaining: 3 hours, 38 minutes, 13 seconds)
2025-05-13 13:09:33,262 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:09:53,108 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1230.40271 ± 428.276
2025-05-13 13:09:53,109 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1515.7421, 624.77264, 1453.7701, 1460.577, 1079.9436, 233.86938, 1479.8837, 1490.3687, 1554.7633, 1410.3354]
2025-05-13 13:09:53,109 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 698.0, 177.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:09:53,116 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 54/100 (estimated time remaining: 3 hours, 35 minutes, 16 seconds)
2025-05-13 13:14:06,731 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:14:28,805 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1469.31543 ± 45.966
2025-05-13 13:14:28,805 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1512.9706, 1502.0873, 1430.8008, 1461.952, 1520.2324, 1394.3503, 1526.1147, 1496.3477, 1437.6014, 1410.6978]
2025-05-13 13:14:28,805 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:14:28,815 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 55/100 (estimated time remaining: 3 hours, 31 minutes, 7 seconds)
2025-05-13 13:18:42,700 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:19:04,775 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1455.56323 ± 38.677
2025-05-13 13:19:04,775 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1439.553, 1499.4286, 1467.7163, 1422.0817, 1389.1964, 1420.9248, 1442.2964, 1498.5278, 1458.0808, 1517.8274]
2025-05-13 13:19:04,775 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:19:04,780 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 56/100 (estimated time remaining: 3 hours, 24 minutes)
2025-05-13 13:23:18,869 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:23:41,286 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1486.00269 ± 24.226
2025-05-13 13:23:41,287 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1451.8035, 1513.908, 1465.022, 1486.0892, 1525.9761, 1498.0054, 1508.8655, 1453.7397, 1483.6361, 1472.9812]
2025-05-13 13:23:41,287 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:23:41,287 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1486.00) for latency ExtremeSparseL4U32
2025-05-13 13:23:41,293 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 57/100 (estimated time remaining: 3 hours, 20 minutes, 48 seconds)
2025-05-13 13:27:55,149 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:28:17,709 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1456.25146 ± 42.074
2025-05-13 13:28:17,709 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1520.4554, 1479.7152, 1486.0486, 1480.9359, 1488.1998, 1458.4664, 1444.5231, 1426.9587, 1402.0065, 1375.2037]
2025-05-13 13:28:17,709 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:28:17,717 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 58/100 (estimated time remaining: 3 hours, 17 minutes, 30 seconds)
2025-05-13 13:32:31,681 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:32:52,428 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1369.31494 ± 291.971
2025-05-13 13:32:52,428 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [508.30063, 1471.3225, 1508.5385, 1462.5679, 1444.1329, 1545.5004, 1500.1298, 1339.8782, 1494.2125, 1418.5656]
2025-05-13 13:32:52,428 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [370.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:32:52,436 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 59/100 (estimated time remaining: 3 hours, 13 minutes, 6 seconds)
2025-05-13 13:37:04,983 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:37:27,101 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1490.71436 ± 30.578
2025-05-13 13:37:27,101 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1540.5309, 1460.6152, 1490.9386, 1524.4855, 1496.9158, 1493.6589, 1480.96, 1431.3633, 1469.9894, 1517.6852]
2025-05-13 13:37:27,101 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:37:27,101 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1490.71) for latency ExtremeSparseL4U32
2025-05-13 13:37:27,107 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 60/100 (estimated time remaining: 3 hours, 8 minutes, 21 seconds)
2025-05-13 13:41:40,926 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:42:03,365 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1525.00842 ± 30.813
2025-05-13 13:42:03,365 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1587.2349, 1531.6538, 1538.8687, 1492.5175, 1535.5023, 1481.2335, 1508.8192, 1502.3995, 1561.348, 1510.507]
2025-05-13 13:42:03,365 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:42:03,365 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1525.01) for latency ExtremeSparseL4U32
2025-05-13 13:42:03,374 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 61/100 (estimated time remaining: 3 hours, 3 minutes, 48 seconds)
2025-05-13 13:46:17,233 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:46:39,373 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1480.42847 ± 109.946
2025-05-13 13:46:39,373 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1493.0554, 1532.1677, 1477.034, 1574.4243, 1165.1659, 1556.5863, 1512.1246, 1475.2471, 1534.8988, 1483.58]
2025-05-13 13:46:39,373 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:46:39,389 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 59 minutes, 9 seconds)
2025-05-13 13:50:52,872 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:51:13,550 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1326.76050 ± 345.695
2025-05-13 13:51:13,550 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1410.3932, 1367.1987, 303.7895, 1429.2357, 1549.2756, 1520.2441, 1353.0931, 1446.6843, 1446.1527, 1441.5377]
2025-05-13 13:51:13,550 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 229.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:51:13,557 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 54 minutes, 16 seconds)
2025-05-13 13:55:27,076 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:55:49,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1536.97534 ± 57.196
2025-05-13 13:55:49,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1482.4037, 1594.889, 1620.5184, 1481.1187, 1543.9645, 1582.8988, 1527.3861, 1548.3341, 1565.6401, 1422.5984]
2025-05-13 13:55:49,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:55:49,436 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1536.98) for latency ExtremeSparseL4U32
2025-05-13 13:55:49,442 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 49 minutes, 49 seconds)
2025-05-13 14:00:17,523 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:00:39,660 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1496.15369 ± 45.752
2025-05-13 14:00:39,660 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1582.8367, 1459.7393, 1445.3193, 1519.3854, 1487.9106, 1482.8396, 1452.4247, 1466.4049, 1573.0032, 1491.6735]
2025-05-13 14:00:39,660 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:00:39,667 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 47 minutes, 6 seconds)
2025-05-13 14:04:53,255 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:05:13,488 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1422.92798 ± 444.953
2025-05-13 14:05:13,488 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [94.85424, 1575.592, 1452.8702, 1602.3265, 1615.3424, 1594.1527, 1591.4182, 1540.1659, 1558.4418, 1604.1154]
2025-05-13 14:05:13,489 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [102.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:05:13,497 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 42 minutes, 10 seconds)
2025-05-13 14:09:26,780 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:09:47,837 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1376.27991 ± 389.572
2025-05-13 14:09:47,837 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1531.3546, 527.3814, 1607.3616, 675.81146, 1569.2924, 1606.3889, 1573.4685, 1534.9612, 1589.0415, 1547.7378]
2025-05-13 14:09:47,837 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 478.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:09:47,847 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 37 minutes, 21 seconds)
2025-05-13 14:13:50,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:14:10,924 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1417.97754 ± 247.994
2025-05-13 14:14:10,924 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1609.0496, 907.1546, 968.9679, 1567.7349, 1633.2703, 1546.962, 1542.1317, 1522.7454, 1480.2296, 1401.5298]
2025-05-13 14:14:10,924 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 636.0, 661.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:14:10,932 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 31 minutes, 30 seconds)
2025-05-13 14:18:28,877 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:18:50,998 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1547.81030 ± 39.331
2025-05-13 14:18:50,998 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1535.2605, 1561.9464, 1499.7902, 1540.3026, 1580.7819, 1547.9553, 1543.0435, 1589.682, 1469.8337, 1609.5073]
2025-05-13 14:18:50,999 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:18:50,999 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1547.81) for latency ExtremeSparseL4U32
2025-05-13 14:18:51,008 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 69/100 (estimated time remaining: 2 hours, 27 minutes, 22 seconds)
2025-05-13 14:23:03,623 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:23:25,797 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1576.48267 ± 33.145
2025-05-13 14:23:25,797 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1574.4324, 1591.2751, 1625.8767, 1584.8818, 1519.3314, 1566.078, 1619.5537, 1583.5935, 1521.9791, 1577.824]
2025-05-13 14:23:25,797 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:23:25,797 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1576.48) for latency ExtremeSparseL4U32
2025-05-13 14:23:25,806 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 70/100 (estimated time remaining: 2 hours, 21 minutes, 10 seconds)
2025-05-13 14:27:30,424 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:27:50,776 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1439.73853 ± 438.776
2025-05-13 14:27:50,776 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1542.178, 1569.0238, 1615.3561, 1576.1455, 1596.2838, 1595.4641, 1636.7695, 1529.2697, 1610.1512, 126.74352]
2025-05-13 14:27:50,776 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 113.0]
2025-05-13 14:27:50,784 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 71/100 (estimated time remaining: 2 hours, 15 minutes, 43 seconds)
2025-05-13 14:32:04,536 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:32:25,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1342.50146 ± 427.073
2025-05-13 14:32:25,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1621.7018, 1622.9672, 1577.115, 1588.4072, 1415.714, 1505.1962, 553.35443, 1502.7612, 443.53604, 1594.2614]
2025-05-13 14:32:25,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 352.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:32:25,201 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 72/100 (estimated time remaining: 2 hours, 11 minutes, 12 seconds)
2025-05-13 14:36:35,295 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:36:57,183 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1615.84326 ± 52.444
2025-05-13 14:36:57,184 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1667.704, 1597.2404, 1616.8887, 1626.3229, 1642.2264, 1671.2196, 1621.6206, 1484.149, 1655.6427, 1575.4193]
2025-05-13 14:36:57,184 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 921.0, 1000.0, 1000.0]
2025-05-13 14:36:57,184 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1615.84) for latency ExtremeSparseL4U32
2025-05-13 14:36:57,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 73/100 (estimated time remaining: 2 hours, 7 minutes, 31 seconds)
2025-05-13 14:41:18,497 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:41:40,602 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1580.24573 ± 43.204
2025-05-13 14:41:40,602 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1514.89, 1580.3685, 1558.7085, 1622.4718, 1590.7471, 1639.7842, 1587.009, 1617.0386, 1594.2404, 1497.1987]
2025-05-13 14:41:40,602 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:41:40,609 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 74/100 (estimated time remaining: 2 hours, 3 minutes, 15 seconds)
2025-05-13 14:46:04,063 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:46:20,472 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1202.51770 ± 544.061
2025-05-13 14:46:20,472 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1614.0446, 956.0728, 1565.9008, 1704.0038, 422.1764, 1649.664, 1699.9357, 1541.0194, 458.21008, 414.1492]
2025-05-13 14:46:20,472 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 630.0, 1000.0, 1000.0, 294.0, 1000.0, 1000.0, 1000.0, 289.0, 272.0]
2025-05-13 14:46:20,481 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 59 minutes, 8 seconds)
2025-05-13 14:50:23,717 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:50:46,145 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1420.60181 ± 183.502
2025-05-13 14:50:46,146 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1419.1598, 1491.69, 1412.9592, 1525.8545, 1469.207, 1543.2518, 1482.431, 1512.915, 1465.3344, 883.21466]
2025-05-13 14:50:46,146 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:50:46,154 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 54 minutes, 36 seconds)
2025-05-13 14:54:59,602 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:55:20,669 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1527.33813 ± 240.055
2025-05-13 14:55:20,669 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [837.9791, 1626.944, 1540.0953, 1519.7058, 1526.6294, 1596.3197, 1541.9382, 1719.7821, 1705.0537, 1658.933]
2025-05-13 14:55:20,669 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [526.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:55:20,676 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 50 minutes, 2 seconds)
2025-05-13 14:59:31,148 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:59:49,212 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1256.77734 ± 498.539
2025-05-13 14:59:49,212 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1528.5262, 1314.2012, 66.22863, 1474.6632, 1485.0076, 1659.8723, 1499.5759, 520.5487, 1507.1371, 1512.0134]
2025-05-13 14:59:49,212 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 818.0, 49.0, 1000.0, 1000.0, 1000.0, 1000.0, 319.0, 1000.0, 1000.0]
2025-05-13 14:59:49,221 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 45 minutes, 11 seconds)
2025-05-13 15:03:47,390 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:04:09,495 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1598.76929 ± 36.529
2025-05-13 15:04:09,495 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1606.1055, 1608.7649, 1628.1382, 1563.286, 1675.629, 1593.8785, 1604.1033, 1530.4119, 1577.9111, 1599.4644]
2025-05-13 15:04:09,495 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:04:09,505 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 38 minutes, 55 seconds)
2025-05-13 15:08:30,099 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:08:51,722 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1527.38794 ± 107.119
2025-05-13 15:08:51,722 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1528.5408, 1560.7894, 1589.1847, 1530.2645, 1512.2284, 1639.7892, 1523.7404, 1226.603, 1599.0345, 1563.7048]
2025-05-13 15:08:51,722 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 794.0, 1000.0, 1000.0]
2025-05-13 15:08:51,730 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 34 minutes, 35 seconds)
2025-05-13 15:12:55,490 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:13:17,505 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1612.04443 ± 53.099
2025-05-13 15:13:17,505 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1550.9954, 1667.8914, 1663.3057, 1606.7565, 1627.2738, 1542.2948, 1658.4679, 1661.923, 1623.4296, 1518.1061]
2025-05-13 15:13:17,505 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:13:17,518 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 30 minutes, 5 seconds)
2025-05-13 15:17:13,110 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:17:35,022 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1539.62207 ± 207.550
2025-05-13 15:17:35,022 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1637.3712, 1575.49, 1645.761, 1587.7876, 928.22424, 1683.2977, 1627.4028, 1567.383, 1545.7611, 1597.7421]
2025-05-13 15:17:35,023 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:17:35,043 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 24 minutes, 30 seconds)
2025-05-13 15:21:43,168 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:22:04,952 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1595.44006 ± 57.922
2025-05-13 15:22:04,952 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1721.5859, 1552.582, 1638.2609, 1505.1492, 1644.124, 1613.4087, 1585.1523, 1575.6033, 1560.7751, 1557.7583]
2025-05-13 15:22:04,952 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:22:04,960 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 20 minutes, 8 seconds)
2025-05-13 15:26:05,262 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:26:27,110 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1556.11108 ± 61.061
2025-05-13 15:26:27,110 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1587.2235, 1602.7576, 1392.5643, 1579.5878, 1554.1746, 1567.4459, 1575.7902, 1524.5039, 1628.6671, 1548.3961]
2025-05-13 15:26:27,110 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:26:27,118 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 15 minutes, 47 seconds)
2025-05-13 15:30:39,933 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:31:07,201 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1627.76782 ± 59.162
2025-05-13 15:31:07,201 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1632.4426, 1730.9297, 1643.2516, 1630.5972, 1474.8064, 1625.4921, 1628.6302, 1642.9105, 1620.68, 1647.9384]
2025-05-13 15:31:07,201 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 998.0, 1000.0]
2025-05-13 15:31:07,201 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1627.77) for latency ExtremeSparseL4U32
2025-05-13 15:31:07,214 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 11 minutes, 13 seconds)
2025-05-13 15:35:59,425 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:36:21,302 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1567.57397 ± 79.975
2025-05-13 15:36:21,302 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1505.9047, 1660.5771, 1605.7197, 1629.0051, 1553.9751, 1599.6417, 1644.1102, 1580.0878, 1375.4655, 1521.2521]
2025-05-13 15:36:21,302 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:36:21,311 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 86/100 (estimated time remaining: 1 hour, 9 minutes, 11 seconds)
2025-05-13 15:40:48,382 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:41:08,408 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1480.74927 ± 407.698
2025-05-13 15:41:08,408 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1599.0509, 260.4906, 1629.2675, 1572.8438, 1601.1383, 1668.5637, 1612.1694, 1601.2078, 1660.3687, 1602.3912]
2025-05-13 15:41:08,408 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 220.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:41:08,419 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 87/100 (estimated time remaining: 1 hour, 5 minutes, 57 seconds)
2025-05-13 15:45:29,953 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:45:52,104 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1447.23657 ± 496.220
2025-05-13 15:45:52,104 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1637.3615, -40.762714, 1609.5518, 1585.6556, 1622.6804, 1611.4062, 1596.387, 1623.7007, 1599.9789, 1626.406]
2025-05-13 15:45:52,104 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:45:52,112 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 88/100 (estimated time remaining: 1 hour, 1 minute, 50 seconds)
2025-05-13 15:49:38,644 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:50:00,479 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1630.66748 ± 69.228
2025-05-13 15:50:00,479 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1579.3057, 1576.0093, 1656.1306, 1541.0525, 1632.011, 1669.142, 1608.9873, 1601.1267, 1636.8646, 1806.0458]
2025-05-13 15:50:00,479 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:50:00,479 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1630.67) for latency ExtremeSparseL4U32
2025-05-13 15:50:00,490 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 89/100 (estimated time remaining: 56 minutes, 32 seconds)
2025-05-13 15:54:58,268 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:55:23,826 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1611.51489 ± 40.859
2025-05-13 15:55:23,826 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1705.8561, 1626.8846, 1566.0852, 1574.7709, 1558.2017, 1605.0431, 1641.1637, 1609.2477, 1600.4625, 1627.4324]
2025-05-13 15:55:23,826 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:55:23,836 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 90/100 (estimated time remaining: 53 minutes, 24 seconds)
2025-05-13 15:59:48,929 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 16:00:10,728 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1589.33667 ± 31.458
2025-05-13 16:00:10,728 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1620.267, 1649.9877, 1571.2517, 1542.7096, 1586.7568, 1596.3116, 1555.9686, 1577.5732, 1622.3108, 1570.2307]
2025-05-13 16:00:10,728 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 16:00:10,738 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 91/100 (estimated time remaining: 47 minutes, 38 seconds)
2025-05-13 16:04:15,016 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 16:04:36,834 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1568.33533 ± 44.554
2025-05-13 16:04:36,834 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1617.2588, 1628.8414, 1510.8896, 1526.6997, 1571.1913, 1513.045, 1582.2849, 1594.3566, 1520.2834, 1618.5029]
2025-05-13 16:04:36,834 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 16:04:36,841 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 92/100 (estimated time remaining: 42 minutes, 15 seconds)
2025-05-13 16:08:41,197 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 16:09:02,453 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1473.38477 ± 373.353
2025-05-13 16:09:02,453 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1521.7815, 1170.3278, 1586.671, 1572.9315, 1707.989, 451.6374, 1649.3909, 1663.664, 1661.3429, 1748.1118]
2025-05-13 16:09:02,453 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [910.0, 727.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 16:09:02,464 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 93/100 (estimated time remaining: 37 minutes, 4 seconds)
2025-05-13 16:12:50,368 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 16:13:10,303 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1452.11316 ± 412.485
2025-05-13 16:13:10,303 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1551.5156, 1567.398, 1593.5927, 1662.3372, 220.38113, 1540.8228, 1589.7113, 1553.8005, 1583.044, 1658.5295]
2025-05-13 16:13:10,303 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 140.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 16:13:10,310 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 94/100 (estimated time remaining: 32 minutes, 25 seconds)
2025-05-13 16:17:20,176 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 16:17:42,267 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1656.07739 ± 33.955
2025-05-13 16:17:42,267 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1636.3088, 1673.8812, 1620.1865, 1653.9359, 1616.2296, 1636.8422, 1731.4172, 1649.818, 1697.5919, 1644.5626]
2025-05-13 16:17:42,267 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 16:17:42,267 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1656.08) for latency ExtremeSparseL4U32
2025-05-13 16:17:42,276 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 95/100 (estimated time remaining: 26 minutes, 46 seconds)
2025-05-13 16:21:45,881 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 16:22:05,433 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1477.40405 ± 485.462
2025-05-13 16:22:05,433 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1673.4065, 1658.4569, 1624.796, 1626.5247, 1646.0499, 1699.9327, 1388.6831, 45.172634, 1691.5913, 1719.4281]
2025-05-13 16:22:05,433 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 905.0, 48.0, 1000.0, 1000.0]
2025-05-13 16:22:05,446 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 96/100 (estimated time remaining: 21 minutes, 54 seconds)
2025-05-13 16:26:30,470 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 16:26:52,233 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1652.39282 ± 57.119
2025-05-13 16:26:52,233 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1659.6436, 1627.8925, 1718.3582, 1557.3124, 1687.5499, 1650.2439, 1563.4415, 1650.6301, 1749.2014, 1659.655]
2025-05-13 16:26:52,233 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 962.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 16:26:52,241 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 97/100 (estimated time remaining: 17 minutes, 48 seconds)
2025-05-13 16:31:33,470 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 16:31:55,333 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1594.92725 ± 42.314
2025-05-13 16:31:55,333 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1663.3319, 1649.2877, 1630.0736, 1598.1562, 1518.5696, 1588.1625, 1571.1417, 1547.7133, 1578.5957, 1604.2402]
2025-05-13 16:31:55,333 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 16:31:55,341 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 98/100 (estimated time remaining: 13 minutes, 43 seconds)
2025-05-13 16:36:31,802 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 16:36:54,606 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1501.36938 ± 298.609
2025-05-13 16:36:54,606 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1517.2397, 1569.0508, 1664.601, 1657.532, 1632.6667, 1633.6313, 1580.9817, 1545.0231, 1596.8743, 616.0941]
2025-05-13 16:36:54,606 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 395.0]
2025-05-13 16:36:54,620 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 99/100 (estimated time remaining: 9 minutes, 29 seconds)
2025-05-13 16:41:10,766 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 16:41:33,611 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1597.10107 ± 85.667
2025-05-13 16:41:33,611 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1683.7058, 1435.0895, 1586.198, 1738.0618, 1603.1593, 1509.3986, 1569.453, 1540.4341, 1623.1458, 1682.3651]
2025-05-13 16:41:33,611 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 878.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 16:41:33,626 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1199 [INFO]: Iteration 100/100 (estimated time remaining: 4 minutes, 46 seconds)
2025-05-13 16:45:51,605 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 16:46:13,399 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1221 [DEBUG]: Total Reward: 1661.40698 ± 43.420
2025-05-13 16:46:13,399 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1222 [DEBUG]: All rewards: [1697.76, 1692.5448, 1686.6692, 1703.809, 1685.9447, 1638.0488, 1583.5741, 1660.8861, 1683.5991, 1581.2346]
2025-05-13 16:46:13,399 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 943.0, 1000.0, 1000.0, 1000.0]
2025-05-13 16:46:13,399 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1226 [INFO]: New best (1661.41) for latency ExtremeSparseL4U32
2025-05-13 16:46:13,410 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1251 [DEBUG]: Training session finished
