2025-05-13 09:06:33,558 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-bpql-mda-mem32
2025-05-13 09:06:33,558 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-bpql-mda-mem32
2025-05-13 09:06:33,558 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x145b1830d5d0>}
2025-05-13 09:06:33,558 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1111 [DEBUG]: using device: cuda
2025-05-13 09:06:33,562 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1133 [INFO]: Creating new trainer
2025-05-13 09:06:33,579 baseline-bpql-mda-noisy-walker2d:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-05-13 09:06:33,579 baseline-bpql-mda-noisy-walker2d:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-13 09:06:33,585 baseline-bpql-mda-noisy-walker2d:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(6, 384, batch_first=True)
)
2025-05-13 09:06:34,256 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1194 [DEBUG]: Starting training session...
2025-05-13 09:06:34,256 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 1/100
2025-05-13 09:10:30,346 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:10:31,214 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 15.87014 ± 0.992
2025-05-13 09:10:31,214 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [16.84419, 16.303356, 15.366737, 15.806807, 15.121841, 17.815365, 16.24664, 14.512419, 14.459814, 16.224228]
2025-05-13 09:10:31,214 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [27.0, 42.0, 43.0, 41.0, 42.0, 43.0, 42.0, 26.0, 43.0, 41.0]
2025-05-13 09:10:31,214 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (15.87) for latency ExtremeSparseL4U32
2025-05-13 09:10:31,220 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 6 hours, 30 minutes, 59 seconds)
2025-05-13 09:14:36,270 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:14:40,535 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 60.97961 ± 110.713
2025-05-13 09:14:40,536 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [24.337965, 7.099827, 20.729448, 363.56696, 14.195465, -27.873533, 37.238586, -2.2323, 154.24835, 18.485357]
2025-05-13 09:14:40,536 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [81.0, 115.0, 29.0, 544.0, 175.0, 230.0, 140.0, 169.0, 453.0, 29.0]
2025-05-13 09:14:40,536 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (60.98) for latency ExtremeSparseL4U32
2025-05-13 09:14:40,541 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 6 hours, 37 minutes, 7 seconds)
2025-05-13 09:18:42,764 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:18:44,010 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 42.60964 ± 24.809
2025-05-13 09:18:44,010 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [46.03918, 21.999771, 16.084747, 46.472168, 39.1735, 16.030298, 20.832935, 95.79769, 72.770325, 50.895824]
2025-05-13 09:18:44,010 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [69.0, 33.0, 28.0, 71.0, 63.0, 29.0, 32.0, 97.0, 90.0, 72.0]
2025-05-13 09:18:44,019 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 6 hours, 33 minutes, 15 seconds)
2025-05-13 09:22:48,156 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:22:50,298 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 138.77237 ± 93.687
2025-05-13 09:22:50,298 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [183.4386, 245.33257, 211.94026, 18.131464, 260.03134, 20.09043, 164.3449, 208.02943, 33.57057, 42.81416]
2025-05-13 09:22:50,298 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [121.0, 160.0, 121.0, 30.0, 168.0, 32.0, 110.0, 127.0, 50.0, 61.0]
2025-05-13 09:22:50,298 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (138.77) for latency ExtremeSparseL4U32
2025-05-13 09:22:50,306 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 6 hours, 30 minutes, 25 seconds)
2025-05-13 09:26:54,168 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:26:57,482 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 243.73067 ± 109.580
2025-05-13 09:26:57,482 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [247.65622, 243.42061, 431.88992, 16.493225, 237.08131, 335.2639, 200.00528, 214.04387, 152.00279, 359.44983]
2025-05-13 09:26:57,482 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [151.0, 150.0, 256.0, 29.0, 162.0, 221.0, 114.0, 120.0, 99.0, 196.0]
2025-05-13 09:26:57,482 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (243.73) for latency ExtremeSparseL4U32
2025-05-13 09:26:57,488 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 6 hours, 27 minutes, 21 seconds)
2025-05-13 09:31:03,737 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:31:06,776 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 155.17906 ± 78.753
2025-05-13 09:31:06,776 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [292.79538, 152.86642, 77.74035, 204.14508, 89.10464, 20.49551, 239.76552, 121.28348, 215.12282, 138.47144]
2025-05-13 09:31:06,776 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [193.0, 141.0, 139.0, 221.0, 114.0, 31.0, 140.0, 156.0, 113.0, 142.0]
2025-05-13 09:31:06,783 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 6 hours, 27 minutes, 8 seconds)
2025-05-13 09:35:13,752 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:35:16,657 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 247.76535 ± 168.891
2025-05-13 09:35:16,657 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [428.05524, 18.30051, 210.02142, 388.6982, 444.64975, 420.40588, 283.7684, 14.277738, 254.87416, 14.602041]
2025-05-13 09:35:16,657 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [188.0, 29.0, 120.0, 204.0, 244.0, 225.0, 150.0, 30.0, 131.0, 26.0]
2025-05-13 09:35:16,657 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (247.77) for latency ExtremeSparseL4U32
2025-05-13 09:35:16,664 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 6 hours, 23 minutes, 11 seconds)
2025-05-13 09:39:19,702 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:39:22,504 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 224.70247 ± 70.934
2025-05-13 09:39:22,504 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [270.36032, 256.4871, 366.8017, 145.86513, 313.0685, 199.7128, 185.93214, 135.66624, 182.01804, 191.11269]
2025-05-13 09:39:22,504 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [154.0, 147.0, 227.0, 84.0, 142.0, 114.0, 104.0, 80.0, 111.0, 105.0]
2025-05-13 09:39:22,510 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 6 hours, 19 minutes, 48 seconds)
2025-05-13 09:43:25,191 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:43:27,852 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 227.86487 ± 193.557
2025-05-13 09:43:27,852 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [257.7951, 18.980455, 19.422703, 168.98212, 213.76334, 18.444054, 242.47037, 665.53955, 445.51083, 227.74034]
2025-05-13 09:43:27,852 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [138.0, 31.0, 32.0, 100.0, 117.0, 29.0, 116.0, 320.0, 211.0, 134.0]
2025-05-13 09:43:27,858 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 6 hours, 15 minutes, 23 seconds)
2025-05-13 09:47:33,658 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:47:36,406 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 228.09732 ± 162.090
2025-05-13 09:47:36,406 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [234.74295, 206.28282, 19.823357, 214.98436, 170.77425, 14.040615, 194.91307, 573.8221, 447.93826, 203.65134]
2025-05-13 09:47:36,406 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [146.0, 93.0, 32.0, 142.0, 122.0, 27.0, 106.0, 241.0, 237.0, 133.0]
2025-05-13 09:47:36,412 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 6 hours, 11 minutes, 40 seconds)
2025-05-13 09:51:36,444 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:51:40,413 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 313.26849 ± 174.264
2025-05-13 09:51:40,413 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [346.06787, 311.9295, 555.25507, 192.15779, 713.6749, 167.8724, 214.03467, 186.39458, 170.00806, 275.29022]
2025-05-13 09:51:40,413 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [213.0, 175.0, 304.0, 109.0, 408.0, 127.0, 121.0, 103.0, 119.0, 154.0]
2025-05-13 09:51:40,413 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (313.27) for latency ExtremeSparseL4U32
2025-05-13 09:51:40,420 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 6 hours, 5 minutes, 58 seconds)
2025-05-13 09:55:39,117 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:55:41,404 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 128.03287 ± 82.900
2025-05-13 09:55:41,404 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [194.70303, 309.089, 21.014711, 176.61047, 85.67896, 172.74994, 84.439445, 22.913048, 109.67618, 103.45396]
2025-05-13 09:55:41,404 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [160.0, 164.0, 33.0, 108.0, 87.0, 121.0, 73.0, 33.0, 143.0, 148.0]
2025-05-13 09:55:41,410 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 5 hours, 59 minutes, 15 seconds)
2025-05-13 09:59:43,597 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:59:46,624 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 236.26289 ± 144.177
2025-05-13 09:59:46,624 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [138.19934, 193.6726, 176.77542, 168.12097, 233.62265, 206.26874, 532.27155, 461.57376, 14.984975, 237.13913]
2025-05-13 09:59:46,624 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [137.0, 104.0, 111.0, 110.0, 130.0, 121.0, 264.0, 217.0, 26.0, 168.0]
2025-05-13 09:59:46,630 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 5 hours, 54 minutes, 59 seconds)
2025-05-13 10:03:51,028 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:03:54,417 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 281.34195 ± 193.298
2025-05-13 10:03:54,417 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [36.50147, 306.72305, 257.34033, 229.21114, 755.4038, 256.77228, 279.83057, 408.89417, 21.030325, 261.71228]
2025-05-13 10:03:54,417 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [51.0, 136.0, 129.0, 138.0, 484.0, 157.0, 129.0, 185.0, 33.0, 137.0]
2025-05-13 10:03:54,422 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 5 hours, 51 minutes, 36 seconds)
2025-05-13 10:07:53,726 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:08:01,324 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 482.28183 ± 342.718
2025-05-13 10:08:01,324 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [245.72813, 331.87292, 1009.2944, 20.33621, 496.77277, 376.069, 417.10898, 1042.2306, 72.74098, 810.66437]
2025-05-13 10:08:01,324 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [171.0, 226.0, 1000.0, 31.0, 235.0, 194.0, 219.0, 751.0, 142.0, 432.0]
2025-05-13 10:08:01,324 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (482.28) for latency ExtremeSparseL4U32
2025-05-13 10:08:01,329 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 5 hours, 47 minutes, 3 seconds)
2025-05-13 10:12:05,750 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:12:11,343 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 399.63626 ± 307.412
2025-05-13 10:12:11,343 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [321.17303, 160.0569, 16.474361, 411.43893, 255.9352, 1074.5459, 809.60236, 233.52864, 528.60345, 185.00357]
2025-05-13 10:12:11,344 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [182.0, 100.0, 30.0, 187.0, 119.0, 1000.0, 343.0, 211.0, 242.0, 155.0]
2025-05-13 10:12:11,351 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 5 hours, 44 minutes, 39 seconds)
2025-05-13 10:16:10,396 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:16:13,116 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 196.26657 ± 180.755
2025-05-13 10:16:13,116 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [18.097149, 463.18994, 261.91385, 473.83255, 81.32366, 88.50013, 17.978539, 11.398128, 132.24203, 414.1898]
2025-05-13 10:16:13,116 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [32.0, 257.0, 124.0, 261.0, 77.0, 138.0, 29.0, 23.0, 117.0, 215.0]
2025-05-13 10:16:13,121 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 5 hours, 40 minutes, 46 seconds)
2025-05-13 10:20:13,101 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:20:16,341 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 292.16077 ± 138.710
2025-05-13 10:20:16,341 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [154.81192, 183.47006, 244.44955, 477.87497, 422.808, 287.19666, 424.5425, 18.71874, 416.9347, 290.80045]
2025-05-13 10:20:16,341 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [94.0, 111.0, 110.0, 294.0, 181.0, 143.0, 168.0, 30.0, 177.0, 180.0]
2025-05-13 10:20:16,347 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 5 hours, 36 minutes, 7 seconds)
2025-05-13 10:24:18,482 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:24:22,805 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 347.59372 ± 295.105
2025-05-13 10:24:22,805 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [702.40533, 192.74956, 1072.2809, 357.7241, 250.0049, 112.36612, 228.21295, 30.013603, 297.86743, 232.31224]
2025-05-13 10:24:22,805 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [350.0, 224.0, 530.0, 164.0, 121.0, 151.0, 129.0, 50.0, 166.0, 130.0]
2025-05-13 10:24:22,810 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 5 hours, 31 minutes, 39 seconds)
2025-05-13 10:28:29,453 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:28:33,688 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 267.63318 ± 150.052
2025-05-13 10:28:33,688 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [271.55344, 292.0957, 173.88623, 136.83159, 199.32462, 464.70956, 99.41541, 613.5003, 236.37927, 188.63556]
2025-05-13 10:28:33,688 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [258.0, 156.0, 174.0, 126.0, 160.0, 275.0, 137.0, 399.0, 135.0, 143.0]
2025-05-13 10:28:33,696 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 5 hours, 28 minutes, 37 seconds)
2025-05-13 10:32:30,275 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:32:33,433 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 283.59286 ± 152.698
2025-05-13 10:32:33,433 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [220.99867, 22.007982, 326.5063, 193.1254, 595.35175, 226.36066, 189.37796, 436.27332, 403.55432, 222.37251]
2025-05-13 10:32:33,433 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [107.0, 32.0, 146.0, 128.0, 276.0, 133.0, 117.0, 204.0, 222.0, 121.0]
2025-05-13 10:32:33,439 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 5 hours, 21 minutes, 48 seconds)
2025-05-13 10:36:37,388 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:36:41,261 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 341.73434 ± 223.118
2025-05-13 10:36:41,261 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [118.55275, 275.84384, 421.4378, 421.5094, 440.69583, 20.013914, 204.17519, 854.5478, 197.08046, 463.48648]
2025-05-13 10:36:41,261 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [114.0, 134.0, 232.0, 182.0, 245.0, 33.0, 168.0, 369.0, 125.0, 196.0]
2025-05-13 10:36:41,267 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 5 hours, 19 minutes, 19 seconds)
2025-05-13 10:40:42,061 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:40:45,049 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 302.87860 ± 186.253
2025-05-13 10:40:45,049 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [334.03653, 19.403427, 348.9, 264.38712, 714.72437, 321.12228, 17.566122, 336.51062, 275.7148, 396.42062]
2025-05-13 10:40:45,049 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [139.0, 29.0, 164.0, 125.0, 311.0, 130.0, 30.0, 148.0, 129.0, 178.0]
2025-05-13 10:40:45,055 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 5 hours, 15 minutes, 22 seconds)
2025-05-13 10:44:46,837 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:44:50,989 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 343.52121 ± 219.760
2025-05-13 10:44:50,989 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [598.3607, 267.63733, 56.22647, 812.4869, 125.11897, 265.75754, 512.22784, 198.55223, 335.89627, 262.9477]
2025-05-13 10:44:50,989 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [335.0, 120.0, 86.0, 385.0, 110.0, 122.0, 200.0, 201.0, 177.0, 174.0]
2025-05-13 10:44:50,994 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 5 hours, 11 minutes, 8 seconds)
2025-05-13 10:48:53,340 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:48:58,942 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 482.30475 ± 410.519
2025-05-13 10:48:58,942 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [335.97162, 421.63626, 245.27376, 331.90836, 195.76001, 1557.141, 317.11047, 146.52745, 355.87216, 915.8464]
2025-05-13 10:48:58,942 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [154.0, 278.0, 134.0, 216.0, 160.0, 798.0, 149.0, 143.0, 174.0, 386.0]
2025-05-13 10:48:58,942 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (482.30) for latency ExtremeSparseL4U32
2025-05-13 10:48:58,949 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 5 hours, 6 minutes, 18 seconds)
2025-05-13 10:53:07,108 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:53:11,167 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 389.22186 ± 300.227
2025-05-13 10:53:11,167 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [883.59955, 615.1112, 137.02235, 792.0759, 232.41919, 20.077944, 625.9124, 21.800724, 333.11823, 231.08139]
2025-05-13 10:53:11,167 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [364.0, 269.0, 142.0, 417.0, 124.0, 30.0, 260.0, 32.0, 138.0, 126.0]
2025-05-13 10:53:11,173 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 5 hours, 5 minutes, 18 seconds)
2025-05-13 10:57:11,013 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:57:15,758 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 427.30499 ± 151.727
2025-05-13 10:57:15,758 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [563.94, 302.6399, 772.0625, 269.3449, 311.14758, 446.14087, 522.8463, 462.80673, 289.50693, 332.61423]
2025-05-13 10:57:15,758 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [219.0, 150.0, 343.0, 253.0, 151.0, 202.0, 305.0, 256.0, 149.0, 159.0]
2025-05-13 10:57:15,766 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 5 hours, 23 seconds)
2025-05-13 11:01:21,210 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:01:26,424 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 468.53857 ± 225.443
2025-05-13 11:01:26,424 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [701.2764, 580.8627, 181.24559, 503.2958, 784.18896, 178.80887, 187.90451, 752.62024, 347.37015, 467.81232]
2025-05-13 11:01:26,424 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [275.0, 266.0, 114.0, 217.0, 436.0, 197.0, 111.0, 471.0, 163.0, 182.0]
2025-05-13 11:01:26,431 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 4 hours, 57 minutes, 55 seconds)
2025-05-13 11:05:28,730 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:05:34,469 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 594.37512 ± 373.039
2025-05-13 11:05:34,469 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1010.9098, 418.90933, 564.60785, 19.437424, 588.54626, 684.42365, 1326.3285, 238.36505, 846.57135, 245.65196]
2025-05-13 11:05:34,469 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [412.0, 176.0, 219.0, 30.0, 286.0, 391.0, 521.0, 123.0, 372.0, 113.0]
2025-05-13 11:05:34,469 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (594.38) for latency ExtremeSparseL4U32
2025-05-13 11:05:34,475 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 54 minutes, 17 seconds)
2025-05-13 11:09:33,303 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:09:41,592 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 871.38739 ± 858.983
2025-05-13 11:09:41,592 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [393.15466, 627.8389, 338.40503, 539.66797, 1490.8491, 14.233211, 2382.8992, 2435.6328, 67.84672, 423.34705]
2025-05-13 11:09:41,592 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [199.0, 296.0, 159.0, 227.0, 563.0, 25.0, 1000.0, 1000.0, 96.0, 216.0]
2025-05-13 11:09:41,592 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (871.39) for latency ExtremeSparseL4U32
2025-05-13 11:09:41,603 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 4 hours, 49 minutes, 57 seconds)
2025-05-13 11:13:49,612 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:13:51,944 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 189.78690 ± 121.091
2025-05-13 11:13:51,944 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [63.911198, 259.68503, 463.92346, 245.9451, 212.4748, 204.1965, 19.61397, 165.28465, 58.960808, 203.87335]
2025-05-13 11:13:51,944 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [62.0, 124.0, 264.0, 121.0, 126.0, 93.0, 32.0, 98.0, 58.0, 98.0]
2025-05-13 11:13:51,952 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 4 hours, 45 minutes, 22 seconds)
2025-05-13 11:17:50,900 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:17:56,843 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 621.28503 ± 410.364
2025-05-13 11:17:56,843 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [891.1766, 1079.3014, 913.58673, 1285.5573, 842.1914, 195.01128, 390.9102, 15.340672, 392.95178, 206.82272]
2025-05-13 11:17:56,843 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [312.0, 564.0, 308.0, 517.0, 363.0, 159.0, 180.0, 30.0, 177.0, 95.0]
2025-05-13 11:17:56,851 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 4 hours, 41 minutes, 18 seconds)
2025-05-13 11:22:13,503 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:22:25,459 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1401.83264 ± 958.850
2025-05-13 11:22:25,459 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2502.3538, 1136.535, 2408.6384, 196.43623, 100.07811, 416.14728, 1101.3007, 2615.269, 1083.5289, 2458.0378]
2025-05-13 11:22:25,459 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [923.0, 463.0, 803.0, 102.0, 105.0, 216.0, 455.0, 945.0, 376.0, 1000.0]
2025-05-13 11:22:25,459 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (1401.83) for latency ExtremeSparseL4U32
2025-05-13 11:22:25,465 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 4 hours, 41 minutes, 11 seconds)
2025-05-13 11:26:21,265 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:26:28,640 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 787.35779 ± 691.916
2025-05-13 11:26:28,640 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [990.713, 218.5223, 900.85345, 514.26685, 341.7801, 727.6114, 598.38544, 2739.7378, 329.11154, 512.59644]
2025-05-13 11:26:28,640 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [431.0, 111.0, 352.0, 306.0, 281.0, 336.0, 230.0, 1000.0, 170.0, 182.0]
2025-05-13 11:26:28,646 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 4 hours, 35 minutes, 55 seconds)
2025-05-13 11:30:25,698 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:30:32,799 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 871.84863 ± 748.519
2025-05-13 11:30:32,799 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [19.417, 481.28833, 11.599355, 2207.8008, 1308.1836, 1130.1132, 505.73987, 1179.4564, 1860.6113, 14.276314]
2025-05-13 11:30:32,799 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [32.0, 212.0, 23.0, 715.0, 547.0, 416.0, 204.0, 452.0, 653.0, 26.0]
2025-05-13 11:30:32,812 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 4 hours, 31 minutes, 5 seconds)
2025-05-13 11:34:36,131 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:34:44,677 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1027.40564 ± 912.837
2025-05-13 11:34:44,678 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [423.13672, 21.795557, 274.18933, 477.56824, 1155.6368, 2177.3108, 3143.5227, 1188.1573, 844.2484, 568.4917]
2025-05-13 11:34:44,678 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [236.0, 33.0, 162.0, 220.0, 514.0, 698.0, 1000.0, 450.0, 317.0, 227.0]
2025-05-13 11:34:44,686 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 4 hours, 27 minutes, 14 seconds)
2025-05-13 11:38:42,440 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:38:50,771 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1002.45496 ± 1016.888
2025-05-13 11:38:50,771 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [86.68358, 468.71722, 1720.1697, 2826.2686, 474.64963, 959.73126, 19.975193, 203.7299, 2793.91, 470.71518]
2025-05-13 11:38:50,771 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [105.0, 194.0, 608.0, 903.0, 217.0, 385.0, 33.0, 104.0, 1000.0, 189.0]
2025-05-13 11:38:50,780 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 4 hours, 23 minutes, 19 seconds)
2025-05-13 11:43:09,401 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:43:14,395 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 631.28644 ± 741.571
2025-05-13 11:43:14,396 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [235.40608, 1408.2467, 306.52435, 276.40454, 779.35187, 219.94025, 2546.1284, 280.71545, 19.619791, 240.52682]
2025-05-13 11:43:14,396 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [104.0, 501.0, 130.0, 125.0, 255.0, 117.0, 773.0, 121.0, 32.0, 108.0]
2025-05-13 11:43:14,403 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 4 hours, 18 minutes, 6 seconds)
2025-05-13 11:47:02,664 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:47:08,884 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 739.72095 ± 734.915
2025-05-13 11:47:08,884 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [424.50586, 567.67224, 927.85065, 332.12732, 443.47202, 16.70543, 1469.7739, 614.3299, 2581.5188, 19.253197]
2025-05-13 11:47:08,884 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [185.0, 268.0, 338.0, 159.0, 190.0, 28.0, 543.0, 221.0, 865.0, 28.0]
2025-05-13 11:47:08,891 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 4 hours, 12 minutes, 10 seconds)
2025-05-13 11:51:20,252 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:51:27,860 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 887.76300 ± 1011.282
2025-05-13 11:51:27,861 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [458.3077, 413.62967, 2018.0588, 21.28671, 2362.9768, 2775.4368, 225.68254, 20.312836, 561.7146, 20.22253]
2025-05-13 11:51:27,861 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [180.0, 166.0, 725.0, 32.0, 956.0, 1000.0, 115.0, 32.0, 233.0, 32.0]
2025-05-13 11:51:27,868 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 4 hours, 11 minutes)
2025-05-13 11:55:24,581 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:55:29,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 549.06757 ± 697.205
2025-05-13 11:55:29,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [12.985768, 2518.242, 16.00159, 438.5338, 308.7319, 181.38531, 298.0502, 821.19116, 291.25015, 604.3041]
2025-05-13 11:55:29,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [23.0, 802.0, 28.0, 255.0, 133.0, 141.0, 148.0, 353.0, 144.0, 256.0]
2025-05-13 11:55:29,509 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 4 hours, 4 minutes, 48 seconds)
2025-05-13 11:59:31,919 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:59:38,840 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 873.50732 ± 1057.400
2025-05-13 11:59:38,841 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [484.8443, 498.20383, 2985.5486, 366.96848, 2916.1565, 17.084976, 786.1167, 196.4084, 211.6452, 272.09537]
2025-05-13 11:59:38,841 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [185.0, 194.0, 1000.0, 143.0, 1000.0, 33.0, 257.0, 113.0, 122.0, 126.0]
2025-05-13 11:59:38,847 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 4 hours, 1 minute, 17 seconds)
2025-05-13 12:03:44,029 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:03:51,168 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 779.80261 ± 726.519
2025-05-13 12:03:51,168 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [443.0121, 807.65485, 860.0655, 259.63788, 895.82404, 328.19205, 14.599864, 21.922586, 2318.2817, 1848.8354]
2025-05-13 12:03:51,168 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [204.0, 344.0, 537.0, 116.0, 354.0, 200.0, 25.0, 45.0, 808.0, 638.0]
2025-05-13 12:03:51,177 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 54 minutes, 59 seconds)
2025-05-13 12:08:08,946 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:08:20,732 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1595.91724 ± 1001.450
2025-05-13 12:08:20,732 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1703.2712, 469.62375, 1314.3601, 3343.1978, 1259.6107, 2813.5308, 2811.7878, 434.7427, 559.3523, 1249.6943]
2025-05-13 12:08:20,732 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [552.0, 217.0, 448.0, 1000.0, 430.0, 832.0, 1000.0, 172.0, 225.0, 457.0]
2025-05-13 12:08:20,732 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (1595.92) for latency ExtremeSparseL4U32
2025-05-13 12:08:20,737 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 57 minutes, 24 seconds)
2025-05-13 12:12:14,476 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:12:27,177 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1791.08887 ± 1319.138
2025-05-13 12:12:27,177 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3170.1902, 1213.5679, 3173.2847, 3400.6138, 1456.8417, 20.251688, 1855.4078, 3262.2305, 341.63245, 16.868671]
2025-05-13 12:12:27,177 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 424.0, 1000.0, 1000.0, 467.0, 32.0, 657.0, 1000.0, 150.0, 28.0]
2025-05-13 12:12:27,177 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (1791.09) for latency ExtremeSparseL4U32
2025-05-13 12:12:27,184 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 50 minutes, 52 seconds)
2025-05-13 12:16:29,587 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:16:38,358 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1116.50818 ± 1111.445
2025-05-13 12:16:38,358 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [802.58997, 2492.7979, 16.80266, 310.9272, 263.3967, 3066.029, 195.27126, 1579.5288, 2420.8706, 16.867373]
2025-05-13 12:16:38,358 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [370.0, 859.0, 27.0, 141.0, 112.0, 1000.0, 96.0, 550.0, 768.0, 32.0]
2025-05-13 12:16:38,366 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 48 minutes, 23 seconds)
2025-05-13 12:21:00,255 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:21:10,098 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1403.69226 ± 955.018
2025-05-13 12:21:10,098 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [329.2674, 1740.9918, 2627.4224, 773.7081, 2350.3748, 3058.3071, 162.75496, 1294.5715, 1131.3308, 568.19324]
2025-05-13 12:21:10,098 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [151.0, 563.0, 827.0, 269.0, 641.0, 927.0, 102.0, 409.0, 407.0, 224.0]
2025-05-13 12:21:10,106 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 48 minutes, 7 seconds)
2025-05-13 12:25:05,842 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:25:17,426 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1513.34155 ± 1336.539
2025-05-13 12:25:17,426 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3140.6016, 844.2537, 3112.021, 18.197708, 368.9664, 150.79146, 3088.9783, 3136.2612, 356.71014, 916.6349]
2025-05-13 12:25:17,427 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 306.0, 1000.0, 30.0, 171.0, 204.0, 1000.0, 1000.0, 227.0, 304.0]
2025-05-13 12:25:17,433 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 42 minutes, 57 seconds)
2025-05-13 12:29:18,875 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:29:29,556 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1288.35303 ± 1198.887
2025-05-13 12:29:29,556 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [650.6126, 306.31573, 12.229853, 247.09828, 2900.079, 2182.2817, 2999.5642, 2812.0964, 477.35056, 295.90134]
2025-05-13 12:29:29,556 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [268.0, 161.0, 30.0, 121.0, 1000.0, 830.0, 1000.0, 1000.0, 179.0, 194.0]
2025-05-13 12:29:29,567 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 35 minutes, 42 seconds)
2025-05-13 12:33:41,643 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:33:54,473 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1802.93811 ± 1456.289
2025-05-13 12:33:54,473 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3214.5195, 23.72093, 171.80576, 3347.743, 391.97787, 663.5326, 3273.0786, 3268.6067, 3141.3643, 533.0316]
2025-05-13 12:33:54,473 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 33.0, 114.0, 1000.0, 171.0, 265.0, 1000.0, 1000.0, 1000.0, 222.0]
2025-05-13 12:33:54,473 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (1802.94) for latency ExtremeSparseL4U32
2025-05-13 12:33:54,482 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 34 minutes, 32 seconds)
2025-05-13 12:37:51,496 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:38:03,467 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1608.19812 ± 1299.463
2025-05-13 12:38:03,467 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3154.2175, 742.6133, 3184.9116, 1046.3589, 583.6061, 256.70117, 370.938, 3415.367, 385.7852, 2941.4824]
2025-05-13 12:38:03,468 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 292.0, 1000.0, 398.0, 240.0, 110.0, 172.0, 1000.0, 196.0, 1000.0]
2025-05-13 12:38:03,476 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 29 minutes, 54 seconds)
2025-05-13 12:41:57,317 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:42:06,183 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1190.71033 ± 1236.730
2025-05-13 12:42:06,183 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [18.804865, 318.69095, 544.3631, 3306.717, 3395.8486, 22.995008, 588.87103, 2062.998, 254.87593, 1392.9382]
2025-05-13 12:42:06,183 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [31.0, 205.0, 242.0, 1000.0, 1000.0, 46.0, 280.0, 647.0, 125.0, 426.0]
2025-05-13 12:42:06,194 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 3 hours, 20 minutes, 58 seconds)
2025-05-13 12:46:19,611 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:46:24,036 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 497.76669 ± 785.201
2025-05-13 12:46:24,036 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [451.98154, 200.41544, 17.86672, 2799.5042, 459.39944, 16.610523, 376.7772, 381.6302, 25.38713, 248.09448]
2025-05-13 12:46:24,036 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [218.0, 112.0, 31.0, 947.0, 197.0, 28.0, 169.0, 172.0, 45.0, 126.0]
2025-05-13 12:46:24,044 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 3 hours, 18 minutes, 26 seconds)
2025-05-13 12:50:26,991 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:50:36,849 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1362.82874 ± 1273.224
2025-05-13 12:50:36,850 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [18.65409, 463.89624, 2439.2253, 20.039227, 494.36353, 2250.1433, 3372.6401, 726.71533, 3339.9438, 502.66562]
2025-05-13 12:50:36,850 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [27.0, 180.0, 771.0, 31.0, 215.0, 661.0, 1000.0, 334.0, 995.0, 223.0]
2025-05-13 12:50:36,865 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 3 hours, 14 minutes, 19 seconds)
2025-05-13 12:54:43,354 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:54:53,786 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1548.41309 ± 1393.491
2025-05-13 12:54:53,786 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [306.29135, 3521.3706, 3424.727, 15.120098, 585.73114, 2999.8652, 17.41377, 192.05072, 2301.861, 2119.6995]
2025-05-13 12:54:53,786 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [118.0, 957.0, 1000.0, 27.0, 232.0, 1000.0, 31.0, 93.0, 713.0, 589.0]
2025-05-13 12:54:53,794 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 3 hours, 8 minutes, 53 seconds)
2025-05-13 12:58:51,570 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:59:02,262 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1593.55920 ± 1398.957
2025-05-13 12:59:02,262 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [188.62358, 1705.267, 2802.171, 251.33098, 217.90532, 3550.2488, 3342.1182, 18.25695, 3120.6116, 739.0581]
2025-05-13 12:59:02,262 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [88.0, 534.0, 783.0, 128.0, 98.0, 1000.0, 1000.0, 30.0, 942.0, 279.0]
2025-05-13 12:59:02,272 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 3 hours, 4 minutes, 37 seconds)
2025-05-13 13:02:53,071 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:03:01,278 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1134.32349 ± 1137.140
2025-05-13 13:03:01,278 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2353.665, 593.64667, 2487.0256, 482.96274, 288.11523, 537.12225, 3566.353, 489.3447, 338.5318, 206.46738]
2025-05-13 13:03:01,278 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [669.0, 269.0, 734.0, 224.0, 154.0, 231.0, 1000.0, 238.0, 148.0, 109.0]
2025-05-13 13:03:01,288 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 59 minutes, 53 seconds)
2025-05-13 13:07:27,316 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:07:39,969 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1862.97229 ± 1471.417
2025-05-13 13:07:39,969 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3349.1448, 3354.6643, 337.23093, 12.075734, 3318.62, 368.00638, 276.8496, 1116.2289, 2934.7847, 3562.1174]
2025-05-13 13:07:39,969 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 917.0, 163.0, 26.0, 1000.0, 202.0, 127.0, 397.0, 846.0, 1000.0]
2025-05-13 13:07:39,969 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (1862.97) for latency ExtremeSparseL4U32
2025-05-13 13:07:39,979 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 58 minutes, 37 seconds)
2025-05-13 13:11:31,483 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:11:42,569 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1629.12268 ± 1390.441
2025-05-13 13:11:42,569 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [392.898, 560.21326, 509.8516, 1492.0393, 3586.3433, 426.19788, 1751.5315, 266.29266, 3495.992, 3809.8677]
2025-05-13 13:11:42,569 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [170.0, 265.0, 267.0, 476.0, 1000.0, 174.0, 507.0, 127.0, 1000.0, 1000.0]
2025-05-13 13:11:42,582 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 52 minutes, 58 seconds)
2025-05-13 13:15:45,538 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:15:51,917 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 850.64099 ± 882.430
2025-05-13 13:15:51,917 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [361.37506, 19.137962, 321.16617, 1591.1075, 932.1383, 296.81546, 106.61255, 573.7202, 3072.9958, 1231.3406]
2025-05-13 13:15:51,917 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [155.0, 32.0, 140.0, 477.0, 369.0, 135.0, 191.0, 206.0, 833.0, 380.0]
2025-05-13 13:15:51,925 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 47 minutes, 45 seconds)
2025-05-13 13:20:09,165 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:20:26,759 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2785.84814 ± 1288.927
2025-05-13 13:20:26,759 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3652.7778, 3518.0803, 415.08383, 2505.0918, 3557.6465, 3666.0618, 3767.2043, 3041.8901, 3531.6238, 203.02065]
2025-05-13 13:20:26,759 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 162.0, 752.0, 1000.0, 1000.0, 1000.0, 845.0, 1000.0, 106.0]
2025-05-13 13:20:26,759 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (2785.85) for latency ExtremeSparseL4U32
2025-05-13 13:20:26,770 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 46 minutes, 59 seconds)
2025-05-13 13:24:18,920 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:24:29,880 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1703.13318 ± 1385.471
2025-05-13 13:24:29,880 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [553.85706, 410.65985, 2731.993, 258.7764, 2787.5789, 557.3705, 3742.9783, 284.3188, 3789.2183, 1914.5825]
2025-05-13 13:24:29,880 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [203.0, 174.0, 750.0, 198.0, 773.0, 201.0, 1000.0, 136.0, 1000.0, 564.0]
2025-05-13 13:24:29,890 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 43 minutes, 13 seconds)
2025-05-13 13:28:21,890 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:28:32,019 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1471.04150 ± 1507.780
2025-05-13 13:28:32,019 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [20.725185, 827.18384, 404.74072, 3126.3467, 182.59012, 152.46002, 17.151075, 3767.3916, 2783.7234, 3428.1018]
2025-05-13 13:28:32,019 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [32.0, 262.0, 215.0, 1000.0, 105.0, 100.0, 31.0, 1000.0, 822.0, 1000.0]
2025-05-13 13:28:32,029 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 34 minutes, 25 seconds)
2025-05-13 13:32:48,321 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:33:01,732 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2138.99097 ± 1490.726
2025-05-13 13:33:01,732 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3557.0652, 327.61566, 2492.383, 2920.083, 19.40361, 3657.747, 3696.518, 642.4269, 3573.015, 503.6514]
2025-05-13 13:33:01,732 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 164.0, 688.0, 836.0, 33.0, 1000.0, 1000.0, 234.0, 1000.0, 199.0]
2025-05-13 13:33:01,740 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 33 minutes, 29 seconds)
2025-05-13 13:36:51,762 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:36:58,221 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 910.16321 ± 1023.682
2025-05-13 13:36:58,222 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1244.4849, 3560.4832, 322.6024, 460.76022, 313.4917, 732.42645, 14.072698, 1772.6378, 14.572373, 666.10016]
2025-05-13 13:36:58,222 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [411.0, 1000.0, 144.0, 189.0, 136.0, 262.0, 26.0, 522.0, 31.0, 254.0]
2025-05-13 13:36:58,232 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 27 minutes, 44 seconds)
2025-05-13 13:41:04,872 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:41:15,454 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1580.29480 ± 1021.216
2025-05-13 13:41:15,455 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2259.8015, 1824.8909, 484.0742, 1170.1887, 1020.8886, 2845.2886, 101.26951, 447.3552, 2656.3782, 2992.8132]
2025-05-13 13:41:15,455 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [671.0, 526.0, 203.0, 401.0, 378.0, 770.0, 121.0, 263.0, 778.0, 794.0]
2025-05-13 13:41:15,465 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 21 minutes, 31 seconds)
2025-05-13 13:45:38,386 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:45:55,381 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2552.76196 ± 1332.643
2025-05-13 13:45:55,381 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3512.3982, 3374.3079, 3589.293, 3278.7695, 19.249321, 3420.537, 3425.236, 3220.8665, 1255.4784, 431.48514]
2025-05-13 13:45:55,381 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 33.0, 1000.0, 1000.0, 1000.0, 421.0, 208.0]
2025-05-13 13:45:55,391 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 21 minutes, 24 seconds)
2025-05-13 13:49:45,164 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:49:55,757 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1637.16113 ± 1621.516
2025-05-13 13:49:55,757 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [27.046343, 3662.26, 3539.5012, 257.6268, 443.50003, 3509.2317, 235.09572, 19.629288, 1000.7278, 3676.9934]
2025-05-13 13:49:55,757 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [47.0, 1000.0, 1000.0, 134.0, 169.0, 1000.0, 120.0, 31.0, 331.0, 1000.0]
2025-05-13 13:49:55,763 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 2 hours, 16 minutes, 55 seconds)
2025-05-13 13:54:09,743 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:54:21,118 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1701.72400 ± 1585.371
2025-05-13 13:54:21,118 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [410.1871, 365.3542, 3611.083, 459.54407, 3656.2598, 3681.6729, 3621.606, 320.9634, 419.95602, 470.6147]
2025-05-13 13:54:21,118 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [179.0, 168.0, 1000.0, 191.0, 1000.0, 1000.0, 1000.0, 155.0, 193.0, 195.0]
2025-05-13 13:54:21,126 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 2 hours, 12 minutes, 12 seconds)
2025-05-13 13:58:23,311 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:58:33,358 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1414.28589 ± 1546.570
2025-05-13 13:58:33,358 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [18.305788, 19.865274, 15.744665, 196.14713, 224.7494, 3197.333, 466.9749, 3479.403, 3254.0955, 3270.2395]
2025-05-13 13:58:33,358 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [31.0, 30.0, 31.0, 181.0, 107.0, 1000.0, 185.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:58:33,366 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 2 hours, 9 minutes, 30 seconds)
2025-05-13 14:02:21,484 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:02:30,156 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1268.48535 ± 1468.264
2025-05-13 14:02:30,157 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3478.738, 16.29382, 210.27135, 362.2155, 550.7164, 310.58655, 3561.6975, 3467.066, 383.59537, 343.67303]
2025-05-13 14:02:30,157 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 32.0, 96.0, 157.0, 208.0, 140.0, 1000.0, 1000.0, 178.0, 137.0]
2025-05-13 14:02:30,165 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 2 hours, 3 minutes, 13 seconds)
2025-05-13 14:06:53,119 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:06:57,466 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 549.51141 ± 1005.791
2025-05-13 14:06:57,466 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [20.076437, 21.807726, 19.192822, 505.7702, 3495.1277, 407.8237, 19.397081, 443.39087, 547.06476, 15.462826]
2025-05-13 14:06:57,466 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [33.0, 33.0, 32.0, 218.0, 1000.0, 171.0, 32.0, 192.0, 234.0, 28.0]
2025-05-13 14:06:57,477 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 57 minutes, 47 seconds)
2025-05-13 14:10:44,663 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:10:52,389 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1148.00024 ± 1470.536
2025-05-13 14:10:52,389 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3823.2415, 16.46558, 150.56665, 412.99637, 337.549, 296.02423, 330.52267, 2314.0688, 13.436134, 3785.1326]
2025-05-13 14:10:52,389 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 29.0, 159.0, 181.0, 177.0, 134.0, 183.0, 633.0, 25.0, 979.0]
2025-05-13 14:10:52,399 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 53 minutes, 5 seconds)
2025-05-13 14:15:07,415 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:15:17,641 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1582.50330 ± 1668.216
2025-05-13 14:15:17,642 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3601.0789, 20.841637, 3658.5703, 518.8208, 21.407303, 14.261552, 437.90002, 350.18353, 3580.259, 3621.7102]
2025-05-13 14:15:17,642 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 32.0, 1000.0, 197.0, 32.0, 27.0, 172.0, 157.0, 1000.0, 1000.0]
2025-05-13 14:15:17,653 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 48 minutes, 53 seconds)
2025-05-13 14:19:14,482 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:19:29,639 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2428.78540 ± 1477.980
2025-05-13 14:19:29,639 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3614.9597, 3818.3499, 437.40176, 1870.4524, 2820.827, 21.292051, 3870.4873, 596.37805, 3691.4678, 3546.2356]
2025-05-13 14:19:29,639 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 996.0, 199.0, 571.0, 760.0, 33.0, 1000.0, 226.0, 1000.0, 1000.0]
2025-05-13 14:19:29,650 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 44 minutes, 41 seconds)
2025-05-13 14:23:41,342 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:23:52,598 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1815.33496 ± 1676.771
2025-05-13 14:23:52,598 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3937.2798, 614.1883, 18.009275, 773.1094, 3870.0925, 3775.7292, 3840.0918, 504.73502, 513.76447, 306.3519]
2025-05-13 14:23:52,598 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 232.0, 30.0, 288.0, 1000.0, 1000.0, 1000.0, 211.0, 189.0, 138.0]
2025-05-13 14:23:52,606 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 42 minutes, 35 seconds)
2025-05-13 14:27:51,926 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:28:05,745 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2015.58728 ± 1422.612
2025-05-13 14:28:05,745 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [482.2396, 3356.8398, 369.96786, 190.70595, 2288.5334, 3680.9622, 3528.2285, 2573.4321, 335.3803, 3349.5815]
2025-05-13 14:28:05,745 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [190.0, 1000.0, 161.0, 105.0, 660.0, 1000.0, 1000.0, 1000.0, 155.0, 1000.0]
2025-05-13 14:28:05,754 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 37 minutes, 14 seconds)
2025-05-13 14:32:15,936 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:32:30,085 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2356.93140 ± 1262.871
2025-05-13 14:32:30,085 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [883.7981, 3016.1453, 579.69055, 4005.7683, 2319.9275, 2331.5994, 3487.9846, 3880.7466, 413.43735, 2650.2166]
2025-05-13 14:32:30,085 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [300.0, 807.0, 212.0, 1000.0, 614.0, 679.0, 880.0, 1000.0, 162.0, 761.0]
2025-05-13 14:32:30,095 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 35 minutes, 9 seconds)
2025-05-13 14:36:16,819 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:36:30,080 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2004.77441 ± 1650.004
2025-05-13 14:36:30,080 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3457.8162, 3761.1973, 3823.3035, 202.48354, 495.63165, 130.63205, 580.97186, 3800.1875, 3382.6758, 412.84546]
2025-05-13 14:36:30,080 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 156.0, 205.0, 150.0, 220.0, 1000.0, 1000.0, 189.0]
2025-05-13 14:36:30,092 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 29 minutes, 4 seconds)
2025-05-13 14:40:38,800 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:40:49,956 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1739.45154 ± 1632.772
2025-05-13 14:40:49,957 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3692.8972, 974.6495, 384.15848, 417.256, 493.90027, 18.577984, 227.30026, 3727.8342, 3683.56, 3774.3806]
2025-05-13 14:40:49,957 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 338.0, 157.0, 184.0, 200.0, 31.0, 118.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:40:49,967 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 25 minutes, 21 seconds)
2025-05-13 14:45:01,792 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:45:12,533 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1641.16699 ± 1445.970
2025-05-13 14:45:12,533 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [384.13132, 486.3501, 3743.1091, 1593.3293, 1632.7157, 317.91934, 3637.5327, 242.2591, 561.6767, 3812.6472]
2025-05-13 14:45:12,533 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [168.0, 188.0, 1000.0, 461.0, 558.0, 142.0, 1000.0, 115.0, 228.0, 1000.0]
2025-05-13 14:45:12,543 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 21 minutes, 3 seconds)
2025-05-13 14:49:06,264 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:49:20,629 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2273.02222 ± 1643.646
2025-05-13 14:49:20,630 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3641.3303, 225.06065, 3745.468, 3674.9988, 3384.8425, 18.199875, 433.79218, 3635.8633, 3580.586, 390.0819]
2025-05-13 14:49:20,630 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 114.0, 1000.0, 1000.0, 1000.0, 30.0, 169.0, 1000.0, 1000.0, 159.0]
2025-05-13 14:49:20,641 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 16 minutes, 29 seconds)
2025-05-13 14:53:23,481 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:53:32,696 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1452.90576 ± 1566.619
2025-05-13 14:53:32,696 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3788.9834, 426.74396, 3911.5918, 545.74963, 1000.66974, 16.678885, 907.67883, 15.413079, 3693.2698, 222.27907]
2025-05-13 14:53:32,696 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 179.0, 1000.0, 219.0, 328.0, 28.0, 332.0, 30.0, 1000.0, 102.0]
2025-05-13 14:53:32,706 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 11 minutes, 32 seconds)
2025-05-13 14:57:50,505 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:58:07,403 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2696.08130 ± 1398.129
2025-05-13 14:58:07,403 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3637.1252, 3626.6982, 3493.012, 3710.5625, 511.58713, 757.0988, 426.52936, 3627.803, 3579.313, 3591.085]
2025-05-13 14:58:07,403 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 204.0, 285.0, 176.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:58:07,412 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 9 minutes, 11 seconds)
2025-05-13 15:01:50,888 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:02:05,430 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2354.43677 ± 1677.083
2025-05-13 15:02:05,430 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3787.7312, 3665.4353, 18.628712, 17.241922, 3581.652, 17.535141, 3578.5671, 3812.0618, 3711.8687, 1353.644]
2025-05-13 15:02:05,430 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 32.0, 32.0, 1000.0, 27.0, 1000.0, 1000.0, 1000.0, 424.0]
2025-05-13 15:02:05,441 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 1 hour, 3 minutes, 46 seconds)
2025-05-13 15:06:24,077 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:06:33,498 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1535.44482 ± 1459.513
2025-05-13 15:06:33,498 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1013.7451, 3786.203, 14.783869, 19.41348, 426.44272, 270.59573, 2863.268, 3872.5762, 2349.2488, 738.17126]
2025-05-13 15:06:33,498 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [304.0, 1000.0, 26.0, 31.0, 162.0, 131.0, 735.0, 1000.0, 638.0, 266.0]
2025-05-13 15:06:33,508 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 59 minutes, 46 seconds)
2025-05-13 15:10:41,194 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:10:59,445 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3015.45459 ± 1428.054
2025-05-13 15:10:59,445 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3513.2134, 3838.4238, 344.97714, 3767.233, 13.993866, 3890.815, 3766.0852, 3373.3032, 3830.8682, 3815.6338]
2025-05-13 15:10:59,446 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 194.0, 1000.0, 25.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:10:59,446 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (3015.45) for latency ExtremeSparseL4U32
2025-05-13 15:10:59,457 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 56 minutes, 16 seconds)
2025-05-13 15:14:41,296 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:14:52,236 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1705.96619 ± 1502.886
2025-05-13 15:14:52,236 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [23.03291, 41.07437, 3699.3684, 822.48114, 381.09528, 3831.5525, 3724.3145, 2401.4841, 1607.315, 527.94385]
2025-05-13 15:14:52,236 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [45.0, 57.0, 1000.0, 313.0, 175.0, 1000.0, 1000.0, 711.0, 475.0, 208.0]
2025-05-13 15:14:52,246 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 51 minutes, 10 seconds)
2025-05-13 15:19:07,184 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:19:18,917 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1950.88110 ± 1892.236
2025-05-13 15:19:18,917 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [224.55078, 13.786007, 3817.2944, 3861.0168, 19.773333, 19.023563, 21.447409, 3878.1863, 3870.3535, 3783.3784]
2025-05-13 15:19:18,917 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [122.0, 25.0, 1000.0, 1000.0, 30.0, 31.0, 32.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:19:18,928 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 46 minutes, 37 seconds)
2025-05-13 15:23:22,606 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:23:36,076 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2224.70947 ± 1649.700
2025-05-13 15:23:36,076 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [378.0453, 4063.9988, 3787.6008, 3865.9192, 21.139935, 1197.4059, 845.61127, 3771.8943, 3749.9106, 565.5694]
2025-05-13 15:23:36,076 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [175.0, 1000.0, 1000.0, 1000.0, 32.0, 348.0, 329.0, 1000.0, 1000.0, 237.0]
2025-05-13 15:23:36,087 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 43 minutes, 1 second)
2025-05-13 15:27:39,065 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:27:57,482 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2907.06006 ± 1294.464
2025-05-13 15:27:57,482 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3321.1052, 3577.9353, 3498.5906, 3517.4539, 3581.1624, 326.89468, 3588.096, 322.87845, 3660.5437, 3675.941]
2025-05-13 15:27:57,482 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 144.0, 1000.0, 136.0, 1000.0, 1000.0]
2025-05-13 15:27:57,491 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 38 minutes, 31 seconds)
2025-05-13 15:32:00,128 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:32:08,751 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1328.36401 ± 1642.155
2025-05-13 15:32:08,751 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [447.5997, 332.1391, 15.947416, 3770.5864, 11.320269, 224.84155, 3946.4832, 487.23547, 3760.136, 287.353]
2025-05-13 15:32:08,751 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [224.0, 165.0, 29.0, 1000.0, 24.0, 113.0, 1000.0, 222.0, 1000.0, 130.0]
2025-05-13 15:32:08,761 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 33 minutes, 50 seconds)
2025-05-13 15:36:07,004 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:36:16,134 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1388.89014 ± 1552.212
2025-05-13 15:36:16,134 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3817.3528, 3646.3784, 11.429034, 773.1662, 3693.6172, 754.99115, 16.371902, 15.043519, 456.07675, 704.47455]
2025-05-13 15:36:16,134 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 22.0, 257.0, 1000.0, 285.0, 26.0, 25.0, 191.0, 271.0]
2025-05-13 15:36:16,144 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 29 minutes, 57 seconds)
2025-05-13 15:40:19,605 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:40:24,090 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 532.42035 ± 523.250
2025-05-13 15:40:24,090 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [13.069356, 217.0079, 300.37552, 1311.4413, 26.704952, 597.76843, 806.5934, 450.59332, 1582.1636, 18.485699]
2025-05-13 15:40:24,090 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [27.0, 107.0, 141.0, 403.0, 46.0, 219.0, 291.0, 251.0, 569.0, 29.0]
2025-05-13 15:40:24,100 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 25 minutes, 18 seconds)
2025-05-13 15:44:48,596 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:45:04,710 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2620.22412 ± 1591.887
2025-05-13 15:45:04,710 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3760.5386, 3897.423, 3783.5137, 2822.3132, 448.2318, 3726.6753, 229.42299, 3759.0754, 18.42953, 3756.6172]
2025-05-13 15:45:04,710 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 819.0, 193.0, 1000.0, 107.0, 1000.0, 30.0, 1000.0]
2025-05-13 15:45:04,724 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 21 minutes, 28 seconds)
2025-05-13 15:48:51,186 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:48:55,589 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 572.14307 ± 1039.271
2025-05-13 15:48:55,589 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [628.8015, 18.241919, 313.52844, 254.23903, 449.2327, 17.708364, 19.394957, 361.40607, 28.209352, 3630.6685]
2025-05-13 15:48:55,589 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [269.0, 31.0, 129.0, 110.0, 176.0, 39.0, 31.0, 168.0, 48.0, 1000.0]
2025-05-13 15:48:55,601 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 16 minutes, 46 seconds)
2025-05-13 15:53:44,993 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:53:59,821 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2561.40259 ± 1694.539
2025-05-13 15:53:59,822 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [548.06836, 376.39725, 3870.6523, 3994.1653, 4079.5652, 586.74316, 3942.7043, 3796.986, 443.9674, 3974.777]
2025-05-13 15:53:59,822 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [230.0, 155.0, 1000.0, 1000.0, 1000.0, 223.0, 1000.0, 1000.0, 204.0, 1000.0]
2025-05-13 15:53:59,832 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 13 minutes, 6 seconds)
2025-05-13 15:58:19,932 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:58:40,978 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2972.43359 ± 1419.335
2025-05-13 15:58:40,978 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [18.715183, 296.5537, 3550.4636, 3780.6458, 3747.5854, 3677.8855, 3224.9246, 3794.5352, 3746.0603, 3886.9666]
2025-05-13 15:58:40,978 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [31.0, 145.0, 906.0, 1000.0, 1000.0, 1000.0, 862.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:58:40,990 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 8 minutes, 57 seconds)
2025-05-13 16:03:00,009 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 16:03:15,245 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2546.60107 ± 1349.318
2025-05-13 16:03:15,245 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3827.38, 862.1433, 3860.3257, 3819.9749, 3837.8171, 3238.0664, 562.63965, 2267.8608, 496.98788, 2692.8137]
2025-05-13 16:03:15,245 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 293.0, 1000.0, 1000.0, 1000.0, 842.0, 221.0, 625.0, 226.0, 723.0]
2025-05-13 16:03:15,255 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 4 minutes, 34 seconds)
2025-05-13 16:07:11,657 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 16:07:19,484 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1218.30396 ± 1303.333
2025-05-13 16:07:19,484 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3441.379, 1557.3092, 436.82605, 330.60397, 361.37363, 3989.7717, 764.83527, 411.3815, 268.3385, 621.2205]
2025-05-13 16:07:19,484 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [840.0, 436.0, 176.0, 148.0, 152.0, 1000.0, 278.0, 215.0, 117.0, 292.0]
2025-05-13 16:07:19,494 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1251 [DEBUG]: Training session finished
