2025-05-13 09:06:34,841 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-bpql-mda-mem4
2025-05-13 09:06:34,842 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-bpql-mda-mem4
2025-05-13 09:06:34,842 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x1477cdf2d590>}
2025-05-13 09:06:34,842 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1111 [DEBUG]: using device: cuda
2025-05-13 09:06:34,845 baseline-bpql-mda-noisy-walker2d:91 [WARNING]: args.assumed_delay != args.horizon: 4 != 32
2025-05-13 09:06:34,845 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1133 [INFO]: Creating new trainer
2025-05-13 09:06:34,862 baseline-bpql-mda-noisy-walker2d:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-05-13 09:06:34,862 baseline-bpql-mda-noisy-walker2d:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-13 09:06:34,867 baseline-bpql-mda-noisy-walker2d:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(6, 384, batch_first=True)
)
2025-05-13 09:06:35,557 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1194 [DEBUG]: Starting training session...
2025-05-13 09:06:35,557 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 1/100
2025-05-13 09:10:11,272 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:10:13,024 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 34.67316 ± 110.891
2025-05-13 09:10:13,024 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [-37.544975, 15.894595, -34.887074, -23.885979, -26.605883, -41.22882, -15.052287, 5.442742, 224.30519, 280.2941]
2025-05-13 09:10:13,024 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [101.0, 115.0, 105.0, 128.0, 117.0, 83.0, 65.0, 144.0, 246.0, 263.0]
2025-05-13 09:10:13,024 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (34.67) for latency ExtremeSparseL4U32
2025-05-13 09:10:13,031 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 5 hours, 58 minutes, 49 seconds)
2025-05-13 09:14:00,401 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:14:02,971 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 135.94742 ± 174.954
2025-05-13 09:14:02,972 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3.22377, 611.21875, 40.35305, 27.240473, 176.09286, 74.57183, 30.11734, 17.895414, 137.36621, 241.39444]
2025-05-13 09:14:02,972 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [45.0, 841.0, 48.0, 38.0, 251.0, 82.0, 135.0, 42.0, 205.0, 253.0]
2025-05-13 09:14:02,972 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (135.95) for latency ExtremeSparseL4U32
2025-05-13 09:14:02,978 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 6 hours, 5 minutes, 23 seconds)
2025-05-13 09:17:50,700 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:17:53,169 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 260.27124 ± 179.903
2025-05-13 09:17:53,169 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [37.684944, 89.10652, 172.0439, 427.94696, 547.6317, 504.55148, 141.37161, 350.25342, 281.5517, 50.570072]
2025-05-13 09:17:53,169 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [55.0, 262.0, 86.0, 274.0, 227.0, 302.0, 248.0, 184.0, 160.0, 100.0]
2025-05-13 09:17:53,169 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (260.27) for latency ExtremeSparseL4U32
2025-05-13 09:17:53,177 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 6 hours, 5 minutes, 9 seconds)
2025-05-13 09:21:41,916 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:21:44,709 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 305.92877 ± 144.447
2025-05-13 09:21:44,709 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [269.34595, 347.06683, 172.71222, 417.70667, 451.77365, 53.666935, 425.8324, 82.269325, 407.34705, 431.5663]
2025-05-13 09:21:44,709 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [129.0, 179.0, 257.0, 250.0, 244.0, 225.0, 249.0, 161.0, 235.0, 236.0]
2025-05-13 09:21:44,709 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (305.93) for latency ExtremeSparseL4U32
2025-05-13 09:21:44,716 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 6 hours, 3 minutes, 39 seconds)
2025-05-13 09:25:32,956 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:25:35,472 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 404.92108 ± 52.848
2025-05-13 09:25:35,472 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [389.12244, 401.9389, 376.93628, 366.742, 395.21918, 401.25665, 414.45572, 556.75464, 383.1242, 363.66098]
2025-05-13 09:25:35,472 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [193.0, 196.0, 176.0, 179.0, 193.0, 179.0, 199.0, 236.0, 202.0, 175.0]
2025-05-13 09:25:35,472 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (404.92) for latency ExtremeSparseL4U32
2025-05-13 09:25:35,478 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 6 hours, 58 seconds)
2025-05-13 09:29:24,437 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:29:27,063 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 250.29790 ± 204.495
2025-05-13 09:29:27,063 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [553.05896, 153.30406, 729.01117, 242.19989, 56.124165, 132.31332, 144.95332, 136.81248, 206.82645, 148.37527]
2025-05-13 09:29:27,063 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [330.0, 218.0, 341.0, 308.0, 59.0, 78.0, 240.0, 154.0, 110.0, 196.0]
2025-05-13 09:29:27,069 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 6 hours, 1 minute, 35 seconds)
2025-05-13 09:33:26,486 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:33:30,486 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 520.56213 ± 250.976
2025-05-13 09:33:30,486 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [597.43036, 504.48978, 652.2617, 585.15106, 199.1799, 35.423607, 521.0683, 424.52576, 995.26843, 690.8222]
2025-05-13 09:33:30,486 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [296.0, 228.0, 301.0, 270.0, 141.0, 48.0, 239.0, 197.0, 1000.0, 286.0]
2025-05-13 09:33:30,486 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (520.56) for latency ExtremeSparseL4U32
2025-05-13 09:33:30,494 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 6 hours, 1 minute, 55 seconds)
2025-05-13 09:37:12,056 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:37:14,973 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 410.97192 ± 203.990
2025-05-13 09:37:14,974 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [333.67426, 157.95773, 583.21783, 507.81955, 663.4605, 652.0586, 536.53864, 385.7089, 260.6941, 28.588884]
2025-05-13 09:37:14,974 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [211.0, 101.0, 388.0, 219.0, 298.0, 438.0, 215.0, 216.0, 145.0, 44.0]
2025-05-13 09:37:14,979 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 5 hours, 56 minutes, 17 seconds)
2025-05-13 09:41:04,027 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:41:08,865 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 628.84113 ± 134.801
2025-05-13 09:41:08,865 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [823.5636, 749.6437, 534.0694, 487.71353, 525.89526, 581.7249, 889.258, 499.2901, 575.6109, 621.64197]
2025-05-13 09:41:08,865 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [805.0, 452.0, 253.0, 221.0, 265.0, 311.0, 583.0, 228.0, 321.0, 313.0]
2025-05-13 09:41:08,865 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (628.84) for latency ExtremeSparseL4U32
2025-05-13 09:41:08,874 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 5 hours, 53 minutes, 7 seconds)
2025-05-13 09:44:51,184 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:44:53,298 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 322.73679 ± 221.707
2025-05-13 09:44:53,298 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [547.32324, 353.09082, 225.04645, 33.09915, 21.683784, 13.80258, 492.96918, 629.3953, 490.20844, 420.74878]
2025-05-13 09:44:53,298 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [226.0, 322.0, 130.0, 54.0, 44.0, 27.0, 221.0, 249.0, 197.0, 177.0]
2025-05-13 09:44:53,306 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 5 hours, 47 minutes, 20 seconds)
2025-05-13 09:48:37,742 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:48:40,994 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 520.29260 ± 261.289
2025-05-13 09:48:40,995 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [779.0004, 404.03616, 65.88597, 266.06082, 612.2247, 696.54694, 786.45026, 794.1439, 640.6585, 157.91824]
2025-05-13 09:48:40,995 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [327.0, 214.0, 106.0, 149.0, 316.0, 296.0, 350.0, 335.0, 311.0, 119.0]
2025-05-13 09:48:41,001 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 5 hours, 42 minutes, 19 seconds)
2025-05-13 09:52:29,141 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:52:33,109 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 720.52533 ± 282.875
2025-05-13 09:52:33,109 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1233.4943, 779.06354, 709.4082, 519.6387, 881.901, 496.7313, 1069.1395, 563.716, 200.95547, 751.205]
2025-05-13 09:52:33,109 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [530.0, 368.0, 289.0, 285.0, 316.0, 194.0, 415.0, 241.0, 101.0, 331.0]
2025-05-13 09:52:33,109 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (720.53) for latency ExtremeSparseL4U32
2025-05-13 09:52:33,113 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 5 hours, 35 minutes, 10 seconds)
2025-05-13 09:56:16,782 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:56:19,932 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 553.20325 ± 592.396
2025-05-13 09:56:19,933 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1326.1619, 1565.8041, 227.3989, 1409.8744, 405.6673, -1.1010227, 20.678942, 30.345825, 319.06525, 228.13672]
2025-05-13 09:56:19,933 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [451.0, 662.0, 191.0, 537.0, 176.0, 11.0, 35.0, 53.0, 167.0, 119.0]
2025-05-13 09:56:19,939 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 5 hours, 32 minutes, 2 seconds)
2025-05-13 10:00:04,554 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:00:06,267 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 251.67102 ± 147.122
2025-05-13 10:00:06,268 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [383.9921, 116.38424, 377.70215, 439.08502, 271.27606, 25.823118, 164.17618, 34.952824, 309.109, 394.2095]
2025-05-13 10:00:06,268 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [173.0, 96.0, 183.0, 189.0, 151.0, 45.0, 86.0, 44.0, 237.0, 178.0]
2025-05-13 10:00:06,275 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 5 hours, 26 minutes, 3 seconds)
2025-05-13 10:03:54,163 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:03:56,173 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 356.56659 ± 301.677
2025-05-13 10:03:56,174 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1007.6841, 669.1299, 477.71637, 17.525867, 518.66235, 243.7213, 47.34579, 327.37878, 30.582632, 225.91895]
2025-05-13 10:03:56,174 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [361.0, 274.0, 179.0, 33.0, 214.0, 130.0, 63.0, 163.0, 52.0, 118.0]
2025-05-13 10:03:56,181 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 5 hours, 23 minutes, 48 seconds)
2025-05-13 10:07:37,399 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:07:41,819 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 817.71606 ± 520.604
2025-05-13 10:07:41,819 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [179.45348, 35.21776, 702.5917, 1173.2509, 927.8638, 424.7455, 885.5907, 1204.0156, 721.58496, 1922.8461]
2025-05-13 10:07:41,819 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [117.0, 60.0, 482.0, 433.0, 306.0, 171.0, 358.0, 550.0, 297.0, 653.0]
2025-05-13 10:07:41,819 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (817.72) for latency ExtremeSparseL4U32
2025-05-13 10:07:41,829 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 5 hours, 19 minutes, 25 seconds)
2025-05-13 10:11:26,348 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:11:29,250 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 592.37164 ± 518.619
2025-05-13 10:11:29,250 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [895.1019, 175.97444, 21.55728, 627.1867, 35.705284, 852.8516, 1751.3232, 30.799812, 749.1684, 784.048]
2025-05-13 10:11:29,250 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [338.0, 143.0, 38.0, 246.0, 51.0, 281.0, 570.0, 42.0, 292.0, 316.0]
2025-05-13 10:11:29,258 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 5 hours, 14 minutes, 20 seconds)
2025-05-13 10:15:33,155 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:15:37,006 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 793.65607 ± 538.131
2025-05-13 10:15:37,006 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1007.4353, 28.604094, 30.019976, 13.352913, 1156.689, 970.2002, 1060.6383, 1659.1276, 1007.45496, 1003.03784]
2025-05-13 10:15:37,006 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [340.0, 47.0, 55.0, 24.0, 397.0, 349.0, 389.0, 598.0, 385.0, 408.0]
2025-05-13 10:15:37,016 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 5 hours, 16 minutes, 16 seconds)
2025-05-13 10:19:06,527 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:19:10,030 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 623.00354 ± 337.692
2025-05-13 10:19:10,030 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [304.74295, 37.980217, 867.0549, 865.4139, 21.210676, 901.8361, 787.0148, 806.9325, 842.31116, 795.5385]
2025-05-13 10:19:10,030 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [144.0, 54.0, 318.0, 327.0, 375.0, 317.0, 314.0, 318.0, 316.0, 302.0]
2025-05-13 10:19:10,036 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 5 hours, 8 minutes, 48 seconds)
2025-05-13 10:22:54,367 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:22:57,575 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 578.46033 ± 481.318
2025-05-13 10:22:57,575 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [989.3039, 223.89894, 1131.4105, 559.66235, 821.57996, 527.80365, 1443.723, 41.007996, 20.325897, 25.886978]
2025-05-13 10:22:57,575 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [407.0, 134.0, 401.0, 221.0, 323.0, 216.0, 631.0, 65.0, 31.0, 49.0]
2025-05-13 10:22:57,582 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 5 hours, 4 minutes, 22 seconds)
2025-05-13 10:26:50,606 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:26:53,915 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 603.65710 ± 487.616
2025-05-13 10:26:53,915 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [937.7115, 1408.2639, 21.708647, 54.925552, 29.90023, 843.7296, 91.051796, 657.60205, 898.82605, 1092.852]
2025-05-13 10:26:53,915 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [383.0, 545.0, 35.0, 77.0, 132.0, 318.0, 78.0, 254.0, 356.0, 425.0]
2025-05-13 10:26:53,922 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 5 hours, 3 minutes, 23 seconds)
2025-05-13 10:30:33,521 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:30:37,805 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 853.58673 ± 557.957
2025-05-13 10:30:37,805 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [512.95087, 818.14276, 1609.7802, 196.53996, 15.769825, 1726.3563, 1204.8026, 880.50684, 1230.1022, 340.91635]
2025-05-13 10:30:37,805 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [294.0, 315.0, 565.0, 100.0, 35.0, 593.0, 439.0, 357.0, 443.0, 188.0]
2025-05-13 10:30:37,805 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (853.59) for latency ExtremeSparseL4U32
2025-05-13 10:30:37,810 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 4 hours, 58 minutes, 37 seconds)
2025-05-13 10:34:25,650 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:34:29,292 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 666.67456 ± 487.073
2025-05-13 10:34:29,292 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [880.0853, 38.219604, 262.40402, 191.62561, 946.9555, 1588.4031, 1079.6833, 39.833447, 865.9908, 773.5448]
2025-05-13 10:34:29,292 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [376.0, 48.0, 136.0, 126.0, 409.0, 623.0, 393.0, 56.0, 338.0, 319.0]
2025-05-13 10:34:29,299 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 4 hours, 50 minutes, 37 seconds)
2025-05-13 10:38:10,704 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:38:14,131 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 645.71558 ± 300.560
2025-05-13 10:38:14,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [204.33038, 1089.6287, 657.89813, 705.4086, 867.06335, 823.7769, 677.37683, 708.78467, 722.32654, 0.56154376]
2025-05-13 10:38:14,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [105.0, 394.0, 288.0, 276.0, 343.0, 327.0, 289.0, 296.0, 386.0, 11.0]
2025-05-13 10:38:14,141 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 4 hours, 49 minutes, 50 seconds)
2025-05-13 10:42:02,706 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:42:05,814 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 578.26239 ± 296.406
2025-05-13 10:42:05,814 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [404.79675, 207.85834, 302.6378, 973.3155, 365.83624, 971.1838, 573.78455, 869.20874, 868.57306, 245.42911]
2025-05-13 10:42:05,814 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [161.0, 106.0, 135.0, 336.0, 149.0, 420.0, 267.0, 384.0, 347.0, 133.0]
2025-05-13 10:42:05,823 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 47 minutes, 3 seconds)
2025-05-13 10:45:50,069 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:45:54,001 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 640.72076 ± 543.948
2025-05-13 10:45:54,001 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [163.04787, 226.47165, 1000.3466, 518.55817, 1266.7428, 244.5525, 120.3496, 27.06089, 1376.0049, 1464.0728]
2025-05-13 10:45:54,001 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [100.0, 119.0, 659.0, 227.0, 551.0, 121.0, 87.0, 40.0, 506.0, 641.0]
2025-05-13 10:45:54,010 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 4 hours, 41 minutes, 13 seconds)
2025-05-13 10:49:36,327 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:49:40,921 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 839.21912 ± 296.335
2025-05-13 10:49:40,921 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [815.34894, 908.6362, 596.815, 1085.3356, 1039.3782, 256.06683, 803.2432, 1073.6624, 1299.2098, 514.49475]
2025-05-13 10:49:40,921 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [376.0, 383.0, 244.0, 409.0, 348.0, 125.0, 366.0, 433.0, 536.0, 289.0]
2025-05-13 10:49:40,927 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 4 hours, 38 minutes, 9 seconds)
2025-05-13 10:53:31,064 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:53:34,487 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 697.23370 ± 394.455
2025-05-13 10:53:34,488 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [25.41489, 515.3065, 14.984842, 880.42487, 919.86707, 1021.63763, 1044.4117, 464.95544, 994.98596, 1090.3483]
2025-05-13 10:53:34,488 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [37.0, 220.0, 25.0, 314.0, 333.0, 396.0, 350.0, 206.0, 404.0, 356.0]
2025-05-13 10:53:34,494 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 4 hours, 34 minutes, 50 seconds)
2025-05-13 10:57:19,556 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:57:25,343 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1104.55151 ± 559.138
2025-05-13 10:57:25,343 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [711.3578, 866.3328, 789.2276, 217.5812, 1604.3799, 1380.9475, 1373.2666, 2299.3572, 656.142, 1146.9226]
2025-05-13 10:57:25,343 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [279.0, 346.0, 315.0, 113.0, 645.0, 532.0, 582.0, 946.0, 278.0, 430.0]
2025-05-13 10:57:25,343 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (1104.55) for latency ExtremeSparseL4U32
2025-05-13 10:57:25,351 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 32 minutes, 27 seconds)
2025-05-13 11:01:13,206 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:01:17,827 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 938.50079 ± 492.736
2025-05-13 11:01:17,827 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [680.3177, 2070.0757, 968.37213, 1318.1703, 1061.1832, 814.3881, 935.5516, 723.9598, 789.3612, 23.628796]
2025-05-13 11:01:17,827 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [293.0, 759.0, 355.0, 572.0, 339.0, 324.0, 323.0, 296.0, 276.0, 32.0]
2025-05-13 11:01:17,835 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 4 hours, 28 minutes, 48 seconds)
2025-05-13 11:05:01,046 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:05:06,935 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1122.69043 ± 767.821
2025-05-13 11:05:06,935 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2002.5856, 1101.7788, 24.51289, 749.2547, 1159.0983, 2607.2256, 826.7868, 1662.778, 33.19485, 1059.688]
2025-05-13 11:05:06,935 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [703.0, 365.0, 35.0, 392.0, 466.0, 960.0, 462.0, 647.0, 40.0, 479.0]
2025-05-13 11:05:06,935 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (1122.69) for latency ExtremeSparseL4U32
2025-05-13 11:05:06,943 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 4 hours, 25 minutes, 10 seconds)
2025-05-13 11:08:55,914 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:08:59,580 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 740.51935 ± 279.162
2025-05-13 11:08:59,580 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [647.92316, 698.797, 854.4942, 749.9755, 1192.9539, 841.3255, 797.1334, 16.354906, 856.059, 750.1767]
2025-05-13 11:08:59,580 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [286.0, 273.0, 314.0, 295.0, 386.0, 333.0, 308.0, 27.0, 333.0, 299.0]
2025-05-13 11:08:59,588 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 4 hours, 22 minutes, 37 seconds)
2025-05-13 11:12:43,069 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:12:47,670 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 953.00751 ± 869.345
2025-05-13 11:12:47,670 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [860.20996, 52.992363, 201.04814, 2595.5647, 807.5565, 634.7306, 645.50714, 899.73376, 227.6117, 2605.1204]
2025-05-13 11:12:47,670 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [302.0, 80.0, 103.0, 907.0, 311.0, 233.0, 274.0, 320.0, 118.0, 907.0]
2025-05-13 11:12:47,678 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 4 hours, 17 minutes, 32 seconds)
2025-05-13 11:16:43,423 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:16:46,649 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 624.21277 ± 300.438
2025-05-13 11:16:46,649 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [206.28937, 823.20825, 675.07874, 760.87805, 1109.6405, 29.627968, 561.6741, 689.8424, 519.3546, 866.5342]
2025-05-13 11:16:46,649 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [123.0, 399.0, 257.0, 312.0, 397.0, 37.0, 227.0, 276.0, 204.0, 301.0]
2025-05-13 11:16:46,659 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 4 hours, 15 minutes, 29 seconds)
2025-05-13 11:20:21,344 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:20:24,935 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 760.29706 ± 420.040
2025-05-13 11:20:24,935 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [491.93033, 829.8849, 1131.8575, 1387.8358, 217.37189, 141.58942, 1080.363, 852.60236, 1162.4242, 307.1108]
2025-05-13 11:20:24,935 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [215.0, 293.0, 398.0, 476.0, 124.0, 102.0, 413.0, 316.0, 398.0, 125.0]
2025-05-13 11:20:24,942 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 4 hours, 8 minutes, 32 seconds)
2025-05-13 11:24:23,162 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:24:27,365 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 784.70837 ± 333.175
2025-05-13 11:24:27,365 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [680.9939, 748.7907, 383.94452, 715.0419, 1007.0062, 1681.6888, 727.8347, 587.24304, 613.4155, 701.1245]
2025-05-13 11:24:27,365 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [283.0, 301.0, 208.0, 298.0, 368.0, 654.0, 286.0, 306.0, 329.0, 299.0]
2025-05-13 11:24:27,372 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 4 hours, 7 minutes, 33 seconds)
2025-05-13 11:28:05,509 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:28:11,342 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1228.42847 ± 624.311
2025-05-13 11:28:11,343 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1171.7113, 999.11346, 1600.9296, 2467.7432, 132.21169, 634.93744, 1551.1904, 828.77563, 1093.7295, 1803.9423]
2025-05-13 11:28:11,343 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [470.0, 370.0, 572.0, 863.0, 92.0, 265.0, 586.0, 304.0, 409.0, 641.0]
2025-05-13 11:28:11,343 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (1228.43) for latency ExtremeSparseL4U32
2025-05-13 11:28:11,350 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 4 hours, 1 minute, 52 seconds)
2025-05-13 11:31:52,532 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:31:57,906 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1150.65295 ± 1012.049
2025-05-13 11:31:57,906 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2970.8376, 896.98267, 162.91779, 55.147213, 3031.4998, 669.86865, 1477.8324, 286.2523, 1034.1392, 921.0528]
2025-05-13 11:31:57,906 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [995.0, 328.0, 81.0, 76.0, 950.0, 271.0, 518.0, 129.0, 341.0, 379.0]
2025-05-13 11:31:57,915 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 57 minutes, 42 seconds)
2025-05-13 11:35:47,813 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:35:51,328 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 715.00989 ± 607.905
2025-05-13 11:35:51,328 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [765.8759, 1166.6136, 767.2869, 209.23648, 803.8645, 1940.663, 1344.1459, 28.315805, 51.685257, 72.4121]
2025-05-13 11:35:51,328 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [306.0, 424.0, 306.0, 107.0, 347.0, 626.0, 458.0, 41.0, 65.0, 89.0]
2025-05-13 11:35:51,335 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 52 minutes, 45 seconds)
2025-05-13 11:39:47,785 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:39:52,005 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 854.20721 ± 1071.879
2025-05-13 11:39:52,005 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [-1.048972, 1479.1731, 703.774, 675.71484, 22.524462, 28.779182, 20.990105, 20.027958, 2733.1147, 2859.023]
2025-05-13 11:39:52,005 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [10.0, 469.0, 333.0, 270.0, 41.0, 50.0, 36.0, 35.0, 1000.0, 1000.0]
2025-05-13 11:39:52,013 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 53 minutes, 24 seconds)
2025-05-13 11:43:26,491 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:43:31,989 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1201.66711 ± 457.482
2025-05-13 11:43:31,989 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [353.2884, 767.9203, 1708.3324, 1507.1133, 1566.1622, 1317.4868, 1480.4126, 1645.3154, 565.3216, 1105.3181]
2025-05-13 11:43:31,989 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [157.0, 299.0, 564.0, 494.0, 541.0, 467.0, 518.0, 588.0, 271.0, 431.0]
2025-05-13 11:43:31,996 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 45 minutes, 6 seconds)
2025-05-13 11:47:18,922 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:47:26,294 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1528.50610 ± 848.732
2025-05-13 11:47:26,294 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [896.8562, 1314.9907, 818.1075, 2596.179, 1252.5204, 2679.1013, 333.07394, 2715.3079, 1973.9835, 704.9416]
2025-05-13 11:47:26,294 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [306.0, 507.0, 288.0, 909.0, 494.0, 1000.0, 135.0, 960.0, 668.0, 293.0]
2025-05-13 11:47:26,294 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (1528.51) for latency ExtremeSparseL4U32
2025-05-13 11:47:26,302 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 43 minutes, 17 seconds)
2025-05-13 11:51:12,039 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:51:17,525 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1167.00024 ± 808.699
2025-05-13 11:51:17,525 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [570.56714, 567.4638, 1912.325, 2654.0244, 205.10092, 999.506, 873.84503, 1560.3566, 198.00189, 2128.8115]
2025-05-13 11:51:17,526 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [240.0, 202.0, 667.0, 878.0, 107.0, 346.0, 316.0, 595.0, 152.0, 718.0]
2025-05-13 11:51:17,533 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 40 minutes, 19 seconds)
2025-05-13 11:55:02,575 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:55:08,978 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1430.25952 ± 473.385
2025-05-13 11:55:08,978 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1996.1282, 1136.2771, 788.0295, 1539.5332, 1560.6722, 874.7724, 835.2581, 1508.0664, 2042.6604, 2021.1996]
2025-05-13 11:55:08,978 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [699.0, 347.0, 290.0, 535.0, 546.0, 306.0, 292.0, 491.0, 699.0, 766.0]
2025-05-13 11:55:08,988 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 36 minutes, 5 seconds)
2025-05-13 11:59:01,148 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:59:06,470 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1016.06512 ± 711.782
2025-05-13 11:59:06,470 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [21.489008, 1022.87103, 226.37099, 1852.7773, 1069.509, 1920.5338, 633.789, 200.06116, 2066.4956, 1146.7552]
2025-05-13 11:59:06,470 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [33.0, 389.0, 119.0, 1000.0, 391.0, 595.0, 307.0, 119.0, 670.0, 386.0]
2025-05-13 11:59:06,478 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 31 minutes, 39 seconds)
2025-05-13 12:02:52,266 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:02:58,030 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1260.13184 ± 862.155
2025-05-13 12:02:58,030 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [96.229355, 730.32697, 1609.5629, 25.357185, 847.62634, 866.37726, 1800.3477, 2741.6313, 1489.4932, 2394.3665]
2025-05-13 12:02:58,030 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [83.0, 263.0, 544.0, 36.0, 338.0, 267.0, 577.0, 973.0, 565.0, 823.0]
2025-05-13 12:02:58,038 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 29 minutes, 53 seconds)
2025-05-13 12:06:49,249 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:06:55,335 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1291.87988 ± 1037.556
2025-05-13 12:06:55,335 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3149.5593, 1416.1698, 2880.9648, 20.02786, 43.84749, 1159.0953, 1001.5259, 185.68164, 1812.4618, 1249.4646]
2025-05-13 12:06:55,335 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 518.0, 1000.0, 40.0, 59.0, 437.0, 352.0, 93.0, 697.0, 425.0]
2025-05-13 12:06:55,345 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 26 minutes, 31 seconds)
2025-05-13 12:10:36,474 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:10:43,000 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1523.11658 ± 858.861
2025-05-13 12:10:43,000 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2424.0295, 2908.5562, 1349.1937, 1182.4058, 2069.963, 2342.29, 582.36237, 5.1236224, 1417.1682, 950.0747]
2025-05-13 12:10:43,000 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [712.0, 1000.0, 454.0, 390.0, 642.0, 774.0, 232.0, 14.0, 506.0, 328.0]
2025-05-13 12:10:43,009 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 22 minutes)
2025-05-13 12:14:27,173 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:14:33,900 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1377.11462 ± 647.025
2025-05-13 12:14:33,900 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [901.0047, 1918.3541, 1513.9207, 2127.26, 616.38873, 699.36646, 947.025, 2594.1716, 1636.6671, 816.98694]
2025-05-13 12:14:33,900 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [318.0, 679.0, 511.0, 896.0, 213.0, 339.0, 348.0, 1000.0, 598.0, 297.0]
2025-05-13 12:14:33,908 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 18 minutes, 2 seconds)
2025-05-13 12:18:17,789 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:18:23,297 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1199.64380 ± 670.761
2025-05-13 12:18:23,297 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1296.187, 573.43085, 2904.2773, 1508.2627, 1360.1013, 694.5551, 883.6931, 380.1224, 1048.8815, 1346.9257]
2025-05-13 12:18:23,297 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [427.0, 271.0, 909.0, 511.0, 511.0, 245.0, 339.0, 150.0, 509.0, 427.0]
2025-05-13 12:18:23,310 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 12 minutes, 48 seconds)
2025-05-13 12:22:09,050 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:22:13,342 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 971.24817 ± 901.601
2025-05-13 12:22:13,342 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [908.19867, 698.4553, 34.72282, 39.002193, 3101.0332, 1459.2405, 1747.896, 881.2312, 817.6547, 25.047537]
2025-05-13 12:22:13,342 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [306.0, 261.0, 55.0, 61.0, 1000.0, 460.0, 548.0, 325.0, 279.0, 43.0]
2025-05-13 12:22:13,351 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 8 minutes, 42 seconds)
2025-05-13 12:25:59,914 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:26:05,358 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1208.37830 ± 936.529
2025-05-13 12:26:05,358 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1135.2921, 23.66632, 1973.1691, 599.05927, 2949.239, 2672.3245, 451.01996, 627.4281, 946.70074, 705.8841]
2025-05-13 12:26:05,358 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [386.0, 38.0, 615.0, 273.0, 1000.0, 815.0, 171.0, 251.0, 362.0, 258.0]
2025-05-13 12:26:05,367 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 3 hours, 4 minutes)
2025-05-13 12:29:52,989 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:29:57,607 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1005.43213 ± 1012.642
2025-05-13 12:29:57,607 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [173.27534, 1342.4786, 871.7733, 248.73396, 539.4224, 26.416328, 28.732227, 1783.1089, 3396.1326, 1644.2483]
2025-05-13 12:29:57,607 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [119.0, 504.0, 332.0, 148.0, 225.0, 48.0, 41.0, 626.0, 945.0, 536.0]
2025-05-13 12:29:57,618 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 3 hours, 53 seconds)
2025-05-13 12:33:53,131 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:34:03,290 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2334.53638 ± 988.933
2025-05-13 12:34:03,291 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1885.67, 2891.049, 1763.5619, 2959.6292, 3723.8923, 3373.0686, 1004.05066, 1781.6465, 712.9759, 3249.8186]
2025-05-13 12:34:03,291 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [686.0, 1000.0, 680.0, 1000.0, 1000.0, 964.0, 411.0, 577.0, 296.0, 1000.0]
2025-05-13 12:34:03,291 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (2334.54) for latency ExtremeSparseL4U32
2025-05-13 12:34:03,302 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 59 minutes, 18 seconds)
2025-05-13 12:37:37,467 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:37:44,076 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1368.36499 ± 627.510
2025-05-13 12:37:44,076 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [42.12343, 1677.4598, 1604.874, 832.6564, 1364.5571, 1720.7604, 1358.133, 2606.1396, 1080.0791, 1396.8671]
2025-05-13 12:37:44,076 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [49.0, 548.0, 558.0, 331.0, 454.0, 649.0, 524.0, 1000.0, 382.0, 599.0]
2025-05-13 12:37:44,085 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 54 minutes, 6 seconds)
2025-05-13 12:41:30,727 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:41:36,537 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1266.48083 ± 1087.041
2025-05-13 12:41:36,537 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1400.1945, 487.968, 1513.4092, 39.22369, 24.746141, 2501.5315, 1073.538, 32.924908, 2393.94, 3197.332]
2025-05-13 12:41:36,537 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [443.0, 212.0, 495.0, 56.0, 43.0, 868.0, 367.0, 59.0, 893.0, 1000.0]
2025-05-13 12:41:36,546 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 50 minutes, 36 seconds)
2025-05-13 12:45:20,580 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:45:27,175 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1533.06689 ± 985.000
2025-05-13 12:45:27,175 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1163.5999, 12.113937, 2015.0743, 1004.29785, 1886.0212, 32.083008, 2975.9429, 1658.3588, 1545.0333, 3038.1438]
2025-05-13 12:45:27,175 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [415.0, 23.0, 620.0, 352.0, 563.0, 44.0, 989.0, 545.0, 461.0, 1000.0]
2025-05-13 12:45:27,183 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 46 minutes, 31 seconds)
2025-05-13 12:49:15,066 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:49:23,050 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1835.51978 ± 1060.170
2025-05-13 12:49:23,051 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1094.3585, 1509.5358, 2960.5051, 2877.24, 1495.6958, 2880.1558, 1883.2148, 437.39578, 32.724968, 3184.3704]
2025-05-13 12:49:23,051 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [371.0, 549.0, 1000.0, 1000.0, 496.0, 878.0, 594.0, 218.0, 44.0, 1000.0]
2025-05-13 12:49:23,061 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 43 minutes, 9 seconds)
2025-05-13 12:53:12,249 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:53:17,263 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 991.91290 ± 681.489
2025-05-13 12:53:17,263 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [718.98804, 1885.4935, 1257.8647, 794.9293, 2104.7705, 536.9212, 24.553762, 1041.9657, 1546.7156, 6.9268947]
2025-05-13 12:53:17,263 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [310.0, 597.0, 416.0, 305.0, 875.0, 240.0, 36.0, 393.0, 597.0, 17.0]
2025-05-13 12:53:17,272 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 37 minutes, 42 seconds)
2025-05-13 12:56:56,933 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:57:05,354 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1956.87463 ± 1362.173
2025-05-13 12:57:05,354 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [14.181686, 15.183033, 2771.1787, 3282.5945, 3150.619, 1191.2072, 2573.0454, 196.79225, 3380.8813, 2993.0618]
2025-05-13 12:57:05,354 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [27.0, 27.0, 1000.0, 1000.0, 1000.0, 437.0, 774.0, 124.0, 1000.0, 1000.0]
2025-05-13 12:57:05,361 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 34 minutes, 50 seconds)
2025-05-13 13:00:52,885 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:01:02,327 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2216.05444 ± 1123.980
2025-05-13 13:01:02,327 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3170.7708, 3345.7314, 212.61314, 1525.3822, 3164.3284, 3346.7927, 993.04126, 916.0014, 2608.908, 2876.977]
2025-05-13 13:01:02,327 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 108.0, 566.0, 1000.0, 1000.0, 338.0, 317.0, 823.0, 899.0]
2025-05-13 13:01:02,336 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 31 minutes, 33 seconds)
2025-05-13 13:04:56,764 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:05:04,040 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1594.36145 ± 1166.740
2025-05-13 13:05:04,040 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3125.48, 1035.391, 21.728539, 44.251743, 2367.7458, 1311.2826, 1488.3148, 2633.6482, 3380.2278, 535.54376]
2025-05-13 13:05:04,040 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 347.0, 34.0, 96.0, 799.0, 415.0, 529.0, 1000.0, 1000.0, 226.0]
2025-05-13 13:05:04,050 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 29 minutes, 4 seconds)
2025-05-13 13:08:47,002 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:08:53,330 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1355.28784 ± 988.729
2025-05-13 13:08:53,330 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3050.9893, 1954.3345, 1720.2091, 1044.7194, 1448.6165, 878.69794, 603.8091, 33.593327, 38.04764, 2779.861]
2025-05-13 13:08:53,330 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 697.0, 553.0, 355.0, 481.0, 321.0, 231.0, 57.0, 57.0, 1000.0]
2025-05-13 13:08:53,338 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 24 minutes, 20 seconds)
2025-05-13 13:12:38,630 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:12:47,756 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2224.78198 ± 1332.155
2025-05-13 13:12:47,756 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [33.880413, 2693.3054, 3381.2212, 2737.1338, 3064.8865, 662.87085, 3296.2795, 3454.7092, 2903.3613, 20.171686]
2025-05-13 13:12:47,756 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [54.0, 833.0, 1000.0, 909.0, 1000.0, 234.0, 1000.0, 1000.0, 857.0, 29.0]
2025-05-13 13:12:47,764 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 20 minutes, 27 seconds)
2025-05-13 13:16:47,246 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:16:55,256 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1792.34106 ± 977.740
2025-05-13 13:16:55,256 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [509.89624, 3289.0686, 1935.4061, 2315.1658, 86.70334, 1605.1719, 2417.6294, 1248.7877, 1446.271, 3069.311]
2025-05-13 13:16:55,256 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [224.0, 1000.0, 580.0, 725.0, 90.0, 735.0, 790.0, 443.0, 550.0, 1000.0]
2025-05-13 13:16:55,264 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 18 minutes, 49 seconds)
2025-05-13 13:20:22,051 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:20:26,410 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 896.39099 ± 991.517
2025-05-13 13:20:26,410 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [17.145851, 22.440737, 16.269312, 15.701636, 2445.3025, 296.19858, 1270.2384, 1589.1321, 2704.4917, 586.98926]
2025-05-13 13:20:26,410 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [25.0, 35.0, 28.0, 25.0, 1000.0, 122.0, 428.0, 584.0, 831.0, 229.0]
2025-05-13 13:20:26,422 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 11 minutes, 55 seconds)
2025-05-13 13:24:11,971 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:24:17,126 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1153.84766 ± 1075.704
2025-05-13 13:24:17,126 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1295.6443, 3114.0059, 1532.2593, 532.93463, 21.299908, 2811.8423, 40.675533, 517.81824, 28.844774, 1643.151]
2025-05-13 13:24:17,126 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [424.0, 1000.0, 514.0, 226.0, 36.0, 907.0, 68.0, 212.0, 49.0, 563.0]
2025-05-13 13:24:17,135 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 6 minutes, 50 seconds)
2025-05-13 13:28:19,502 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:28:25,751 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1462.63171 ± 1434.547
2025-05-13 13:28:25,751 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1725.1879, 228.9219, 25.763544, 44.79206, 3480.8303, 1955.5219, 268.41443, 3422.2212, 3346.7253, 127.93844]
2025-05-13 13:28:25,751 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [551.0, 153.0, 39.0, 98.0, 1000.0, 609.0, 112.0, 1000.0, 1000.0, 94.0]
2025-05-13 13:28:25,760 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 2 hours, 5 minutes, 3 seconds)
2025-05-13 13:32:10,751 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:32:15,501 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1058.01465 ± 1198.357
2025-05-13 13:32:15,501 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1387.0623, 3152.3135, 242.77808, 22.688988, 3468.0264, 1008.7706, 579.5932, 30.409576, 256.6043, 431.89987]
2025-05-13 13:32:15,501 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [482.0, 1000.0, 130.0, 35.0, 1000.0, 397.0, 244.0, 48.0, 138.0, 209.0]
2025-05-13 13:32:15,511 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 2 hours, 40 seconds)
2025-05-13 13:35:43,318 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:35:51,907 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1913.41833 ± 1081.369
2025-05-13 13:35:51,907 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [544.57404, 2798.0737, 2289.0056, 3123.3352, 2188.3003, 3167.7593, 835.06055, 26.399977, 2789.2107, 1372.4647]
2025-05-13 13:35:51,907 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [266.0, 879.0, 768.0, 1000.0, 695.0, 1000.0, 302.0, 50.0, 1000.0, 424.0]
2025-05-13 13:35:51,915 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 53 minutes, 39 seconds)
2025-05-13 13:39:52,953 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:39:59,770 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1625.69495 ± 1262.719
2025-05-13 13:39:59,770 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1436.553, 868.1343, 3377.6155, 3320.7874, 817.01416, 16.307472, 799.6166, 2520.1733, 24.853546, 3075.8948]
2025-05-13 13:39:59,770 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [435.0, 302.0, 1000.0, 949.0, 305.0, 28.0, 341.0, 790.0, 38.0, 1000.0]
2025-05-13 13:39:59,780 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 53 minutes, 25 seconds)
2025-05-13 13:43:39,223 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:43:42,934 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 744.95807 ± 853.375
2025-05-13 13:43:42,935 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2811.1343, 1164.5837, 306.7309, 18.32717, 1668.935, 552.80206, 326.93622, 39.76659, 32.63401, 527.731]
2025-05-13 13:43:42,935 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 366.0, 131.0, 25.0, 583.0, 226.0, 156.0, 61.0, 53.0, 219.0]
2025-05-13 13:43:42,945 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 48 minutes, 48 seconds)
2025-05-13 13:47:21,316 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:47:29,604 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2020.21021 ± 1263.235
2025-05-13 13:47:29,604 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3473.8225, 246.36345, 3198.8218, 1377.5425, 225.29332, 3099.554, 1752.3057, 596.24445, 3179.1006, 3053.0527]
2025-05-13 13:47:29,604 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 131.0, 1000.0, 414.0, 117.0, 931.0, 586.0, 247.0, 998.0, 935.0]
2025-05-13 13:47:29,617 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 42 minutes, 56 seconds)
2025-05-13 13:51:16,596 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:51:26,222 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2355.22852 ± 1145.734
2025-05-13 13:51:26,222 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3419.0183, 3372.21, 3241.9287, 535.0808, 2805.1663, 3172.8618, 1824.6802, 28.157303, 2263.3225, 2889.859]
2025-05-13 13:51:26,222 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 189.0, 861.0, 1000.0, 563.0, 50.0, 652.0, 897.0]
2025-05-13 13:51:26,223 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (2355.23) for latency ExtremeSparseL4U32
2025-05-13 13:51:26,234 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 39 minutes, 43 seconds)
2025-05-13 13:55:15,929 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:55:23,920 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1866.27563 ± 1256.201
2025-05-13 13:55:23,920 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [459.9436, 1015.3955, 243.71184, 3132.4407, 3113.6526, 27.992878, 3199.8389, 3270.5269, 2255.0283, 1944.2255]
2025-05-13 13:55:23,920 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [179.0, 357.0, 108.0, 1000.0, 986.0, 43.0, 1000.0, 1000.0, 721.0, 572.0]
2025-05-13 13:55:23,928 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 37 minutes, 40 seconds)
2025-05-13 13:59:20,857 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:59:28,390 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1762.69690 ± 1453.482
2025-05-13 13:59:28,390 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2894.2798, 3374.5554, 3221.4, 3220.4524, 3024.1343, 1573.2773, 23.636827, 248.30006, 22.105413, 24.828798]
2025-05-13 13:59:28,390 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [921.0, 1000.0, 1000.0, 1000.0, 1000.0, 505.0, 45.0, 148.0, 37.0, 44.0]
2025-05-13 13:59:28,402 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 33 minutes, 29 seconds)
2025-05-13 14:02:54,130 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:03:00,314 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1447.04199 ± 1341.934
2025-05-13 14:03:00,314 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [31.842262, 896.6779, 991.18256, 1744.8575, 26.369236, 21.38347, 720.6273, 3407.444, 3263.3342, 3366.7014]
2025-05-13 14:03:00,314 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [50.0, 389.0, 372.0, 536.0, 44.0, 33.0, 288.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:03:00,326 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 28 minutes, 43 seconds)
2025-05-13 14:06:54,685 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:07:00,826 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1410.62915 ± 1175.426
2025-05-13 14:07:00,826 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [680.50183, 2423.3684, 2763.313, 211.23772, 2617.704, 23.769821, 1631.218, 589.92316, 29.009626, 3136.2456]
2025-05-13 14:07:00,826 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [234.0, 681.0, 843.0, 143.0, 812.0, 44.0, 606.0, 259.0, 53.0, 1000.0]
2025-05-13 14:07:00,835 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 25 minutes, 53 seconds)
2025-05-13 14:10:39,563 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:10:46,269 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1494.72925 ± 1076.883
2025-05-13 14:10:46,269 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [26.012516, 176.10359, 1247.945, 1360.216, 200.13431, 1946.3895, 2008.1376, 3097.0325, 1696.4103, 3188.911]
2025-05-13 14:10:46,269 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [38.0, 131.0, 422.0, 514.0, 100.0, 662.0, 653.0, 1000.0, 600.0, 1000.0]
2025-05-13 14:10:46,278 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 21 minutes, 12 seconds)
2025-05-13 14:14:45,636 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:14:52,352 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1519.58813 ± 849.584
2025-05-13 14:14:52,352 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1410.0895, 1261.9324, 2793.4902, 2249.35, 843.1796, 508.4665, 201.73004, 1358.9548, 2819.1414, 1749.5463]
2025-05-13 14:14:52,352 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [485.0, 447.0, 985.0, 735.0, 299.0, 206.0, 89.0, 440.0, 862.0, 565.0]
2025-05-13 14:14:52,361 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 17 minutes, 53 seconds)
2025-05-13 14:18:26,614 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:18:35,219 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2035.54590 ± 908.519
2025-05-13 14:18:35,219 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2888.7578, 2372.5498, 17.314583, 1223.2823, 3218.7576, 2283.969, 2223.908, 1469.7548, 2916.1455, 1741.0223]
2025-05-13 14:18:35,219 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [897.0, 794.0, 33.0, 440.0, 1000.0, 675.0, 692.0, 493.0, 969.0, 590.0]
2025-05-13 14:18:35,228 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 12 minutes, 37 seconds)
2025-05-13 14:22:28,218 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:22:36,171 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1861.89185 ± 1260.315
2025-05-13 14:22:36,171 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2894.458, 24.806116, 36.924255, 958.27704, 2907.3877, 1437.2267, 3108.3577, 3274.916, 3092.8567, 883.7069]
2025-05-13 14:22:36,171 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [874.0, 34.0, 50.0, 310.0, 1000.0, 490.0, 1000.0, 1000.0, 1000.0, 302.0]
2025-05-13 14:22:36,182 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 10 minutes, 33 seconds)
2025-05-13 14:26:21,810 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:26:26,671 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1077.39136 ± 987.568
2025-05-13 14:26:26,671 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [24.05227, 1100.6211, 345.14337, 2166.5967, 557.87756, 1495.422, 2774.8157, 2255.3235, 13.891456, 40.170288]
2025-05-13 14:26:26,671 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [34.0, 357.0, 424.0, 655.0, 212.0, 503.0, 778.0, 699.0, 26.0, 48.0]
2025-05-13 14:26:26,684 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 6 minutes, 3 seconds)
2025-05-13 14:30:18,660 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:30:26,297 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1791.05957 ± 1161.087
2025-05-13 14:30:26,297 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2921.052, 1380.9197, 1480.4622, 2985.6694, 3288.4028, 27.443943, 37.881447, 988.6024, 1808.1323, 2992.0288]
2025-05-13 14:30:26,297 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [910.0, 476.0, 511.0, 950.0, 1000.0, 41.0, 55.0, 327.0, 551.0, 1000.0]
2025-05-13 14:30:26,306 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 2 minutes, 56 seconds)
2025-05-13 14:33:50,234 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:33:58,211 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2001.72424 ± 1289.405
2025-05-13 14:33:58,211 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1361.572, 27.438698, 303.6112, 2179.447, 2407.5864, 3331.9856, 482.05252, 3447.1978, 3316.1077, 3160.2444]
2025-05-13 14:33:58,211 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [435.0, 41.0, 120.0, 650.0, 718.0, 1000.0, 204.0, 1000.0, 959.0, 1000.0]
2025-05-13 14:33:58,222 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 57 minutes, 17 seconds)
2025-05-13 14:37:46,912 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:37:51,434 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 861.89093 ± 1137.146
2025-05-13 14:37:51,434 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1323.9796, 123.30278, 27.709032, 29.834965, 2601.7415, 1046.3035, 150.4389, 22.122438, 22.029959, 3271.447]
2025-05-13 14:37:51,434 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 103.0, 40.0, 41.0, 747.0, 347.0, 122.0, 39.0, 46.0, 980.0]
2025-05-13 14:37:51,444 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 53 minutes, 57 seconds)
2025-05-13 14:41:29,448 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:41:36,286 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1613.83813 ± 1223.672
2025-05-13 14:41:36,286 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [155.00005, 2084.2266, 3146.3323, 1575.0416, 154.72035, 26.362846, 1432.3806, 3277.8335, 3206.9058, 1079.5789]
2025-05-13 14:41:36,286 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [116.0, 658.0, 1000.0, 528.0, 103.0, 40.0, 498.0, 1000.0, 1000.0, 382.0]
2025-05-13 14:41:36,296 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 49 minutes, 24 seconds)
2025-05-13 14:45:09,114 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:45:13,009 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 922.21838 ± 784.064
2025-05-13 14:45:13,009 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [705.9841, 28.64386, 2470.6125, 1335.2242, 41.315624, 1365.1104, 1845.3673, 797.83453, 33.61963, 598.4707]
2025-05-13 14:45:13,009 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [263.0, 38.0, 756.0, 408.0, 63.0, 439.0, 528.0, 274.0, 48.0, 287.0]
2025-05-13 14:45:13,022 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 45 minutes, 3 seconds)
2025-05-13 14:48:43,855 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:48:53,965 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2326.04053 ± 1246.593
2025-05-13 14:48:53,965 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3253.819, 3169.2415, 3500.897, 534.21686, 851.62726, 1568.3646, 3381.4075, 3289.138, 3295.6003, 416.0945]
2025-05-13 14:48:53,965 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 208.0, 299.0, 1000.0, 1000.0, 1000.0, 1000.0, 183.0]
2025-05-13 14:48:53,975 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 40 minutes, 36 seconds)
2025-05-13 14:52:32,129 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:52:39,172 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1696.76050 ± 1391.398
2025-05-13 14:52:39,172 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [585.9856, 3222.5127, 171.4112, 3269.7715, 1683.8329, 3286.8042, 960.63666, 31.740286, 282.27142, 3472.6384]
2025-05-13 14:52:39,172 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [287.0, 1000.0, 86.0, 1000.0, 518.0, 1000.0, 321.0, 55.0, 149.0, 1000.0]
2025-05-13 14:52:39,183 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 37 minutes, 21 seconds)
2025-05-13 14:56:12,782 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:56:18,562 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1280.87134 ± 976.487
2025-05-13 14:56:18,562 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2037.4833, 1282.4138, 3175.109, 209.39967, 396.44995, 524.1553, 14.737289, 1837.5701, 2241.288, 1090.1077]
2025-05-13 14:56:18,563 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [615.0, 486.0, 1000.0, 136.0, 179.0, 214.0, 26.0, 595.0, 759.0, 427.0]
2025-05-13 14:56:18,574 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 33 minutes, 12 seconds)
2025-05-13 14:59:56,474 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:00:01,725 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1169.84546 ± 1077.648
2025-05-13 15:00:01,725 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3358.173, 1463.6298, 32.6486, 2798.5647, 932.0981, 20.711864, 1177.0404, 37.438053, 1053.7272, 824.4221]
2025-05-13 15:00:01,725 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 484.0, 41.0, 1000.0, 323.0, 33.0, 423.0, 47.0, 389.0, 281.0]
2025-05-13 15:00:01,737 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 29 minutes, 28 seconds)
2025-05-13 15:03:36,635 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:03:42,924 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1535.96021 ± 1402.964
2025-05-13 15:03:42,924 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3387.1094, 3364.4897, 940.4859, 546.9417, 2563.6565, 852.7924, 41.906815, 19.054335, 184.3878, 3458.7776]
2025-05-13 15:03:42,924 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 328.0, 206.0, 791.0, 296.0, 62.0, 31.0, 122.0, 1000.0]
2025-05-13 15:03:42,935 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 25 minutes, 53 seconds)
2025-05-13 15:07:37,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:07:40,749 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 768.51379 ± 979.754
2025-05-13 15:07:40,750 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3325.313, 203.41058, 1014.2443, 1403.1003, 34.781246, 36.57845, 997.9855, 640.22577, 17.132446, 12.3667]
2025-05-13 15:07:40,750 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 102.0, 450.0, 435.0, 52.0, 56.0, 348.0, 253.0, 28.0, 26.0]
2025-05-13 15:07:40,762 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 22 minutes, 32 seconds)
2025-05-13 15:11:02,061 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:11:09,087 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1653.84729 ± 1162.542
2025-05-13 15:11:09,087 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [208.2681, 3108.5466, 912.35144, 29.15506, 1091.0938, 1464.8438, 3353.761, 2892.142, 923.7747, 2554.5364]
2025-05-13 15:11:09,087 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [107.0, 1000.0, 335.0, 40.0, 361.0, 456.0, 1000.0, 859.0, 335.0, 796.0]
2025-05-13 15:11:09,099 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 18 minutes, 29 seconds)
2025-05-13 15:14:53,003 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:14:58,673 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1260.57520 ± 1212.569
2025-05-13 15:14:58,673 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [25.258917, 320.08826, 813.0754, 2914.0479, 27.494394, 832.74744, 1010.87964, 490.25183, 2676.5657, 3495.3425]
2025-05-13 15:14:58,673 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [37.0, 174.0, 281.0, 872.0, 41.0, 367.0, 434.0, 193.0, 901.0, 1000.0]
2025-05-13 15:14:58,684 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 14 minutes, 56 seconds)
2025-05-13 15:18:27,233 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:18:35,311 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1985.41382 ± 1174.989
2025-05-13 15:18:35,311 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1949.8799, 2454.9548, 3076.3384, 685.19434, 2839.093, 793.1794, 3526.016, 3424.717, 528.0673, 576.6994]
2025-05-13 15:18:35,311 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [627.0, 712.0, 1000.0, 239.0, 877.0, 287.0, 990.0, 1000.0, 201.0, 242.0]
2025-05-13 15:18:35,325 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 11 minutes, 8 seconds)
2025-05-13 15:22:15,247 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:22:22,036 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1544.40527 ± 1300.568
2025-05-13 15:22:22,036 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2758.0369, 919.4316, 19.666735, 455.83453, 306.3194, 3251.2402, 1562.5814, 2947.5244, 12.150568, 3211.2676]
2025-05-13 15:22:22,036 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 318.0, 32.0, 196.0, 128.0, 1000.0, 509.0, 1000.0, 25.0, 1000.0]
2025-05-13 15:22:22,047 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 7 minutes, 27 seconds)
2025-05-13 15:26:04,010 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:26:11,134 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1689.79395 ± 993.644
2025-05-13 15:26:11,134 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2094.8643, 1213.8568, 3150.8884, 894.0118, 1387.6655, 18.091707, 2506.5046, 2233.2788, 502.27335, 2896.504]
2025-05-13 15:26:11,134 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [661.0, 348.0, 1000.0, 341.0, 530.0, 28.0, 782.0, 685.0, 201.0, 877.0]
2025-05-13 15:26:11,147 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 42 seconds)
2025-05-13 15:29:40,902 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:29:46,615 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1272.09705 ± 1128.487
2025-05-13 15:29:46,615 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [775.50867, 824.06586, 967.8201, 3416.7617, 1559.1136, 145.97849, 1168.1144, 520.1312, 36.332104, 3307.1443]
2025-05-13 15:29:46,615 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [289.0, 298.0, 351.0, 994.0, 485.0, 279.0, 367.0, 211.0, 49.0, 1000.0]
2025-05-13 15:29:46,626 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1251 [DEBUG]: Training session finished
