2025-05-13 09:06:27,295 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-bpql-mda-mem24
2025-05-13 09:06:27,296 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-bpql-mda-mem24
2025-05-13 09:06:27,296 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x152df36b6a90>}
2025-05-13 09:06:27,296 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1111 [DEBUG]: using device: cuda
2025-05-13 09:06:27,300 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1133 [INFO]: Creating new trainer
2025-05-13 09:06:27,329 baseline-bpql-mda-noisy-walker2d:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-05-13 09:06:27,329 baseline-bpql-mda-noisy-walker2d:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-13 09:06:27,334 baseline-bpql-mda-noisy-walker2d:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(6, 384, batch_first=True)
)
2025-05-13 09:06:28,217 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1194 [DEBUG]: Starting training session...
2025-05-13 09:06:28,218 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 1/100
2025-05-13 09:10:21,487 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:10:23,332 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 15.94576 ± 7.756
2025-05-13 09:10:23,332 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [14.028778, 17.843918, 22.717903, 30.21801, 2.4068873, 17.640707, 23.72485, 10.951035, 9.082273, 10.843215]
2025-05-13 09:10:23,333 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [97.0, 98.0, 96.0, 99.0, 118.0, 94.0, 97.0, 95.0, 92.0, 94.0]
2025-05-13 09:10:23,333 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (15.95) for latency ExtremeClogL1U23
2025-05-13 09:10:23,340 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 6 hours, 27 minutes, 57 seconds)
2025-05-13 09:14:25,706 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:14:28,082 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 214.66415 ± 23.646
2025-05-13 09:14:28,083 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [209.90173, 202.80759, 217.84616, 220.24493, 275.2006, 194.7724, 201.81728, 194.47388, 234.20323, 195.3737]
2025-05-13 09:14:28,083 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [121.0, 115.0, 132.0, 124.0, 182.0, 114.0, 106.0, 117.0, 138.0, 113.0]
2025-05-13 09:14:28,083 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (214.66) for latency ExtremeClogL1U23
2025-05-13 09:14:28,092 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 6 hours, 31 minutes, 53 seconds)
2025-05-13 09:18:31,229 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:18:32,871 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 14.00350 ± 8.372
2025-05-13 09:18:32,871 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [26.280846, 15.85542, 21.077566, 23.05784, 12.859901, -0.7018622, 18.246588, 1.9402924, 13.15005, 8.268365]
2025-05-13 09:18:32,871 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [35.0, 106.0, 87.0, 47.0, 100.0, 122.0, 87.0, 109.0, 71.0, 113.0]
2025-05-13 09:18:32,877 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 6 hours, 30 minutes, 30 seconds)
2025-05-13 09:22:35,448 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:22:37,754 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 191.46622 ± 80.002
2025-05-13 09:22:37,754 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [102.2748, 272.17618, 248.39896, 163.16882, 49.60833, 235.91426, 145.03998, 333.46182, 161.01567, 203.60335]
2025-05-13 09:22:37,754 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [134.0, 154.0, 141.0, 117.0, 64.0, 128.0, 84.0, 208.0, 87.0, 108.0]
2025-05-13 09:22:37,759 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 6 hours, 27 minutes, 48 seconds)
2025-05-13 09:26:41,104 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:26:43,114 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 156.00046 ± 54.703
2025-05-13 09:26:43,115 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [159.00539, 227.87888, 148.86586, 13.230011, 149.1928, 163.02351, 144.45078, 212.22803, 186.39, 155.73943]
2025-05-13 09:26:43,115 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [126.0, 137.0, 83.0, 24.0, 81.0, 130.0, 109.0, 155.0, 119.0, 127.0]
2025-05-13 09:26:43,119 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 6 hours, 24 minutes, 43 seconds)
2025-05-13 09:30:47,203 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:30:51,105 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 360.17563 ± 195.303
2025-05-13 09:30:51,105 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [389.97714, 161.47174, 529.91296, 259.55145, 713.1134, 645.70953, 172.65578, 132.67477, 325.41782, 271.2716]
2025-05-13 09:30:51,105 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [194.0, 127.0, 276.0, 165.0, 352.0, 338.0, 128.0, 96.0, 231.0, 145.0]
2025-05-13 09:30:51,105 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (360.18) for latency ExtremeClogL1U23
2025-05-13 09:30:51,110 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 6 hours, 24 minutes, 42 seconds)
2025-05-13 09:34:57,143 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:35:00,836 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 199.29219 ± 263.050
2025-05-13 09:35:00,837 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [122.49324, 10.607659, 222.86685, 140.96118, 185.02202, 101.71871, 58.715813, 77.108986, 104.0637, 969.36365]
2025-05-13 09:35:00,837 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [139.0, 21.0, 313.0, 190.0, 188.0, 156.0, 138.0, 129.0, 184.0, 523.0]
2025-05-13 09:35:00,841 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 6 hours, 22 minutes, 9 seconds)
2025-05-13 09:39:01,339 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:39:03,966 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 225.97177 ± 71.635
2025-05-13 09:39:03,966 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [203.31508, 162.01158, 139.1244, 98.585884, 338.20096, 302.13058, 284.96362, 251.97102, 237.13799, 242.27686]
2025-05-13 09:39:03,966 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [132.0, 98.0, 101.0, 95.0, 183.0, 170.0, 178.0, 139.0, 145.0, 161.0]
2025-05-13 09:39:03,971 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 6 hours, 17 minutes, 32 seconds)
2025-05-13 09:43:10,267 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:43:14,879 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 509.75351 ± 219.423
2025-05-13 09:43:14,879 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [624.86584, 116.94042, 338.01297, 357.997, 575.9412, 360.3659, 411.0355, 744.7173, 685.08374, 882.5753]
2025-05-13 09:43:14,879 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [274.0, 72.0, 220.0, 150.0, 263.0, 147.0, 208.0, 376.0, 313.0, 382.0]
2025-05-13 09:43:14,879 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (509.75) for latency ExtremeClogL1U23
2025-05-13 09:43:14,885 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 6 hours, 15 minutes, 15 seconds)
2025-05-13 09:47:14,467 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:47:19,782 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 383.16287 ± 257.060
2025-05-13 09:47:19,782 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [84.128136, 438.5621, 905.58325, 660.7574, 516.3102, 294.09235, 150.09337, 412.78912, 355.91037, 13.402315]
2025-05-13 09:47:19,782 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [121.0, 245.0, 970.0, 431.0, 282.0, 183.0, 157.0, 213.0, 163.0, 25.0]
2025-05-13 09:47:19,788 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 6 hours, 11 minutes)
2025-05-13 09:51:24,790 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:51:29,585 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 471.36099 ± 242.302
2025-05-13 09:51:29,585 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [602.23065, 14.70146, 694.8872, 431.38058, 307.66833, 438.2614, 429.06015, 320.1892, 497.24124, 977.9895]
2025-05-13 09:51:29,585 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [252.0, 25.0, 338.0, 273.0, 172.0, 191.0, 283.0, 213.0, 224.0, 561.0]
2025-05-13 09:51:29,591 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 6 hours, 7 minutes, 24 seconds)
2025-05-13 09:55:33,973 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:55:38,126 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 386.91144 ± 208.175
2025-05-13 09:55:38,126 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [425.1458, 195.20995, 364.81308, 136.33281, 128.44968, 543.74713, 399.6316, 740.751, 241.55223, 693.4809]
2025-05-13 09:55:38,126 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [180.0, 120.0, 215.0, 164.0, 101.0, 247.0, 220.0, 404.0, 205.0, 347.0]
2025-05-13 09:55:38,131 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 6 hours, 2 minutes, 56 seconds)
2025-05-13 09:59:40,398 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:59:45,062 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 515.09497 ± 146.719
2025-05-13 09:59:45,062 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [439.86377, 535.5537, 385.74518, 319.07724, 325.02167, 621.55, 479.78903, 679.8918, 576.63947, 787.818]
2025-05-13 09:59:45,062 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [199.0, 253.0, 180.0, 172.0, 159.0, 287.0, 222.0, 326.0, 261.0, 439.0]
2025-05-13 09:59:45,062 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (515.09) for latency ExtremeClogL1U23
2025-05-13 09:59:45,066 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 5 hours, 59 minutes, 55 seconds)
2025-05-13 10:03:52,062 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:03:56,085 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 434.70963 ± 194.434
2025-05-13 10:03:56,085 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [210.11661, 789.52783, 392.67276, 558.3064, 342.42966, 361.7417, 560.62915, 365.34808, 114.04837, 652.2756]
2025-05-13 10:03:56,085 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [118.0, 378.0, 242.0, 267.0, 169.0, 171.0, 235.0, 177.0, 103.0, 263.0]
2025-05-13 10:03:56,089 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 5 hours, 55 minutes, 48 seconds)
2025-05-13 10:07:57,716 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:08:01,697 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 418.71762 ± 144.418
2025-05-13 10:08:01,697 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [336.80173, 609.9568, 524.17236, 550.5159, 656.7119, 255.49364, 250.67686, 373.6611, 342.49066, 286.69513]
2025-05-13 10:08:01,697 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [177.0, 251.0, 266.0, 253.0, 312.0, 158.0, 167.0, 169.0, 167.0, 179.0]
2025-05-13 10:08:01,702 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 5 hours, 51 minutes, 52 seconds)
2025-05-13 10:12:13,934 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:12:19,916 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 651.85730 ± 471.513
2025-05-13 10:12:19,916 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1040.8315, 95.47108, 561.80426, 527.8237, 165.88153, 1863.9548, 524.8618, 546.5888, 593.9614, 597.39404]
2025-05-13 10:12:19,916 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [409.0, 104.0, 246.0, 220.0, 160.0, 888.0, 243.0, 258.0, 269.0, 384.0]
2025-05-13 10:12:19,916 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (651.86) for latency ExtremeClogL1U23
2025-05-13 10:12:19,924 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 5 hours, 50 minutes, 5 seconds)
2025-05-13 10:16:12,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:16:17,833 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 552.18243 ± 276.053
2025-05-13 10:16:17,833 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [512.1225, 433.69412, 596.1671, 683.9176, 529.2716, 569.4698, 403.80417, 1251.5857, 439.5273, 102.26427]
2025-05-13 10:16:17,833 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [231.0, 214.0, 252.0, 299.0, 262.0, 265.0, 245.0, 604.0, 221.0, 106.0]
2025-05-13 10:16:17,839 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 5 hours, 42 minutes, 59 seconds)
2025-05-13 10:20:32,559 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:20:36,827 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 554.46002 ± 251.069
2025-05-13 10:20:36,827 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [469.59384, 14.01294, 591.1216, 438.80136, 1061.1674, 650.27716, 574.18243, 675.1721, 666.8699, 403.4007]
2025-05-13 10:20:36,827 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [191.0, 25.0, 243.0, 201.0, 394.0, 278.0, 242.0, 250.0, 263.0, 173.0]
2025-05-13 10:20:36,835 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 5 hours, 42 minutes, 9 seconds)
2025-05-13 10:24:34,903 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:24:38,508 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 427.40927 ± 191.435
2025-05-13 10:24:38,508 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [309.32358, 833.0776, 241.22617, 117.737144, 268.50266, 472.8936, 560.879, 492.07843, 504.22522, 474.1492]
2025-05-13 10:24:38,508 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [127.0, 313.0, 231.0, 74.0, 148.0, 195.0, 240.0, 188.0, 198.0, 204.0]
2025-05-13 10:24:38,514 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 5 hours, 35 minutes, 27 seconds)
2025-05-13 10:28:40,077 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:28:43,079 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 400.00528 ± 135.067
2025-05-13 10:28:43,079 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [463.97083, 269.47473, 449.8856, 346.10394, 614.0509, 343.87335, 456.9959, 412.43372, 535.23755, 108.026085]
2025-05-13 10:28:43,079 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [176.0, 111.0, 183.0, 144.0, 221.0, 164.0, 179.0, 166.0, 206.0, 68.0]
2025-05-13 10:28:43,085 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 5 hours, 31 minutes, 2 seconds)
2025-05-13 10:32:52,130 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:32:57,608 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 685.09546 ± 294.119
2025-05-13 10:32:57,608 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [583.60675, 642.8426, 789.95374, 776.007, 1380.1293, 717.50854, 347.14636, 412.52515, 871.55634, 329.67902]
2025-05-13 10:32:57,609 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [286.0, 251.0, 296.0, 307.0, 620.0, 265.0, 248.0, 189.0, 306.0, 167.0]
2025-05-13 10:32:57,609 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (685.10) for latency ExtremeClogL1U23
2025-05-13 10:32:57,614 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 5 hours, 25 minutes, 55 seconds)
2025-05-13 10:37:04,069 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:37:09,224 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 635.19519 ± 387.721
2025-05-13 10:37:09,225 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [424.40387, 823.54254, 103.20908, 632.55566, 1113.1235, 234.0585, 168.41562, 1345.4572, 735.81384, 771.3722]
2025-05-13 10:37:09,225 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [193.0, 464.0, 68.0, 273.0, 414.0, 140.0, 89.0, 535.0, 293.0, 278.0]
2025-05-13 10:37:09,234 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 5 hours, 25 minutes, 21 seconds)
2025-05-13 10:41:10,795 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:41:16,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 749.78601 ± 461.215
2025-05-13 10:41:16,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [530.55786, 496.45227, 607.01733, 1357.529, 869.971, 511.77972, 819.2764, 105.55054, 1766.6274, 433.09793]
2025-05-13 10:41:16,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [196.0, 214.0, 222.0, 479.0, 310.0, 213.0, 308.0, 66.0, 637.0, 175.0]
2025-05-13 10:41:16,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (749.79) for latency ExtremeClogL1U23
2025-05-13 10:41:16,138 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 5 hours, 18 minutes, 5 seconds)
2025-05-13 10:45:20,703 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:45:25,205 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 595.74927 ± 219.650
2025-05-13 10:45:25,205 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [569.6898, 509.55136, 976.5916, 587.659, 581.88226, 119.06475, 809.59814, 581.50275, 777.4479, 444.50516]
2025-05-13 10:45:25,205 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [252.0, 227.0, 364.0, 219.0, 212.0, 83.0, 291.0, 225.0, 310.0, 210.0]
2025-05-13 10:45:25,213 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 5 hours, 15 minutes, 49 seconds)
2025-05-13 10:49:23,691 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:49:28,514 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 599.03809 ± 189.277
2025-05-13 10:49:28,514 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [651.2475, 278.39392, 572.8767, 676.11444, 219.63078, 852.759, 652.26514, 761.4053, 648.1682, 677.5204]
2025-05-13 10:49:28,514 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [252.0, 143.0, 235.0, 290.0, 187.0, 334.0, 261.0, 286.0, 300.0, 257.0]
2025-05-13 10:49:28,521 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 5 hours, 11 minutes, 21 seconds)
2025-05-13 10:53:31,566 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:53:36,276 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 633.07690 ± 280.804
2025-05-13 10:53:36,276 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [350.79025, 718.99225, 1026.8573, 495.3264, 819.8268, 1018.1797, 365.0664, 261.40836, 378.10114, 896.22064]
2025-05-13 10:53:36,276 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [165.0, 254.0, 406.0, 214.0, 311.0, 375.0, 175.0, 151.0, 152.0, 319.0]
2025-05-13 10:53:36,284 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 5 hours, 5 minutes, 32 seconds)
2025-05-13 10:58:02,738 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:58:10,725 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1156.14050 ± 492.916
2025-05-13 10:58:10,725 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1277.3447, 1047.5402, 1070.9177, 1162.8088, 10.872966, 959.86646, 1349.721, 1814.007, 975.43915, 1892.8873]
2025-05-13 10:58:10,725 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [417.0, 386.0, 352.0, 441.0, 21.0, 363.0, 436.0, 684.0, 358.0, 708.0]
2025-05-13 10:58:10,725 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (1156.14) for latency ExtremeClogL1U23
2025-05-13 10:58:10,730 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 5 hours, 6 minutes, 57 seconds)
2025-05-13 11:01:57,506 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:02:03,021 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 780.09106 ± 442.665
2025-05-13 11:02:03,021 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [384.30933, 14.390812, 1195.1343, 769.4058, 962.58215, 800.1164, 479.3233, 1068.623, 482.0762, 1644.9498]
2025-05-13 11:02:03,021 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [156.0, 26.0, 431.0, 272.0, 370.0, 278.0, 176.0, 408.0, 223.0, 593.0]
2025-05-13 11:02:03,026 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 4 hours, 59 minutes, 15 seconds)
2025-05-13 11:06:07,953 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:06:18,871 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1594.45996 ± 447.073
2025-05-13 11:06:18,871 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1659.0204, 1252.0685, 1056.7295, 2145.1895, 1568.8113, 1468.5631, 2537.299, 1304.9564, 1110.234, 1841.7275]
2025-05-13 11:06:18,871 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [557.0, 434.0, 401.0, 796.0, 567.0, 499.0, 934.0, 466.0, 381.0, 630.0]
2025-05-13 11:06:18,871 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (1594.46) for latency ExtremeClogL1U23
2025-05-13 11:06:18,878 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 56 minutes, 42 seconds)
2025-05-13 11:10:27,253 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:10:36,021 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1334.81360 ± 584.173
2025-05-13 11:10:36,021 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [300.10306, 1347.127, 1070.5698, 1092.454, 2064.1882, 2515.3596, 1548.5751, 947.7477, 1043.2588, 1418.752]
2025-05-13 11:10:36,021 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [180.0, 465.0, 357.0, 376.0, 692.0, 857.0, 519.0, 338.0, 360.0, 481.0]
2025-05-13 11:10:36,026 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 4 hours, 55 minutes, 45 seconds)
2025-05-13 11:14:32,181 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:14:38,809 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 857.93573 ± 763.924
2025-05-13 11:14:38,809 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [227.5587, 656.81964, 525.89056, 2970.426, 968.81213, 269.898, 987.3053, 484.17313, 366.9233, 1121.5509]
2025-05-13 11:14:38,809 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [154.0, 271.0, 242.0, 1000.0, 364.0, 192.0, 413.0, 193.0, 229.0, 433.0]
2025-05-13 11:14:38,815 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 4 hours, 50 minutes, 22 seconds)
2025-05-13 11:18:43,410 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:18:51,942 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1337.47485 ± 478.262
2025-05-13 11:18:51,942 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1683.1298, 1412.8905, 972.83215, 1002.13074, 1079.3091, 1775.091, 938.5088, 2471.5254, 1094.9984, 944.3328]
2025-05-13 11:18:51,942 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [533.0, 460.0, 321.0, 373.0, 392.0, 576.0, 336.0, 756.0, 405.0, 361.0]
2025-05-13 11:18:51,950 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 4 hours, 41 minutes, 20 seconds)
2025-05-13 11:22:57,292 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:23:06,006 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1401.91858 ± 538.387
2025-05-13 11:23:06,006 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2543.8667, 1300.5541, 2046.656, 1239.174, 787.5371, 1697.2883, 923.3759, 826.8017, 1096.3052, 1557.6265]
2025-05-13 11:23:06,006 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [784.0, 413.0, 663.0, 405.0, 297.0, 543.0, 302.0, 274.0, 322.0, 532.0]
2025-05-13 11:23:06,013 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 4 hours, 42 minutes, 4 seconds)
2025-05-13 11:27:12,301 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:27:22,798 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1667.23413 ± 766.258
2025-05-13 11:27:22,799 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1428.7892, 2042.846, 1359.7964, 1028.8622, 2602.8228, 3023.7295, 173.81503, 1282.6134, 1869.1521, 1859.9143]
2025-05-13 11:27:22,799 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [504.0, 680.0, 443.0, 344.0, 811.0, 1000.0, 97.0, 442.0, 556.0, 639.0]
2025-05-13 11:27:22,799 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (1667.23) for latency ExtremeClogL1U23
2025-05-13 11:27:22,806 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 4 hours, 38 minutes, 3 seconds)
2025-05-13 11:31:36,486 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:31:47,831 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1885.65918 ± 924.702
2025-05-13 11:31:47,831 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3398.1543, 736.9481, 875.3399, 1129.7197, 2608.0862, 2490.3076, 1887.6002, 3148.7058, 1574.6096, 1007.12024]
2025-05-13 11:31:47,831 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 261.0, 282.0, 375.0, 833.0, 734.0, 592.0, 1000.0, 518.0, 346.0]
2025-05-13 11:31:47,831 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (1885.66) for latency ExtremeClogL1U23
2025-05-13 11:31:47,837 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 4 hours, 35 minutes, 33 seconds)
2025-05-13 11:35:39,767 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:35:51,476 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1908.97583 ± 848.665
2025-05-13 11:35:51,476 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1472.1884, 2149.7466, 1696.2666, 3125.0938, 873.4972, 1268.5648, 694.02893, 1833.7911, 2728.054, 3248.5278]
2025-05-13 11:35:51,476 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [473.0, 714.0, 580.0, 1000.0, 318.0, 445.0, 255.0, 579.0, 862.0, 1000.0]
2025-05-13 11:35:51,476 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (1908.98) for latency ExtremeClogL1U23
2025-05-13 11:35:51,485 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 4 hours, 31 minutes, 30 seconds)
2025-05-13 11:39:57,137 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:40:07,989 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1714.65552 ± 970.996
2025-05-13 11:40:07,989 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [576.1633, 2818.8958, 1798.3041, 987.07745, 2906.7996, 2130.38, 324.3719, 796.72235, 3130.8145, 1677.0255]
2025-05-13 11:40:07,989 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [232.0, 921.0, 584.0, 362.0, 964.0, 711.0, 190.0, 293.0, 1000.0, 526.0]
2025-05-13 11:40:07,996 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 4 hours, 27 minutes, 58 seconds)
2025-05-13 11:44:13,255 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:44:26,776 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2302.58936 ± 917.577
2025-05-13 11:44:26,776 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1551.5577, 3188.275, 1778.0658, 1947.8894, 1833.0417, 431.42746, 3154.4985, 3377.2397, 2436.759, 3327.14]
2025-05-13 11:44:26,776 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [485.0, 1000.0, 552.0, 568.0, 676.0, 169.0, 1000.0, 1000.0, 728.0, 1000.0]
2025-05-13 11:44:26,776 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (2302.59) for latency ExtremeClogL1U23
2025-05-13 11:44:26,783 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 4 hours, 24 minutes, 41 seconds)
2025-05-13 11:48:27,580 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:48:41,525 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2372.55054 ± 1208.561
2025-05-13 11:48:41,525 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3547.5747, 3321.9229, 3428.986, 3186.5752, 3212.5447, 2607.7378, 2401.1099, 1536.3698, 257.06512, 225.62065]
2025-05-13 11:48:41,525 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 982.0, 1000.0, 951.0, 1000.0, 799.0, 792.0, 575.0, 112.0, 161.0]
2025-05-13 11:48:41,525 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (2372.55) for latency ExtremeClogL1U23
2025-05-13 11:48:41,535 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 4 hours, 20 minutes)
2025-05-13 11:52:42,409 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:52:51,723 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1718.54590 ± 573.530
2025-05-13 11:52:51,723 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1535.6128, 2119.547, 1546.29, 2220.562, 1309.019, 1100.1633, 1154.5991, 1468.2762, 1642.8456, 3088.544]
2025-05-13 11:52:51,723 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [444.0, 612.0, 452.0, 621.0, 382.0, 346.0, 372.0, 460.0, 505.0, 814.0]
2025-05-13 11:52:51,729 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 4 hours, 12 minutes, 46 seconds)
2025-05-13 11:56:51,369 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:57:02,280 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1803.96997 ± 906.540
2025-05-13 11:57:02,280 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3111.3416, 3130.0835, 2154.071, 1532.688, 236.20998, 1185.018, 2594.6135, 1232.1527, 939.8055, 1923.7167]
2025-05-13 11:57:02,280 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [926.0, 1000.0, 678.0, 473.0, 160.0, 379.0, 863.0, 436.0, 314.0, 621.0]
2025-05-13 11:57:02,290 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 4 hours, 9 minutes, 55 seconds)
2025-05-13 12:01:12,742 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:01:23,349 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1820.26331 ± 1330.125
2025-05-13 12:01:23,349 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1522.689, 502.23547, 3250.7688, 2966.797, 3285.2737, 2526.6377, 13.023303, 3327.4177, 798.0067, 9.785386]
2025-05-13 12:01:23,350 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [461.0, 182.0, 1000.0, 916.0, 1000.0, 734.0, 24.0, 981.0, 282.0, 20.0]
2025-05-13 12:01:23,360 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 4 hours, 6 minutes, 34 seconds)
2025-05-13 12:05:11,812 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:05:23,441 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1867.45447 ± 1264.133
2025-05-13 12:05:23,441 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3288.035, 3342.194, 2219.78, 453.48483, 310.7891, 3097.2969, 15.457059, 3203.1936, 1263.735, 1480.5804]
2025-05-13 12:05:23,441 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 712.0, 172.0, 311.0, 964.0, 26.0, 1000.0, 452.0, 505.0]
2025-05-13 12:05:23,448 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 58 minutes, 45 seconds)
2025-05-13 12:09:41,513 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:09:54,038 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2072.43994 ± 1312.055
2025-05-13 12:09:54,038 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3519.4683, 2715.4324, 284.84778, 3247.32, 3325.8738, 267.6109, 1607.5381, 333.14313, 1922.6223, 3500.5437]
2025-05-13 12:09:54,038 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 164.0, 1000.0, 1000.0, 181.0, 516.0, 146.0, 562.0, 1000.0]
2025-05-13 12:09:54,045 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 57 minutes, 32 seconds)
2025-05-13 12:13:44,762 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:13:57,117 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2236.72339 ± 1112.089
2025-05-13 12:13:57,117 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2083.0251, 3394.3108, 1310.2089, 3585.113, 371.85165, 955.3908, 1340.5353, 3050.9075, 2895.876, 3380.0156]
2025-05-13 12:13:57,117 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [640.0, 1000.0, 402.0, 1000.0, 156.0, 307.0, 412.0, 846.0, 798.0, 1000.0]
2025-05-13 12:13:57,125 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 51 minutes, 59 seconds)
2025-05-13 12:17:55,703 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:18:03,457 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1315.55774 ± 634.694
2025-05-13 12:18:03,457 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [959.4751, 357.18555, 1931.4973, 2772.652, 1056.4894, 1032.7028, 1342.42, 1560.6301, 794.381, 1348.1433]
2025-05-13 12:18:03,457 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [359.0, 152.0, 583.0, 788.0, 343.0, 342.0, 404.0, 498.0, 273.0, 413.0]
2025-05-13 12:18:03,467 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 47 minutes)
2025-05-13 12:22:21,423 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:22:31,410 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1762.27563 ± 1253.922
2025-05-13 12:22:31,411 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [17.627264, 1844.4705, 3524.064, 3345.37, 2408.7014, 1899.2145, 15.692394, 301.04117, 1370.1034, 2896.471]
2025-05-13 12:22:31,411 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [26.0, 566.0, 1000.0, 1000.0, 746.0, 572.0, 26.0, 127.0, 431.0, 811.0]
2025-05-13 12:22:31,419 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 44 minutes, 1 second)
2025-05-13 12:26:15,647 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:26:24,151 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1467.52893 ± 1091.501
2025-05-13 12:26:24,151 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1506.6576, 93.56772, 953.13477, 886.68134, 2937.8025, 2876.9072, 1206.3573, 3221.825, 843.74457, 148.61101]
2025-05-13 12:26:24,151 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [545.0, 62.0, 320.0, 294.0, 821.0, 822.0, 398.0, 919.0, 289.0, 83.0]
2025-05-13 12:26:24,161 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 38 minutes, 31 seconds)
2025-05-13 12:30:34,940 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:30:45,295 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1819.52832 ± 878.943
2025-05-13 12:30:45,295 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1583.6477, 2827.871, 2856.511, 2047.326, 2977.0054, 1623.8135, 1167.401, 183.72025, 2093.9883, 833.9993]
2025-05-13 12:30:45,295 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [491.0, 829.0, 837.0, 614.0, 840.0, 502.0, 368.0, 117.0, 604.0, 293.0]
2025-05-13 12:30:45,304 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 32 minutes, 42 seconds)
2025-05-13 12:34:50,360 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:35:01,312 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1922.79517 ± 1008.956
2025-05-13 12:35:01,312 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3156.3035, 1000.1869, 2463.9417, 2306.2686, 2106.4402, 246.6692, 2458.6929, 3369.433, 487.55328, 1632.4629]
2025-05-13 12:35:01,313 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 340.0, 691.0, 643.0, 595.0, 180.0, 779.0, 969.0, 187.0, 487.0]
2025-05-13 12:35:01,323 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 30 minutes, 41 seconds)
2025-05-13 12:39:07,361 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:39:20,741 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2244.79370 ± 1048.708
2025-05-13 12:39:20,741 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3377.148, 3307.3528, 3329.8777, 1821.0089, 2619.839, 2459.4644, 667.74133, 836.5621, 3125.0063, 903.9363]
2025-05-13 12:39:20,741 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 634.0, 767.0, 742.0, 270.0, 284.0, 931.0, 377.0]
2025-05-13 12:39:20,748 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 28 minutes, 37 seconds)
2025-05-13 12:43:10,567 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:43:24,443 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2477.74487 ± 1148.744
2025-05-13 12:43:24,443 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3473.0718, 3441.4226, 1162.8707, 3593.1787, 1824.3585, 3505.2131, 23.43465, 2076.8223, 3272.3372, 2404.74]
2025-05-13 12:43:24,443 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 372.0, 1000.0, 578.0, 976.0, 29.0, 715.0, 942.0, 662.0]
2025-05-13 12:43:24,443 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (2477.74) for latency ExtremeClogL1U23
2025-05-13 12:43:24,449 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 3 hours, 20 minutes, 29 seconds)
2025-05-13 12:47:24,858 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:47:33,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1535.81873 ± 1261.075
2025-05-13 12:47:33,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [478.26852, 3447.569, 95.84671, 2345.846, 475.3586, 2670.1934, 2138.92, 3111.0366, 100.944786, 494.2041]
2025-05-13 12:47:33,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [177.0, 1000.0, 65.0, 684.0, 177.0, 832.0, 624.0, 898.0, 67.0, 186.0]
2025-05-13 12:47:33,714 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 3 hours, 18 minutes, 53 seconds)
2025-05-13 12:51:34,388 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:51:48,195 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2435.97852 ± 1126.980
2025-05-13 12:51:48,195 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [139.88644, 1976.6841, 3171.3477, 1062.0597, 3731.847, 3382.6814, 3440.4922, 2438.7048, 1735.231, 3280.8508]
2025-05-13 12:51:48,195 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [78.0, 569.0, 1000.0, 356.0, 1000.0, 1000.0, 1000.0, 709.0, 533.0, 1000.0]
2025-05-13 12:51:48,206 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 3 hours, 13 minutes, 38 seconds)
2025-05-13 12:55:51,661 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:56:01,925 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1808.73267 ± 1271.459
2025-05-13 12:56:01,926 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [159.30927, 2666.6294, 2712.117, 867.27277, 3299.7659, 2635.847, 21.242685, 226.43219, 2187.4768, 3311.2341]
2025-05-13 12:56:01,926 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [92.0, 754.0, 741.0, 343.0, 1000.0, 744.0, 29.0, 146.0, 629.0, 1000.0]
2025-05-13 12:56:01,933 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 3 hours, 9 minutes, 5 seconds)
2025-05-13 13:00:12,452 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:00:24,331 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2029.81763 ± 1380.065
2025-05-13 13:00:24,331 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [148.45805, 2850.09, 1374.8959, 3330.6484, 3189.2815, 147.92424, 3591.1863, 1606.2291, 3643.6038, 415.8565]
2025-05-13 13:00:24,331 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [139.0, 867.0, 447.0, 1000.0, 1000.0, 80.0, 1000.0, 631.0, 1000.0, 162.0]
2025-05-13 13:00:24,342 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 3 hours, 5 minutes, 19 seconds)
2025-05-13 13:04:31,091 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:04:41,694 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1837.13013 ± 1071.987
2025-05-13 13:04:41,695 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [896.02246, 665.52795, 1626.4691, 1995.9203, 3360.1929, 2700.4832, 1603.3184, 10.898145, 3470.114, 2042.355]
2025-05-13 13:04:41,695 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [347.0, 257.0, 505.0, 599.0, 1000.0, 814.0, 491.0, 23.0, 1000.0, 628.0]
2025-05-13 13:04:41,706 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 3 hours, 3 minutes, 4 seconds)
2025-05-13 13:08:38,745 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:08:48,687 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1787.17944 ± 1393.272
2025-05-13 13:08:48,687 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1448.0886, 3728.2197, 3120.2986, 313.6065, 1197.6848, 1036.395, 3596.5, 164.88327, 3168.7441, 97.37232]
2025-05-13 13:08:48,687 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [469.0, 1000.0, 914.0, 136.0, 421.0, 342.0, 1000.0, 88.0, 843.0, 63.0]
2025-05-13 13:08:48,696 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 58 minutes, 29 seconds)
2025-05-13 13:12:59,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:13:04,261 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 834.12305 ± 1023.179
2025-05-13 13:13:04,261 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [356.06607, 305.66653, 593.58685, 151.6287, 353.36658, 214.27515, 3616.4968, 103.19755, 1507.0017, 1139.9438]
2025-05-13 13:13:04,261 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [152.0, 122.0, 205.0, 81.0, 143.0, 106.0, 979.0, 66.0, 468.0, 377.0]
2025-05-13 13:13:04,271 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 54 minutes, 23 seconds)
2025-05-13 13:16:55,839 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:17:06,586 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1840.92310 ± 1417.780
2025-05-13 13:17:06,586 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [604.3912, 394.0131, 199.20532, 3402.7932, 3415.633, 3650.4338, 1535.1399, 1792.4861, 55.55014, 3359.5845]
2025-05-13 13:17:06,586 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [312.0, 171.0, 133.0, 947.0, 971.0, 1000.0, 497.0, 542.0, 64.0, 1000.0]
2025-05-13 13:17:06,595 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 48 minutes, 37 seconds)
2025-05-13 13:21:19,973 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:21:35,664 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2823.79932 ± 1034.423
2025-05-13 13:21:35,664 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3480.1724, 2249.0469, 3402.021, 1399.9457, 3198.155, 3342.7646, 3577.7366, 3611.836, 3501.7554, 474.56024]
2025-05-13 13:21:35,664 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [976.0, 701.0, 1000.0, 433.0, 985.0, 958.0, 1000.0, 1000.0, 1000.0, 251.0]
2025-05-13 13:21:35,664 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (2823.80) for latency ExtremeClogL1U23
2025-05-13 13:21:35,672 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 45 minutes, 16 seconds)
2025-05-13 13:25:20,835 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:25:33,554 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2365.35107 ± 1326.287
2025-05-13 13:25:33,554 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3911.2922, 1181.6982, 3681.2673, 334.9337, 296.2358, 2624.6003, 3604.1064, 3708.2493, 1863.3256, 2447.8008]
2025-05-13 13:25:33,554 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 425.0, 1000.0, 148.0, 144.0, 723.0, 1000.0, 1000.0, 531.0, 710.0]
2025-05-13 13:25:33,563 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 38 minutes, 34 seconds)
2025-05-13 13:29:47,081 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:29:57,623 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1761.26001 ± 1116.612
2025-05-13 13:29:57,623 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3215.9358, 2658.594, 1218.4109, 3345.9268, 2902.776, 1690.9636, 721.94415, 269.58514, 435.39246, 1153.0728]
2025-05-13 13:29:57,623 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [963.0, 828.0, 396.0, 1000.0, 892.0, 550.0, 227.0, 177.0, 201.0, 372.0]
2025-05-13 13:29:57,631 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 36 minutes, 30 seconds)
2025-05-13 13:33:53,016 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:34:05,413 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2220.76221 ± 1342.824
2025-05-13 13:34:05,413 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [313.28403, 3496.3489, 3519.6187, 1284.137, 3039.4714, 2432.3884, 3519.8804, 3502.4849, 1085.4532, 14.555075]
2025-05-13 13:34:05,413 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [128.0, 1000.0, 1000.0, 377.0, 855.0, 691.0, 1000.0, 1000.0, 362.0, 25.0]
2025-05-13 13:34:05,423 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 31 minutes, 20 seconds)
2025-05-13 13:38:02,393 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:38:18,676 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2817.42700 ± 997.644
2025-05-13 13:38:18,677 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3381.1382, 3234.948, 1243.6991, 3313.112, 3400.2578, 3323.2856, 3323.3108, 468.27725, 3187.9802, 3298.2617]
2025-05-13 13:38:18,677 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 402.0, 1000.0, 1000.0, 1000.0, 1000.0, 198.0, 1000.0, 1000.0]
2025-05-13 13:38:18,685 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 28 minutes, 24 seconds)
2025-05-13 13:42:43,457 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:42:52,480 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1484.38293 ± 1321.435
2025-05-13 13:42:52,481 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3348.7751, 72.389755, 2448.4329, 333.0325, 505.06558, 2088.5715, 81.59691, 166.34743, 2307.6875, 3491.9307]
2025-05-13 13:42:52,481 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 72.0, 772.0, 171.0, 190.0, 656.0, 74.0, 139.0, 681.0, 1000.0]
2025-05-13 13:42:52,488 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 24 minutes, 42 seconds)
2025-05-13 13:46:41,738 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:46:53,479 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2061.30273 ± 1239.549
2025-05-13 13:46:53,479 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3500.06, 2284.2183, 2386.3477, 384.03476, 223.92108, 3439.084, 2928.2761, 611.6905, 3410.157, 1445.2389]
2025-05-13 13:46:53,480 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 664.0, 630.0, 192.0, 163.0, 1000.0, 910.0, 226.0, 1000.0, 429.0]
2025-05-13 13:46:53,488 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 20 minutes, 47 seconds)
2025-05-13 13:51:06,794 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:51:20,520 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2325.44312 ± 1181.491
2025-05-13 13:51:20,520 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1768.9924, 3437.954, 448.19357, 714.5843, 3596.436, 982.0189, 3512.199, 2847.5854, 2553.4373, 3393.032]
2025-05-13 13:51:20,520 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [613.0, 1000.0, 218.0, 304.0, 1000.0, 341.0, 1000.0, 845.0, 835.0, 1000.0]
2025-05-13 13:51:20,538 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 2 hours, 16 minutes, 50 seconds)
2025-05-13 13:55:22,992 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:55:39,838 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2994.34180 ± 698.166
2025-05-13 13:55:39,838 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3475.4514, 3358.9097, 3517.8047, 3527.6206, 3318.491, 2610.5183, 3308.5356, 1913.8566, 3418.8335, 1493.3967]
2025-05-13 13:55:39,838 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 775.0, 1000.0, 578.0, 1000.0, 461.0]
2025-05-13 13:55:39,838 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (2994.34) for latency ExtremeClogL1U23
2025-05-13 13:55:39,853 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 2 hours, 13 minutes, 45 seconds)
2025-05-13 13:59:25,221 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:59:36,488 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2169.51807 ± 1091.462
2025-05-13 13:59:36,489 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [924.69495, 1385.2482, 1001.16077, 3436.6523, 2141.9478, 3826.552, 3449.7725, 1574.586, 2968.2695, 986.2974]
2025-05-13 13:59:36,489 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [306.0, 408.0, 323.0, 919.0, 606.0, 1000.0, 943.0, 457.0, 769.0, 304.0]
2025-05-13 13:59:36,498 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 2 hours, 7 minutes, 46 seconds)
2025-05-13 14:03:38,829 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:03:51,536 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2178.15479 ± 1335.183
2025-05-13 14:03:51,536 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [382.36646, 2033.6503, 569.7389, 3335.2468, 3428.996, 1131.4658, 3411.089, 3575.1643, 3444.485, 469.3468]
2025-05-13 14:03:51,536 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [199.0, 624.0, 224.0, 1000.0, 1000.0, 411.0, 1000.0, 1000.0, 1000.0, 191.0]
2025-05-13 14:03:51,547 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 2 hours, 1 minute, 42 seconds)
2025-05-13 14:07:50,461 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:08:03,860 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2543.63354 ± 1291.220
2025-05-13 14:08:03,860 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3552.4692, 3632.169, 3596.679, 155.75192, 1712.2786, 2307.1216, 3185.8013, 3234.3484, 372.3825, 3687.3345]
2025-05-13 14:08:03,860 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 95.0, 525.0, 605.0, 875.0, 861.0, 155.0, 1000.0]
2025-05-13 14:08:03,871 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 58 minutes, 34 seconds)
2025-05-13 14:12:15,928 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:12:26,679 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1986.61401 ± 1480.722
2025-05-13 14:12:26,679 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3673.8782, 1517.4799, 2096.0232, 624.71606, 3645.463, 114.64043, 3526.6326, 327.05658, 505.42026, 3834.8318]
2025-05-13 14:12:26,679 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 438.0, 588.0, 221.0, 1000.0, 70.0, 1000.0, 138.0, 185.0, 1000.0]
2025-05-13 14:12:26,689 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 53 minutes, 57 seconds)
2025-05-13 14:16:42,186 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:17:00,478 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3527.40430 ± 533.026
2025-05-13 14:17:00,478 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3799.5288, 3431.9636, 3554.7876, 1983.7958, 3737.7986, 3581.825, 3866.5825, 3633.1123, 3879.2864, 3805.3623]
2025-05-13 14:17:00,478 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 562.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:17:00,478 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (3527.40) for latency ExtremeClogL1U23
2025-05-13 14:17:00,486 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 50 minutes, 59 seconds)
2025-05-13 14:21:03,663 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:21:19,881 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3005.17212 ± 1038.194
2025-05-13 14:21:19,882 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3739.5103, 1894.9373, 3591.4006, 3873.7756, 3577.7793, 3458.4868, 2542.7002, 414.99582, 3518.0166, 3440.1187]
2025-05-13 14:21:19,882 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 546.0, 1000.0, 1000.0, 1000.0, 1000.0, 746.0, 169.0, 1000.0, 1000.0]
2025-05-13 14:21:19,891 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 48 minutes, 36 seconds)
2025-05-13 14:25:12,763 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:25:27,186 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2813.71167 ± 1370.831
2025-05-13 14:25:27,186 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3817.553, 3090.6907, 3825.7122, 3948.9485, 1027.6843, 3750.225, 1035.6741, 249.42235, 3827.949, 3563.2588]
2025-05-13 14:25:27,186 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 775.0, 1000.0, 1000.0, 321.0, 1000.0, 334.0, 126.0, 1000.0, 992.0]
2025-05-13 14:25:27,199 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 43 minutes, 39 seconds)
2025-05-13 14:29:35,620 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:29:43,636 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1318.92993 ± 1392.846
2025-05-13 14:29:43,636 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [142.7131, 3614.7966, 2621.623, 199.27246, 3587.685, 174.33087, 1882.0525, 389.15533, 419.0991, 158.57227]
2025-05-13 14:29:43,636 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [82.0, 1000.0, 735.0, 95.0, 1000.0, 326.0, 560.0, 221.0, 171.0, 85.0]
2025-05-13 14:29:43,644 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 39 minutes, 38 seconds)
2025-05-13 14:33:27,419 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:33:43,856 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3211.67822 ± 971.721
2025-05-13 14:33:43,857 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3970.4585, 2542.394, 3640.3054, 3891.321, 825.038, 3727.5784, 3753.131, 2255.2134, 3681.4138, 3829.927]
2025-05-13 14:33:43,857 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 694.0, 1000.0, 1000.0, 274.0, 1000.0, 1000.0, 620.0, 1000.0, 1000.0]
2025-05-13 14:33:43,866 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 33 minutes, 39 seconds)
2025-05-13 14:37:57,455 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:38:15,617 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3314.55225 ± 553.422
2025-05-13 14:38:15,617 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3265.7822, 3294.3123, 3827.2676, 3532.2148, 3479.4395, 1733.9171, 3684.2258, 3438.8867, 3582.7688, 3306.7056]
2025-05-13 14:38:15,617 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 486.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:38:15,626 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 29 minutes, 15 seconds)
2025-05-13 14:42:29,040 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:42:45,849 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3225.75269 ± 978.120
2025-05-13 14:42:45,849 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3650.0417, 3631.7913, 3575.8777, 3547.373, 3838.99, 3851.1614, 3644.6077, 2136.7136, 3727.3777, 653.5939]
2025-05-13 14:42:45,849 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 573.0, 1000.0, 302.0]
2025-05-13 14:42:45,859 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 25 minutes, 43 seconds)
2025-05-13 14:46:34,927 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:46:51,163 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2981.06885 ± 911.828
2025-05-13 14:46:51,163 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1258.5951, 3744.9458, 2891.3186, 3317.2603, 1157.0479, 3568.9336, 3459.8325, 3568.272, 3480.866, 3363.6155]
2025-05-13 14:46:51,163 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [398.0, 1000.0, 844.0, 1000.0, 342.0, 1000.0, 1000.0, 979.0, 1000.0, 1000.0]
2025-05-13 14:46:51,174 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 21 minutes, 19 seconds)
2025-05-13 14:51:06,444 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:51:22,748 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3134.94116 ± 976.472
2025-05-13 14:51:22,748 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2346.0098, 3635.4683, 3698.5059, 2569.3826, 3734.069, 3714.3977, 3773.0327, 3592.3137, 3683.5798, 602.6518]
2025-05-13 14:51:22,748 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [673.0, 1000.0, 1000.0, 693.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 221.0]
2025-05-13 14:51:22,757 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 17 minutes, 56 seconds)
2025-05-13 14:55:17,082 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:55:36,162 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3631.72510 ± 135.079
2025-05-13 14:55:36,162 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3657.0205, 3421.0298, 3691.3115, 3347.1465, 3711.3157, 3712.118, 3780.4531, 3570.929, 3736.2231, 3689.704]
2025-05-13 14:55:36,162 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:55:36,162 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (3631.73) for latency ExtremeClogL1U23
2025-05-13 14:55:36,172 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 14 minutes, 21 seconds)
2025-05-13 14:59:26,506 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:59:42,452 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2984.36133 ± 1019.679
2025-05-13 14:59:42,453 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3452.3088, 3701.52, 3627.2927, 3312.7686, 693.2709, 1444.3385, 2618.914, 3693.368, 3652.8901, 3646.9407]
2025-05-13 14:59:42,453 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 929.0, 259.0, 425.0, 709.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:59:42,467 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 8 minutes, 37 seconds)
2025-05-13 15:03:42,643 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:04:00,492 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3369.87817 ± 763.991
2025-05-13 15:04:00,493 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3751.3184, 3514.2239, 3696.8894, 3045.2312, 3744.6785, 3695.02, 1159.3574, 3741.431, 3653.7832, 3696.8513]
2025-05-13 15:04:00,493 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 846.0, 1000.0, 1000.0, 386.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:04:00,504 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 1 hour, 3 minutes, 43 seconds)
2025-05-13 15:08:19,477 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:08:35,658 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3011.62622 ± 1114.349
2025-05-13 15:08:35,658 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3655.6345, 3650.4084, 882.93427, 3619.0544, 3406.5508, 3486.379, 3612.7078, 695.7992, 3584.3452, 3522.4478]
2025-05-13 15:08:35,658 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 299.0, 1000.0, 1000.0, 1000.0, 1000.0, 247.0, 1000.0, 1000.0]
2025-05-13 15:08:35,671 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 1 hour, 52 seconds)
2025-05-13 15:12:30,998 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:12:48,862 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3291.88281 ± 773.503
2025-05-13 15:12:48,862 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3619.785, 999.95795, 3605.2278, 3254.7703, 3712.4822, 3671.6565, 3454.9026, 3548.5747, 3532.3557, 3519.118]
2025-05-13 15:12:48,862 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 315.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:12:48,871 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 55 minutes, 43 seconds)
2025-05-13 15:16:57,461 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:17:14,005 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3210.97754 ± 901.768
2025-05-13 15:17:14,005 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3699.097, 3603.9417, 3636.5283, 2962.2473, 901.1682, 3684.6218, 3755.255, 3902.7786, 2255.0774, 3709.0613]
2025-05-13 15:17:14,006 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 911.0, 1000.0, 793.0, 312.0, 1000.0, 1000.0, 1000.0, 637.0, 1000.0]
2025-05-13 15:17:14,015 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 51 minutes, 54 seconds)
2025-05-13 15:21:03,240 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:21:17,964 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2822.85693 ± 1345.772
2025-05-13 15:21:17,964 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3718.0957, 1763.7749, 3881.6042, 3820.7007, 2926.613, 3818.2334, 343.25763, 3806.697, 3629.499, 520.0924]
2025-05-13 15:21:17,964 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 511.0, 1000.0, 1000.0, 834.0, 1000.0, 146.0, 1000.0, 1000.0, 197.0]
2025-05-13 15:21:17,978 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 47 minutes, 30 seconds)
2025-05-13 15:25:39,837 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:25:55,283 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3009.27783 ± 1194.077
2025-05-13 15:25:55,284 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3795.1108, 3859.8936, 1030.873, 3803.2585, 3041.7346, 3881.9802, 518.196, 2586.0762, 3868.75, 3706.9019]
2025-05-13 15:25:55,284 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 342.0, 1000.0, 805.0, 1000.0, 189.0, 723.0, 1000.0, 1000.0]
2025-05-13 15:25:55,296 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 43 minutes, 49 seconds)
2025-05-13 15:29:45,878 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:29:58,256 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2346.92480 ± 1443.579
2025-05-13 15:29:58,256 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3785.1216, 2394.0073, 402.60403, 3809.3809, 3832.7146, 545.0997, 3814.326, 3197.744, 869.5073, 818.74255]
2025-05-13 15:29:58,256 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 659.0, 166.0, 1000.0, 1000.0, 184.0, 1000.0, 865.0, 279.0, 316.0]
2025-05-13 15:29:58,269 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 38 minutes, 28 seconds)
2025-05-13 15:34:08,082 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:34:24,805 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3288.28369 ± 1161.686
2025-05-13 15:34:24,805 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3634.9075, 3942.876, 3848.1372, 3704.1086, 3768.163, 2505.571, 3834.7212, 11.947572, 3700.3108, 3932.093]
2025-05-13 15:34:24,805 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 983.0, 676.0, 1000.0, 23.0, 1000.0, 1000.0]
2025-05-13 15:34:24,815 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 34 minutes, 33 seconds)
2025-05-13 15:38:39,467 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:38:58,969 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3681.38672 ± 126.412
2025-05-13 15:38:58,969 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3538.0586, 3698.986, 3779.2378, 3487.1287, 3822.3782, 3691.234, 3874.9216, 3586.2817, 3779.1592, 3556.4807]
2025-05-13 15:38:58,969 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:38:58,969 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (3681.39) for latency ExtremeClogL1U23
2025-05-13 15:38:58,979 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 30 minutes, 26 seconds)
2025-05-13 15:42:43,916 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:43:03,232 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3678.50659 ± 123.987
2025-05-13 15:43:03,232 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3566.0618, 3717.0894, 3586.6917, 3664.143, 3796.1262, 3673.3228, 3954.5312, 3476.3455, 3672.3208, 3678.4338]
2025-05-13 15:43:03,232 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:43:03,244 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 26 minutes, 6 seconds)
2025-05-13 15:47:12,312 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:47:28,038 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2968.57178 ± 1417.449
2025-05-13 15:47:28,038 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3764.22, 3662.4182, 3703.522, 3650.937, 3705.2002, 3666.1729, 116.07937, 155.43059, 3546.1907, 3715.5461]
2025-05-13 15:47:28,038 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 74.0, 89.0, 1000.0, 1000.0]
2025-05-13 15:47:28,049 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 21 minutes, 32 seconds)
2025-05-13 15:51:43,345 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:51:55,846 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2358.54175 ± 1735.594
2025-05-13 15:51:55,846 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3925.4097, 3900.6956, 149.33353, 3746.9858, 125.42022, 568.197, 3556.389, 122.82812, 3671.6316, 3818.5278]
2025-05-13 15:51:55,846 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 88.0, 1000.0, 84.0, 225.0, 1000.0, 84.0, 1000.0, 1000.0]
2025-05-13 15:51:55,857 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 17 minutes, 34 seconds)
2025-05-13 15:56:00,250 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:56:14,126 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2611.71387 ± 1212.989
2025-05-13 15:56:14,126 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2438.1233, 1589.1707, 3658.6133, 1843.7875, 837.35455, 3830.803, 3671.1055, 732.7909, 3619.3977, 3895.991]
2025-05-13 15:56:14,126 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [718.0, 471.0, 1000.0, 541.0, 270.0, 1000.0, 1000.0, 247.0, 1000.0, 1000.0]
2025-05-13 15:56:14,135 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 13 minutes, 5 seconds)
2025-05-13 16:00:25,265 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 16:00:43,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3520.97021 ± 886.247
2025-05-13 16:00:43,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3762.4011, 3813.831, 869.4738, 3865.4812, 3884.3914, 3707.0623, 3810.987, 3900.9211, 3878.0674, 3717.0867]
2025-05-13 16:00:43,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 291.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 16:00:43,202 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 8 minutes, 41 seconds)
2025-05-13 16:04:26,859 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 16:04:42,326 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2948.84668 ± 1443.449
2025-05-13 16:04:42,326 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3675.2742, 3637.8713, 3640.7832, 3729.4268, 12.998798, 3613.4866, 116.81117, 3576.9746, 3811.3167, 3673.5234]
2025-05-13 16:04:42,326 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 24.0, 1000.0, 86.0, 1000.0, 1000.0, 1000.0]
2025-05-13 16:04:42,339 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 4 minutes, 19 seconds)
2025-05-13 16:09:02,115 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 16:09:19,633 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3372.36011 ± 1087.037
2025-05-13 16:09:19,633 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3759.3489, 120.33327, 3790.9956, 3793.9673, 3640.4417, 3531.044, 3793.593, 3736.9705, 3758.4, 3798.5078]
2025-05-13 16:09:19,633 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 85.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 16:09:19,644 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1251 [DEBUG]: Training session finished
