2025-05-13 09:06:36,817 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-bpql-mda-mem2
2025-05-13 09:06:36,817 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-bpql-mda-mem2
2025-05-13 09:06:36,817 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x1466b09aa250>}
2025-05-13 09:06:36,817 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1111 [DEBUG]: using device: cuda
2025-05-13 09:06:36,821 baseline-bpql-mda-noisy-walker2d:91 [WARNING]: args.assumed_delay != args.horizon: 2 != 24
2025-05-13 09:06:36,821 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1133 [INFO]: Creating new trainer
2025-05-13 09:06:36,838 baseline-bpql-mda-noisy-walker2d:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-05-13 09:06:36,838 baseline-bpql-mda-noisy-walker2d:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-13 09:06:36,844 baseline-bpql-mda-noisy-walker2d:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(6, 384, batch_first=True)
)
2025-05-13 09:06:37,399 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1194 [DEBUG]: Starting training session...
2025-05-13 09:06:37,399 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 1/100
2025-05-13 09:10:15,280 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:10:17,409 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 226.02856 ± 142.311
2025-05-13 09:10:17,409 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [308.1786, 331.90152, 389.29565, 387.91376, 135.9026, 93.74233, 44.914204, 182.56366, 6.926612, 378.9466]
2025-05-13 09:10:17,409 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [190.0, 231.0, 259.0, 262.0, 76.0, 206.0, 66.0, 109.0, 19.0, 259.0]
2025-05-13 09:10:17,409 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (226.03) for latency ExtremeClogL1U23
2025-05-13 09:10:17,413 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 6 hours, 3 minutes, 1 second)
2025-05-13 09:14:07,098 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:14:08,667 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 166.90741 ± 104.386
2025-05-13 09:14:08,667 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [254.16222, 270.67148, 55.245724, 264.41885, 27.191746, 157.09593, 319.54565, 95.60713, 26.830397, 198.30495]
2025-05-13 09:14:08,667 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [147.0, 153.0, 167.0, 165.0, 34.0, 98.0, 171.0, 162.0, 41.0, 110.0]
2025-05-13 09:14:08,671 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 6 hours, 8 minutes, 32 seconds)
2025-05-13 09:17:54,025 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:17:56,175 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 207.83554 ± 134.315
2025-05-13 09:17:56,175 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [197.79102, 491.79407, 116.83001, 344.72397, 265.22992, 151.44344, 48.791214, 206.3474, 246.10043, 9.304195]
2025-05-13 09:17:56,175 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [116.0, 396.0, 249.0, 319.0, 226.0, 97.0, 50.0, 114.0, 115.0, 16.0]
2025-05-13 09:17:56,182 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 6 hours, 5 minutes, 47 seconds)
2025-05-13 09:21:46,324 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:21:47,586 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 142.83459 ± 146.771
2025-05-13 09:21:47,586 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [335.51257, 289.44345, 24.8627, 414.9046, -1.1612117, 62.382526, 51.184177, 206.85696, 17.774817, 26.585333]
2025-05-13 09:21:47,586 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [173.0, 153.0, 37.0, 247.0, 24.0, 76.0, 63.0, 126.0, 40.0, 56.0]
2025-05-13 09:21:47,591 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 6 hours, 4 minutes, 4 seconds)
2025-05-13 09:25:36,818 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:25:38,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 243.67581 ± 152.983
2025-05-13 09:25:38,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [69.47997, 51.512806, 364.3919, 330.37936, 423.4342, 249.82094, 48.29475, 506.83127, 181.89279, 210.72032]
2025-05-13 09:25:38,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [84.0, 64.0, 193.0, 174.0, 262.0, 132.0, 57.0, 275.0, 108.0, 124.0]
2025-05-13 09:25:38,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (243.68) for latency ExtremeClogL1U23
2025-05-13 09:25:38,709 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 6 hours, 1 minute, 24 seconds)
2025-05-13 09:29:31,017 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:29:33,158 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 241.30168 ± 218.038
2025-05-13 09:29:33,158 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [13.386046, 704.154, 328.39832, 187.38948, 56.149128, 26.206871, 460.73798, 25.436794, 392.68796, 218.47025]
2025-05-13 09:29:33,158 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [27.0, 478.0, 252.0, 143.0, 103.0, 39.0, 224.0, 30.0, 262.0, 122.0]
2025-05-13 09:29:33,164 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 6 hours, 2 minutes, 8 seconds)
2025-05-13 09:33:21,080 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:33:23,828 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 294.91931 ± 161.478
2025-05-13 09:33:23,828 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [62.359222, 338.1001, 336.65262, 61.47581, 220.44617, 276.04642, 272.8217, 655.37744, 324.31506, 401.59897]
2025-05-13 09:33:23,828 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [68.0, 207.0, 489.0, 81.0, 295.0, 145.0, 175.0, 331.0, 184.0, 186.0]
2025-05-13 09:33:23,828 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (294.92) for latency ExtremeClogL1U23
2025-05-13 09:33:23,834 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 5 hours, 58 minutes, 6 seconds)
2025-05-13 09:37:12,771 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:37:15,060 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 322.05145 ± 114.128
2025-05-13 09:37:15,060 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [192.30641, 563.0429, 303.58835, 309.97974, 355.53568, 245.06924, 278.7673, 172.3321, 475.39072, 324.5023]
2025-05-13 09:37:15,061 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [205.0, 265.0, 168.0, 184.0, 206.0, 129.0, 137.0, 159.0, 207.0, 164.0]
2025-05-13 09:37:15,061 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (322.05) for latency ExtremeClogL1U23
2025-05-13 09:37:15,067 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 5 hours, 55 minutes, 23 seconds)
2025-05-13 09:41:03,358 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:41:05,759 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 344.93115 ± 84.032
2025-05-13 09:41:05,760 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [482.2949, 447.81018, 254.6534, 427.56274, 258.18796, 241.53525, 322.02356, 340.95654, 397.5839, 276.70276]
2025-05-13 09:41:05,760 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [199.0, 231.0, 156.0, 252.0, 134.0, 138.0, 174.0, 213.0, 216.0, 158.0]
2025-05-13 09:41:05,760 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (344.93) for latency ExtremeClogL1U23
2025-05-13 09:41:05,764 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 5 hours, 51 minutes, 18 seconds)
2025-05-13 09:44:59,355 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:45:02,020 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 350.14966 ± 232.661
2025-05-13 09:45:02,020 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [547.2596, 56.178288, 384.18643, 506.8818, 626.00775, 59.37736, 265.41718, 705.07043, 42.2183, 308.89932]
2025-05-13 09:45:02,020 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [267.0, 87.0, 232.0, 269.0, 332.0, 83.0, 184.0, 361.0, 65.0, 195.0]
2025-05-13 09:45:02,020 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (350.15) for latency ExtremeClogL1U23
2025-05-13 09:45:02,028 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 5 hours, 48 minutes, 59 seconds)
2025-05-13 09:48:47,591 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:48:49,766 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 300.83963 ± 119.776
2025-05-13 09:48:49,766 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [202.42928, 258.36658, 398.5671, 351.17004, 126.86709, 509.09143, 119.40381, 299.7966, 330.1703, 412.53384]
2025-05-13 09:48:49,766 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [127.0, 152.0, 204.0, 195.0, 82.0, 262.0, 76.0, 167.0, 182.0, 254.0]
2025-05-13 09:48:49,771 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 5 hours, 43 minutes, 7 seconds)
2025-05-13 09:52:41,725 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:52:45,316 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 354.79477 ± 265.082
2025-05-13 09:52:45,316 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [328.58466, 316.86523, 244.35178, 346.18774, 438.55258, 535.82904, 39.415573, 326.41232, 992.76, -21.011147]
2025-05-13 09:52:45,316 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [185.0, 156.0, 156.0, 209.0, 244.0, 529.0, 57.0, 176.0, 1000.0, 98.0]
2025-05-13 09:52:45,316 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (354.79) for latency ExtremeClogL1U23
2025-05-13 09:52:45,322 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 5 hours, 40 minutes, 42 seconds)
2025-05-13 09:56:33,936 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:56:36,908 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 388.21970 ± 210.848
2025-05-13 09:56:36,908 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [567.4577, 687.69666, 27.666365, 182.3097, 626.9727, 345.73502, 586.8395, 211.20348, 401.57614, 244.73979]
2025-05-13 09:56:36,908 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [304.0, 479.0, 50.0, 149.0, 329.0, 219.0, 269.0, 158.0, 252.0, 128.0]
2025-05-13 09:56:36,908 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (388.22) for latency ExtremeClogL1U23
2025-05-13 09:56:36,912 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 5 hours, 36 minutes, 56 seconds)
2025-05-13 10:00:30,208 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:00:33,068 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 439.97980 ± 73.176
2025-05-13 10:00:33,068 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [530.1318, 311.92358, 484.99103, 388.5496, 459.98114, 344.2674, 385.95322, 480.76984, 480.86267, 532.36755]
2025-05-13 10:00:33,068 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [224.0, 171.0, 243.0, 208.0, 222.0, 203.0, 241.0, 268.0, 210.0, 263.0]
2025-05-13 10:00:33,068 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (439.98) for latency ExtremeClogL1U23
2025-05-13 10:00:33,075 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 5 hours, 34 minutes, 37 seconds)
2025-05-13 10:04:20,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:04:23,433 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 439.91098 ± 241.590
2025-05-13 10:04:23,433 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [633.6683, 555.4797, 241.99779, 381.82565, 270.53445, 973.0707, 49.722595, 369.13373, 364.2372, 559.4398]
2025-05-13 10:04:23,433 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [322.0, 293.0, 161.0, 220.0, 159.0, 494.0, 67.0, 238.0, 185.0, 319.0]
2025-05-13 10:04:23,440 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 5 hours, 29 minutes, 4 seconds)
2025-05-13 10:08:15,883 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:08:17,445 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 226.84749 ± 140.288
2025-05-13 10:08:17,445 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [32.31776, 27.162926, 431.91797, 329.36374, 249.99683, 53.425053, 194.11488, 268.72952, 280.90613, 400.53998]
2025-05-13 10:08:17,445 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [39.0, 37.0, 210.0, 179.0, 123.0, 60.0, 109.0, 154.0, 169.0, 175.0]
2025-05-13 10:08:17,450 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 5 hours, 26 minutes, 57 seconds)
2025-05-13 10:12:03,919 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:12:05,304 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 183.72569 ± 110.540
2025-05-13 10:12:05,304 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [232.70288, 239.53247, 242.95383, 405.79388, 12.213707, 210.17432, 136.57596, 81.74403, 45.871067, 229.69482]
2025-05-13 10:12:05,304 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [136.0, 111.0, 108.0, 196.0, 23.0, 106.0, 100.0, 150.0, 79.0, 108.0]
2025-05-13 10:12:05,310 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 5 hours, 20 minutes, 55 seconds)
2025-05-13 10:15:53,899 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:15:55,758 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 264.88126 ± 176.578
2025-05-13 10:15:55,758 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [285.18964, 25.10087, 400.95337, 541.2037, 33.883736, 29.54554, 408.3638, 232.10771, 433.92532, 258.53894]
2025-05-13 10:15:55,758 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [129.0, 45.0, 235.0, 277.0, 75.0, 46.0, 226.0, 128.0, 202.0, 114.0]
2025-05-13 10:15:55,764 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 5 hours, 16 minutes, 45 seconds)
2025-05-13 10:19:48,141 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:19:50,488 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 354.06171 ± 72.848
2025-05-13 10:19:50,488 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [395.2102, 384.8018, 345.0299, 373.9957, 366.8965, 443.9709, 153.10283, 351.84006, 338.82028, 386.94897]
2025-05-13 10:19:50,488 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [184.0, 193.0, 176.0, 191.0, 190.0, 220.0, 82.0, 184.0, 178.0, 213.0]
2025-05-13 10:19:50,495 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 5 hours, 12 minutes, 30 seconds)
2025-05-13 10:23:40,949 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:23:43,752 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 423.10513 ± 194.941
2025-05-13 10:23:43,752 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [276.7922, 749.6957, 378.50967, 306.19626, 724.2072, 65.109886, 447.30087, 324.11722, 460.67444, 498.44754]
2025-05-13 10:23:43,752 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [159.0, 436.0, 190.0, 142.0, 356.0, 74.0, 213.0, 141.0, 223.0, 237.0]
2025-05-13 10:23:43,756 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 5 hours, 9 minutes, 25 seconds)
2025-05-13 10:27:32,369 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:27:34,114 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 285.87616 ± 105.024
2025-05-13 10:27:34,114 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [264.49, 389.90094, 317.9715, 268.71268, 20.215508, 318.89307, 327.5671, 223.89949, 298.81268, 428.29852]
2025-05-13 10:27:34,114 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [126.0, 204.0, 148.0, 116.0, 35.0, 144.0, 143.0, 124.0, 131.0, 187.0]
2025-05-13 10:27:34,120 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 5 hours, 4 minutes, 35 seconds)
2025-05-13 10:31:22,107 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:31:24,510 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 422.93417 ± 121.143
2025-05-13 10:31:24,510 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [534.47894, 452.0646, 307.18756, 442.64462, 546.112, 476.20602, 307.19724, 378.95877, 597.74225, 186.74977]
2025-05-13 10:31:24,510 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [217.0, 194.0, 170.0, 202.0, 223.0, 202.0, 130.0, 159.0, 264.0, 106.0]
2025-05-13 10:31:24,516 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 5 hours, 1 minute, 23 seconds)
2025-05-13 10:35:14,320 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:35:16,379 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 323.11642 ± 94.788
2025-05-13 10:35:16,379 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [238.50665, 149.34427, 349.8111, 368.7552, 366.83002, 284.08472, 403.56818, 459.8619, 406.3159, 204.08632]
2025-05-13 10:35:16,380 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [139.0, 127.0, 165.0, 194.0, 180.0, 125.0, 180.0, 239.0, 190.0, 115.0]
2025-05-13 10:35:16,385 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 4 hours, 57 minutes, 53 seconds)
2025-05-13 10:39:03,707 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:39:06,656 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 472.49872 ± 253.722
2025-05-13 10:39:06,656 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [259.48187, 207.17186, 239.56375, 329.67603, 684.04346, 523.3027, 423.66803, 693.1973, 317.2424, 1047.6398]
2025-05-13 10:39:06,656 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [125.0, 103.0, 135.0, 162.0, 359.0, 223.0, 244.0, 297.0, 186.0, 573.0]
2025-05-13 10:39:06,656 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (472.50) for latency ExtremeClogL1U23
2025-05-13 10:39:06,660 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 4 hours, 52 minutes, 53 seconds)
2025-05-13 10:42:54,663 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:42:58,011 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 534.29626 ± 335.334
2025-05-13 10:42:58,011 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [438.37555, 565.4232, 322.04547, 530.9827, 800.6072, 298.51666, 1284.9132, 262.76675, 45.962223, 793.3697]
2025-05-13 10:42:58,011 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [181.0, 291.0, 163.0, 317.0, 422.0, 160.0, 522.0, 133.0, 55.0, 344.0]
2025-05-13 10:42:58,011 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (534.30) for latency ExtremeClogL1U23
2025-05-13 10:42:58,020 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 48 minutes, 33 seconds)
2025-05-13 10:46:46,400 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:46:49,043 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 465.82480 ± 422.016
2025-05-13 10:46:49,043 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [889.03925, 224.13887, 370.56442, 1538.0354, 363.64896, 496.02917, 210.88678, 406.08875, 138.93124, 20.885155]
2025-05-13 10:46:49,043 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [410.0, 108.0, 152.0, 611.0, 165.0, 210.0, 119.0, 189.0, 96.0, 33.0]
2025-05-13 10:46:49,050 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 4 hours, 44 minutes, 52 seconds)
2025-05-13 10:50:33,818 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:50:37,733 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 754.64465 ± 326.438
2025-05-13 10:50:37,733 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [947.36426, 546.4482, 1263.121, 578.7654, 790.87665, 235.41072, 396.6919, 589.84906, 971.79395, 1226.1256]
2025-05-13 10:50:37,733 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [346.0, 213.0, 497.0, 267.0, 354.0, 121.0, 218.0, 249.0, 446.0, 438.0]
2025-05-13 10:50:37,733 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (754.64) for latency ExtremeClogL1U23
2025-05-13 10:50:37,741 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 4 hours, 40 minutes, 37 seconds)
2025-05-13 10:54:25,694 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:54:28,786 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 461.14218 ± 411.759
2025-05-13 10:54:28,786 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [352.41592, 327.21045, 541.71686, 106.75565, 194.00093, 1557.304, 435.51077, 22.338497, 704.3278, 369.84058]
2025-05-13 10:54:28,786 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [162.0, 169.0, 246.0, 116.0, 130.0, 619.0, 245.0, 260.0, 249.0, 208.0]
2025-05-13 10:54:28,792 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 4 hours, 36 minutes, 34 seconds)
2025-05-13 10:58:16,127 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:58:19,259 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 573.43341 ± 692.562
2025-05-13 10:58:19,259 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [25.730177, 28.376032, 657.90765, 788.81134, 511.50266, 195.91417, 20.841846, 628.38354, 390.9021, 2485.9644]
2025-05-13 10:58:19,259 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [38.0, 49.0, 267.0, 354.0, 205.0, 121.0, 32.0, 236.0, 159.0, 1000.0]
2025-05-13 10:58:19,266 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 32 minutes, 46 seconds)
2025-05-13 11:02:04,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:02:09,970 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 972.83386 ± 833.882
2025-05-13 11:02:09,970 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [23.117834, 1055.9436, 738.51874, 25.10642, 790.7564, 2120.6013, 301.54196, 2536.1023, 424.72958, 1711.9208]
2025-05-13 11:02:09,970 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [36.0, 384.0, 294.0, 44.0, 346.0, 858.0, 137.0, 1000.0, 167.0, 708.0]
2025-05-13 11:02:09,971 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (972.83) for latency ExtremeClogL1U23
2025-05-13 11:02:09,976 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 4 hours, 28 minutes, 47 seconds)
2025-05-13 11:05:56,498 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:05:59,749 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 633.65540 ± 265.509
2025-05-13 11:05:59,749 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [658.53174, 774.87555, 192.68283, 934.4281, 536.96326, 1152.0979, 530.9167, 510.35696, 711.29346, 334.4075]
2025-05-13 11:05:59,749 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [259.0, 298.0, 99.0, 334.0, 231.0, 486.0, 253.0, 233.0, 278.0, 135.0]
2025-05-13 11:05:59,756 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 4 hours, 24 minutes, 39 seconds)
2025-05-13 11:09:52,731 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:09:57,391 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 841.76379 ± 626.577
2025-05-13 11:09:57,391 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1038.4548, 235.03041, 502.99738, 308.76413, 882.3914, 513.529, 571.0798, 644.38367, 2502.0056, 1219.0021]
2025-05-13 11:09:57,391 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [486.0, 104.0, 197.0, 144.0, 346.0, 242.0, 258.0, 329.0, 1000.0, 510.0]
2025-05-13 11:09:57,399 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 4 hours, 22 minutes, 51 seconds)
2025-05-13 11:13:36,431 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:13:42,107 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1085.44238 ± 738.462
2025-05-13 11:13:42,107 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1529.8768, 255.47475, 441.49246, 1753.6539, 2113.923, 340.08597, 50.011402, 2019.4774, 918.5779, 1431.8501]
2025-05-13 11:13:42,107 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [594.0, 169.0, 217.0, 621.0, 833.0, 145.0, 63.0, 798.0, 381.0, 556.0]
2025-05-13 11:13:42,107 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (1085.44) for latency ExtremeClogL1U23
2025-05-13 11:13:42,116 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 4 hours, 17 minutes, 34 seconds)
2025-05-13 11:17:35,047 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:17:38,743 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 698.06537 ± 712.490
2025-05-13 11:17:38,743 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [408.6498, 482.93063, 644.185, 2577.015, 244.55482, 227.30159, 252.66429, 476.44135, 241.18665, 1425.7252]
2025-05-13 11:17:38,743 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [171.0, 181.0, 262.0, 1000.0, 118.0, 112.0, 123.0, 187.0, 122.0, 607.0]
2025-05-13 11:17:38,750 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 4 hours, 15 minutes, 5 seconds)
2025-05-13 11:21:18,268 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:21:21,232 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 628.50848 ± 184.478
2025-05-13 11:21:21,232 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [794.8561, 788.2542, 914.54486, 579.1383, 536.98724, 386.68558, 437.81693, 461.84253, 514.0185, 870.94025]
2025-05-13 11:21:21,232 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [293.0, 292.0, 327.0, 223.0, 195.0, 163.0, 164.0, 196.0, 214.0, 326.0]
2025-05-13 11:21:21,237 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 4 hours, 9 minutes, 26 seconds)
2025-05-13 11:25:10,825 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:25:16,644 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1186.57874 ± 880.210
2025-05-13 11:25:16,645 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2162.4421, 2457.5923, 2566.2126, 1007.2882, 437.3216, 991.49963, 247.30653, 1395.5302, 28.21473, 572.3788]
2025-05-13 11:25:16,645 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [831.0, 869.0, 932.0, 398.0, 175.0, 427.0, 114.0, 603.0, 55.0, 253.0]
2025-05-13 11:25:16,645 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (1186.58) for latency ExtremeClogL1U23
2025-05-13 11:25:16,651 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 4 hours, 6 minutes, 48 seconds)
2025-05-13 11:29:04,787 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:29:09,996 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 893.10339 ± 845.014
2025-05-13 11:29:09,996 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [256.35287, 21.50184, 799.2268, 291.18973, 2512.9155, 2328.968, 294.9915, 1342.8374, 777.6696, 305.37988]
2025-05-13 11:29:09,996 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [124.0, 38.0, 423.0, 151.0, 1000.0, 1000.0, 133.0, 560.0, 365.0, 133.0]
2025-05-13 11:29:10,003 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 4 hours, 2 minutes, 2 seconds)
2025-05-13 11:32:55,943 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:32:58,216 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 445.59122 ± 188.079
2025-05-13 11:32:58,216 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [672.63745, 565.081, 586.79254, 553.1337, 204.58673, 322.00555, 715.5979, 231.41823, 194.5652, 410.09366]
2025-05-13 11:32:58,216 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [247.0, 224.0, 283.0, 211.0, 103.0, 138.0, 271.0, 117.0, 98.0, 172.0]
2025-05-13 11:32:58,223 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 58 minutes, 55 seconds)
2025-05-13 11:36:47,274 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:36:50,789 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 643.29425 ± 328.858
2025-05-13 11:36:50,789 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [374.81903, 208.95224, 1105.53, 901.829, 1109.3788, 718.978, 890.7252, 386.94635, 228.87326, 506.91052]
2025-05-13 11:36:50,789 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [142.0, 99.0, 441.0, 370.0, 474.0, 329.0, 359.0, 202.0, 131.0, 203.0]
2025-05-13 11:36:50,794 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 54 minutes, 14 seconds)
2025-05-13 11:40:38,635 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:40:45,911 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1324.41113 ± 861.354
2025-05-13 11:40:45,911 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [-0.8319971, 2226.816, 2092.375, 1900.8848, 914.89166, 1372.4714, 354.48944, 2631.079, 1470.2162, 281.71942]
2025-05-13 11:40:45,911 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [12.0, 1000.0, 823.0, 900.0, 366.0, 527.0, 188.0, 1000.0, 653.0, 124.0]
2025-05-13 11:40:45,912 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (1324.41) for latency ExtremeClogL1U23
2025-05-13 11:40:45,919 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 52 minutes, 56 seconds)
2025-05-13 11:44:32,196 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:44:37,936 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1116.34595 ± 885.818
2025-05-13 11:44:37,936 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [385.20966, 480.56238, 1052.3086, 891.99396, 1568.1647, 2683.1138, 2698.1887, 439.92776, 945.70526, 18.286123]
2025-05-13 11:44:37,936 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [171.0, 223.0, 392.0, 333.0, 675.0, 1000.0, 1000.0, 192.0, 326.0, 44.0]
2025-05-13 11:44:37,946 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 48 minutes, 23 seconds)
2025-05-13 11:48:32,553 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:48:40,434 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1477.53040 ± 938.924
2025-05-13 11:48:40,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [991.8456, 20.80829, 2579.4329, 1297.2264, 221.10956, 2575.0388, 633.3429, 1670.6783, 2426.341, 2359.4805]
2025-05-13 11:48:40,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [404.0, 35.0, 987.0, 539.0, 118.0, 1000.0, 253.0, 666.0, 1000.0, 1000.0]
2025-05-13 11:48:40,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (1477.53) for latency ExtremeClogL1U23
2025-05-13 11:48:40,444 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 46 minutes, 17 seconds)
2025-05-13 11:52:20,779 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:52:25,425 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 980.65411 ± 816.620
2025-05-13 11:52:25,425 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1216.7677, 519.0957, 2945.3494, 268.4278, 643.4193, 244.48663, 1691.0204, 1452.4886, 460.88223, 364.6039]
2025-05-13 11:52:25,425 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [497.0, 184.0, 1000.0, 131.0, 254.0, 113.0, 539.0, 489.0, 187.0, 167.0]
2025-05-13 11:52:25,432 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 41 minutes, 46 seconds)
2025-05-13 11:56:10,214 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:56:17,015 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1357.11670 ± 912.950
2025-05-13 11:56:17,015 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [533.2967, 2680.6145, 548.7398, 394.3899, 2627.0774, 795.14056, 2586.0288, 663.9571, 1820.6514, 921.2704]
2025-05-13 11:56:17,015 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [268.0, 1000.0, 244.0, 165.0, 1000.0, 350.0, 972.0, 250.0, 686.0, 358.0]
2025-05-13 11:56:17,022 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 37 minutes, 41 seconds)
2025-05-13 12:00:22,557 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:00:26,825 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 824.85956 ± 693.323
2025-05-13 12:00:26,825 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [256.9228, 1498.7299, 2074.1177, 468.9582, 1930.8896, 592.88245, 340.71674, 678.5583, 66.12613, 340.6942]
2025-05-13 12:00:26,825 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [132.0, 632.0, 758.0, 199.0, 786.0, 214.0, 143.0, 271.0, 85.0, 153.0]
2025-05-13 12:00:26,832 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 36 minutes, 30 seconds)
2025-05-13 12:03:58,164 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:04:02,149 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 791.27258 ± 699.311
2025-05-13 12:04:02,149 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [288.06815, 702.4555, 824.9679, 219.98572, 249.88289, 2697.982, 353.90182, 1094.5573, 946.22614, 534.6991]
2025-05-13 12:04:02,149 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [135.0, 309.0, 337.0, 109.0, 114.0, 1000.0, 143.0, 373.0, 407.0, 216.0]
2025-05-13 12:04:02,156 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 29 minutes, 33 seconds)
2025-05-13 12:07:46,209 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:07:51,652 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1015.55731 ± 993.386
2025-05-13 12:07:51,652 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [217.24893, 430.2612, 175.45125, 2413.579, 2589.3342, 473.98175, 2568.6147, 408.41675, 357.74252, 520.94226]
2025-05-13 12:07:51,652 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [111.0, 167.0, 117.0, 1000.0, 1000.0, 190.0, 1000.0, 197.0, 168.0, 220.0]
2025-05-13 12:07:51,659 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 23 minutes, 22 seconds)
2025-05-13 12:11:43,761 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:11:48,413 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 911.59540 ± 854.319
2025-05-13 12:11:48,413 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1928.4092, 2673.0593, 806.2537, 1643.6722, 541.9718, 21.567902, 49.294495, 906.7417, 523.1337, 21.849821]
2025-05-13 12:11:48,413 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [705.0, 1000.0, 294.0, 683.0, 238.0, 31.0, 75.0, 376.0, 221.0, 34.0]
2025-05-13 12:11:48,422 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 21 minutes, 35 seconds)
2025-05-13 12:15:30,728 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:15:38,209 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1518.98010 ± 967.026
2025-05-13 12:15:38,209 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2383.2896, 573.4593, 470.5595, 1316.9874, 2793.2988, 654.0513, 916.9991, 2782.8408, 2699.0981, 599.2171]
2025-05-13 12:15:38,209 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 234.0, 169.0, 487.0, 1000.0, 274.0, 355.0, 1000.0, 1000.0, 268.0]
2025-05-13 12:15:38,209 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (1518.98) for latency ExtremeClogL1U23
2025-05-13 12:15:38,218 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 17 minutes, 24 seconds)
2025-05-13 12:19:28,716 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:19:35,626 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1308.89539 ± 1090.242
2025-05-13 12:19:35,627 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2542.8862, 1005.3315, 2649.0728, 2648.215, 210.7239, 180.72993, 740.2988, 493.87592, 2576.8276, 40.990837]
2025-05-13 12:19:35,627 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 380.0, 1000.0, 1000.0, 96.0, 80.0, 313.0, 215.0, 1000.0, 55.0]
2025-05-13 12:19:35,634 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 11 minutes, 28 seconds)
2025-05-13 12:23:20,362 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:23:24,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 900.42590 ± 862.479
2025-05-13 12:23:24,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [468.5718, 643.7735, 800.66565, 2897.3362, 224.1553, 57.465355, 742.83026, 480.56497, 500.24008, 2188.6555]
2025-05-13 12:23:24,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [215.0, 260.0, 299.0, 1000.0, 132.0, 65.0, 266.0, 228.0, 208.0, 782.0]
2025-05-13 12:23:24,902 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 9 minutes, 54 seconds)
2025-05-13 12:27:28,633 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:27:33,573 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1020.86511 ± 781.890
2025-05-13 12:27:33,573 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2682.8875, 767.2584, 1615.5951, 693.78656, 73.74209, 20.732513, 1879.7144, 735.5751, 690.5667, 1048.7938]
2025-05-13 12:27:33,573 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [936.0, 298.0, 600.0, 278.0, 62.0, 34.0, 694.0, 278.0, 270.0, 425.0]
2025-05-13 12:27:33,581 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 3 hours, 9 minutes, 6 seconds)
2025-05-13 12:31:06,562 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:31:14,564 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1683.40405 ± 987.100
2025-05-13 12:31:14,564 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [764.46576, 2834.1711, 1852.3633, 2408.058, 30.466576, 286.05502, 2811.7034, 1199.3875, 2391.6711, 2255.6992]
2025-05-13 12:31:14,564 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [320.0, 1000.0, 643.0, 1000.0, 49.0, 131.0, 1000.0, 449.0, 804.0, 790.0]
2025-05-13 12:31:14,564 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (1683.40) for latency ExtremeClogL1U23
2025-05-13 12:31:14,572 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 3 hours, 2 minutes, 41 seconds)
2025-05-13 12:35:05,118 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:35:12,618 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1493.90771 ± 982.276
2025-05-13 12:35:12,618 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2733.7048, 207.66386, 714.584, 376.01822, 258.62466, 2278.5054, 2426.2075, 1314.1646, 2638.3538, 1991.251]
2025-05-13 12:35:12,618 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 103.0, 269.0, 168.0, 138.0, 865.0, 878.0, 540.0, 1000.0, 722.0]
2025-05-13 12:35:12,627 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 3 hours, 4 seconds)
2025-05-13 12:39:02,011 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:39:10,025 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1633.12427 ± 973.169
2025-05-13 12:39:10,025 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2727.7102, 481.48593, 2752.8896, 2617.5042, 1869.1646, 344.88095, 2660.0178, 567.9686, 1579.896, 729.7253]
2025-05-13 12:39:10,025 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 216.0, 978.0, 1000.0, 703.0, 157.0, 990.0, 224.0, 571.0, 305.0]
2025-05-13 12:39:10,033 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 56 minutes, 9 seconds)
2025-05-13 12:43:07,981 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:43:16,918 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1787.16528 ± 939.419
2025-05-13 12:43:16,918 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2626.4558, 2240.0537, 2719.6233, 2721.324, 1116.607, 2157.2874, 811.1681, 196.6936, 2656.6086, 625.82996]
2025-05-13 12:43:16,918 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 825.0, 1000.0, 1000.0, 412.0, 890.0, 342.0, 129.0, 1000.0, 240.0]
2025-05-13 12:43:16,918 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (1787.17) for latency ExtremeClogL1U23
2025-05-13 12:43:16,925 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 54 minutes, 49 seconds)
2025-05-13 12:46:49,260 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:46:58,037 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1817.13062 ± 655.416
2025-05-13 12:46:58,037 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1798.1604, 1351.1526, 2144.399, 2656.8586, 821.06287, 2209.4658, 2551.9253, 2302.5132, 667.2257, 1668.5436]
2025-05-13 12:46:58,037 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [690.0, 494.0, 804.0, 1000.0, 349.0, 829.0, 953.0, 801.0, 259.0, 639.0]
2025-05-13 12:46:58,037 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (1817.13) for latency ExtremeClogL1U23
2025-05-13 12:46:58,049 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 46 minutes, 54 seconds)
2025-05-13 12:50:48,069 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:50:55,187 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1303.43225 ± 714.371
2025-05-13 12:50:55,187 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [840.6958, 1452.5121, 462.02054, 1416.057, 2596.043, 569.1581, 1112.1337, 652.17084, 2512.4287, 1421.1029]
2025-05-13 12:50:55,187 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [416.0, 564.0, 198.0, 544.0, 1000.0, 243.0, 577.0, 272.0, 1000.0, 545.0]
2025-05-13 12:50:55,195 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 45 minutes, 17 seconds)
2025-05-13 12:54:51,227 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:54:58,117 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1492.15466 ± 679.921
2025-05-13 12:54:58,117 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1796.0969, 2871.9333, 1177.7263, 1087.0966, 703.32806, 1152.9469, 759.1383, 1166.9105, 2447.2063, 1759.1636]
2025-05-13 12:54:58,117 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [625.0, 986.0, 450.0, 399.0, 266.0, 435.0, 247.0, 462.0, 865.0, 637.0]
2025-05-13 12:54:58,125 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 42 minutes, 1 second)
2025-05-13 12:58:35,188 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:58:41,689 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1271.00806 ± 938.760
2025-05-13 12:58:41,689 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2736.132, 111.518036, 768.2352, 250.95392, 672.82733, 1353.4313, 498.50293, 1321.6958, 2658.8923, 2337.8916]
2025-05-13 12:58:41,690 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 71.0, 283.0, 160.0, 286.0, 529.0, 201.0, 497.0, 958.0, 1000.0]
2025-05-13 12:58:41,697 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 36 minutes, 13 seconds)
2025-05-13 13:02:31,803 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:02:38,609 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1337.37732 ± 952.901
2025-05-13 13:02:38,609 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2486.7317, 863.5616, 346.19485, 868.695, 293.81427, 2669.121, 1231.3582, 107.609985, 2619.7205, 1886.9663]
2025-05-13 13:02:38,609 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 350.0, 167.0, 327.0, 139.0, 984.0, 490.0, 80.0, 1000.0, 726.0]
2025-05-13 13:02:38,616 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 31 minutes, 1 second)
2025-05-13 13:06:31,218 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:06:37,027 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1181.35559 ± 1071.015
2025-05-13 13:06:37,027 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1643.1624, 416.5747, 2684.2244, 2812.243, 532.549, 24.467793, 2468.7043, 201.09584, 20.647898, 1009.88635]
2025-05-13 13:06:37,027 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [605.0, 179.0, 994.0, 1000.0, 206.0, 46.0, 883.0, 109.0, 32.0, 418.0]
2025-05-13 13:06:37,035 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 29 minutes, 20 seconds)
2025-05-13 13:10:18,688 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:10:23,535 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1044.23059 ± 577.417
2025-05-13 13:10:23,535 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1103.755, 1983.5667, 1468.5964, 459.50723, 471.51337, 1161.339, 17.351913, 766.2943, 1623.1061, 1387.2754]
2025-05-13 13:10:23,535 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [390.0, 671.0, 526.0, 200.0, 206.0, 410.0, 33.0, 298.0, 657.0, 494.0]
2025-05-13 13:10:23,544 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 24 minutes, 5 seconds)
2025-05-13 13:14:13,123 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:14:19,538 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1319.52441 ± 987.261
2025-05-13 13:14:19,538 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1211.8544, 485.32593, 2754.555, 2602.7114, 444.03406, 614.6091, 2805.2148, 928.8141, 1336.2263, 11.898695]
2025-05-13 13:14:19,538 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [436.0, 209.0, 1000.0, 988.0, 204.0, 250.0, 1000.0, 358.0, 522.0, 25.0]
2025-05-13 13:14:19,547 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 19 minutes, 22 seconds)
2025-05-13 13:18:07,591 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:18:13,173 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1135.95264 ± 874.350
2025-05-13 13:18:13,173 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [693.0008, 1588.4458, 1871.4839, 1353.7035, 1261.857, 1497.7677, 200.00368, -12.208821, 2886.919, 18.553642]
2025-05-13 13:18:13,173 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [325.0, 581.0, 679.0, 487.0, 484.0, 548.0, 103.0, 14.0, 1000.0, 40.0]
2025-05-13 13:18:13,182 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 16 minutes, 40 seconds)
2025-05-13 13:22:00,441 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:22:08,408 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1627.26526 ± 935.977
2025-05-13 13:22:08,408 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2651.2095, 2698.9841, 880.9825, 1881.3422, 2404.1707, 2648.0835, 1657.1265, 891.0043, 32.44406, 527.30566]
2025-05-13 13:22:08,408 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 355.0, 688.0, 869.0, 1000.0, 633.0, 356.0, 57.0, 206.0]
2025-05-13 13:22:08,420 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 12 minutes, 34 seconds)
2025-05-13 13:26:06,643 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:26:14,225 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1531.73438 ± 779.156
2025-05-13 13:26:14,225 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [217.48734, 916.3364, 1492.3798, 1208.541, 1774.396, 2599.5063, 2688.6187, 2385.4895, 839.4112, 1195.1766]
2025-05-13 13:26:14,225 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [107.0, 329.0, 557.0, 454.0, 684.0, 1000.0, 1000.0, 863.0, 344.0, 528.0]
2025-05-13 13:26:14,234 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 9 minutes, 29 seconds)
2025-05-13 13:29:53,603 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:30:00,319 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1365.04663 ± 914.363
2025-05-13 13:30:00,319 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [424.26318, 1152.3997, 2682.5808, 1300.384, 2597.9666, 980.10034, 1170.4963, 132.8503, 520.5273, 2688.8977]
2025-05-13 13:30:00,319 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [184.0, 413.0, 1000.0, 495.0, 1000.0, 368.0, 443.0, 99.0, 230.0, 1000.0]
2025-05-13 13:30:00,330 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 2 hours, 5 minutes, 31 seconds)
2025-05-13 13:33:52,505 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:33:58,914 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1275.08252 ± 879.087
2025-05-13 13:33:58,914 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1218.4073, 1254.0621, 17.093803, 2682.3713, 1343.181, 364.2264, 361.6733, 1013.23395, 1733.5758, 2762.9998]
2025-05-13 13:33:58,914 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [450.0, 503.0, 28.0, 1000.0, 549.0, 192.0, 201.0, 366.0, 692.0, 1000.0]
2025-05-13 13:33:58,922 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 2 hours, 1 minute, 52 seconds)
2025-05-13 13:37:45,699 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:37:52,612 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1379.21570 ± 882.057
2025-05-13 13:37:52,612 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [438.8313, 218.16925, 2677.9822, 1260.9197, 406.44342, 2468.1472, 1172.1031, 2654.1038, 1325.9491, 1169.5074]
2025-05-13 13:37:52,612 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [213.0, 119.0, 1000.0, 522.0, 198.0, 906.0, 439.0, 1000.0, 525.0, 435.0]
2025-05-13 13:37:52,625 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 57 minutes, 56 seconds)
2025-05-13 13:41:50,095 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:41:56,164 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1226.99890 ± 635.162
2025-05-13 13:41:56,164 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1532.1041, 673.3207, 1322.0746, 1712.1354, 1635.2443, 614.67773, 2236.0461, 257.50977, 481.3924, 1805.484]
2025-05-13 13:41:56,164 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [585.0, 272.0, 458.0, 631.0, 579.0, 256.0, 1000.0, 127.0, 204.0, 607.0]
2025-05-13 13:41:56,174 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 54 minutes, 48 seconds)
2025-05-13 13:45:29,101 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:45:35,871 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1376.78943 ± 959.737
2025-05-13 13:45:35,872 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [32.583992, 2417.3477, 875.5102, 546.75354, 22.26203, 1303.7463, 2691.236, 1565.994, 2686.4941, 1625.9669]
2025-05-13 13:45:35,872 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [57.0, 836.0, 310.0, 254.0, 33.0, 496.0, 1000.0, 627.0, 1000.0, 617.0]
2025-05-13 13:45:35,879 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 48 minutes, 25 seconds)
2025-05-13 13:49:27,247 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:49:32,921 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1139.68530 ± 851.958
2025-05-13 13:49:32,921 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [465.82434, 652.76483, 2755.9978, 2228.6086, 1283.7567, 944.7462, 1972.1418, 414.3266, 15.354513, 663.3311]
2025-05-13 13:49:32,921 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [195.0, 266.0, 1000.0, 844.0, 491.0, 420.0, 735.0, 190.0, 21.0, 255.0]
2025-05-13 13:49:32,929 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 45 minutes, 32 seconds)
2025-05-13 13:53:26,532 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:53:33,718 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1430.80835 ± 894.646
2025-05-13 13:53:33,718 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [311.93478, 2021.0074, 2655.87, 1600.5562, 2157.2986, 2629.1406, 1145.217, 255.0868, 1296.3007, 235.67221]
2025-05-13 13:53:33,718 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [143.0, 789.0, 1000.0, 543.0, 776.0, 1000.0, 435.0, 120.0, 494.0, 130.0]
2025-05-13 13:53:33,725 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 41 minutes, 48 seconds)
2025-05-13 13:57:33,138 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:57:40,089 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1337.37134 ± 1026.668
2025-05-13 13:57:40,089 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [58.744366, 1685.7965, 457.92462, 1408.1058, 120.90018, 2618.4841, 2478.5479, 43.247066, 2596.1611, 1905.802]
2025-05-13 13:57:40,090 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [62.0, 667.0, 208.0, 555.0, 91.0, 1000.0, 962.0, 61.0, 1000.0, 732.0]
2025-05-13 13:57:40,099 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 38 minutes, 57 seconds)
2025-05-13 14:01:21,884 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:01:29,115 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1432.83325 ± 992.611
2025-05-13 14:01:29,116 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1.6483173, 703.05566, 1748.9259, 2658.9465, 151.06837, 2002.8357, 2564.37, 2292.238, 1958.1235, 247.11868]
2025-05-13 14:01:29,116 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [13.0, 247.0, 673.0, 1000.0, 100.0, 755.0, 1000.0, 871.0, 737.0, 131.0]
2025-05-13 14:01:29,123 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 33 minutes, 50 seconds)
2025-05-13 14:05:07,343 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:05:14,223 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1455.14954 ± 921.256
2025-05-13 14:05:14,223 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2306.2131, 1592.7958, 2787.342, 333.26364, 1230.2585, 262.7029, 1728.4658, 15.0026245, 2402.6218, 1892.8295]
2025-05-13 14:05:14,223 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [875.0, 594.0, 1000.0, 152.0, 473.0, 142.0, 661.0, 35.0, 858.0, 692.0]
2025-05-13 14:05:14,230 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 30 minutes, 20 seconds)
2025-05-13 14:08:55,588 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:09:02,021 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1278.85339 ± 799.175
2025-05-13 14:09:02,021 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1771.227, 1010.3572, 1285.0378, 415.7475, 1438.7338, 713.31757, 258.92535, 2595.421, 2632.9854, 666.78125]
2025-05-13 14:09:02,021 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [651.0, 354.0, 464.0, 243.0, 604.0, 258.0, 122.0, 1000.0, 1000.0, 274.0]
2025-05-13 14:09:02,029 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 25 minutes, 44 seconds)
2025-05-13 14:12:56,125 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:13:04,641 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1778.31091 ± 864.099
2025-05-13 14:13:04,642 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1612.942, 2730.023, 2711.0159, 695.6214, 2192.9104, 786.2488, 2692.9575, 672.54333, 1048.403, 2640.4436]
2025-05-13 14:13:04,642 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [604.0, 1000.0, 1000.0, 269.0, 808.0, 261.0, 1000.0, 264.0, 350.0, 1000.0]
2025-05-13 14:13:04,652 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 21 minutes, 57 seconds)
2025-05-13 14:16:54,765 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:17:01,373 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1283.06104 ± 732.798
2025-05-13 14:17:01,373 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [975.17566, 1898.4132, 1603.1101, 781.3601, 2607.3398, 968.51666, 183.44012, 806.80145, 742.71436, 2263.739]
2025-05-13 14:17:01,373 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [348.0, 706.0, 654.0, 307.0, 1000.0, 395.0, 132.0, 332.0, 302.0, 876.0]
2025-05-13 14:17:01,381 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 17 minutes, 25 seconds)
2025-05-13 14:20:45,539 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:20:51,659 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1225.78613 ± 819.990
2025-05-13 14:20:51,659 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2769.3555, 2645.4553, 1469.9469, 1045.1428, 532.5051, 1281.1746, 1041.1223, 246.9211, 554.7553, 671.4831]
2025-05-13 14:20:51,659 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 542.0, 351.0, 202.0, 478.0, 463.0, 139.0, 236.0, 254.0]
2025-05-13 14:20:51,670 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 13 minutes, 37 seconds)
2025-05-13 14:24:34,692 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:24:42,584 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1641.19177 ± 901.541
2025-05-13 14:24:42,584 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [285.1481, 875.47925, 1535.9763, 2790.121, 2769.3608, 1559.8992, 2767.6443, 423.55872, 2110.4714, 1294.2587]
2025-05-13 14:24:42,584 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [148.0, 325.0, 558.0, 1000.0, 984.0, 549.0, 1000.0, 154.0, 773.0, 488.0]
2025-05-13 14:24:42,591 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 10 minutes, 6 seconds)
2025-05-13 14:28:35,791 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:28:42,602 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1339.46423 ± 980.996
2025-05-13 14:28:42,602 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [329.34964, 2698.5261, 2698.982, 531.33276, 1693.9618, 2430.6636, 1705.7012, 444.29376, 11.383375, 850.4483]
2025-05-13 14:28:42,602 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [158.0, 1000.0, 1000.0, 206.0, 640.0, 912.0, 654.0, 211.0, 27.0, 354.0]
2025-05-13 14:28:42,611 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 6 minutes, 53 seconds)
2025-05-13 14:32:25,331 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:32:32,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1410.76355 ± 1029.699
2025-05-13 14:32:32,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [682.20496, 2680.5393, 2616.599, 328.43237, -2.2943864, 1128.3656, 552.5391, 2633.9285, 2550.9392, 936.3815]
2025-05-13 14:32:32,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [320.0, 1000.0, 1000.0, 143.0, 216.0, 444.0, 204.0, 1000.0, 1000.0, 387.0]
2025-05-13 14:32:32,902 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 2 minutes, 18 seconds)
2025-05-13 14:36:20,804 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:36:27,023 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1256.83765 ± 1006.137
2025-05-13 14:36:27,024 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2704.164, 91.126335, 1230.0985, 1751.0925, 33.365044, 2328.1182, 814.00934, 248.14056, 635.03046, 2733.2317]
2025-05-13 14:36:27,024 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 95.0, 482.0, 625.0, 44.0, 842.0, 288.0, 115.0, 237.0, 1000.0]
2025-05-13 14:36:27,032 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 58 minutes, 16 seconds)
2025-05-13 14:40:15,760 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:40:25,303 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1923.27673 ± 838.620
2025-05-13 14:40:25,303 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2744.2742, 2715.1172, 2342.209, 1174.3041, 792.45874, 2086.8699, 2564.3792, 246.0737, 1962.4401, 2604.6418]
2025-05-13 14:40:25,303 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 982.0, 830.0, 437.0, 283.0, 803.0, 1000.0, 111.0, 730.0, 1000.0]
2025-05-13 14:40:25,303 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (1923.28) for latency ExtremeClogL1U23
2025-05-13 14:40:25,314 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 54 minutes, 46 seconds)
2025-05-13 14:44:12,220 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:44:20,957 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1772.21741 ± 1156.191
2025-05-13 14:44:20,957 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2591.953, 2653.5051, 2647.5764, 2778.7168, 2645.1877, 21.16934, 2817.3108, -12.648029, 562.8046, 1016.5978]
2025-05-13 14:44:20,957 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 930.0, 1000.0, 1000.0, 1000.0, 34.0, 1000.0, 14.0, 223.0, 386.0]
2025-05-13 14:44:20,979 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 51 minutes, 3 seconds)
2025-05-13 14:48:19,463 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:48:25,971 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1357.63013 ± 998.130
2025-05-13 14:48:25,972 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [963.30756, 1529.9424, 1854.429, 193.10588, 2622.5928, 658.89923, 2689.6956, 2648.8535, 208.37311, 207.10237]
2025-05-13 14:48:25,972 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [346.0, 536.0, 619.0, 97.0, 1000.0, 241.0, 1000.0, 1000.0, 99.0, 109.0]
2025-05-13 14:48:25,981 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 47 minutes, 20 seconds)
2025-05-13 14:52:12,421 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:52:15,585 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 603.47278 ± 625.426
2025-05-13 14:52:15,586 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [398.68683, 764.1454, 21.385181, 144.18636, 2145.8123, 265.7071, 260.95462, 1307.2692, 172.06325, 554.5172]
2025-05-13 14:52:15,586 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [186.0, 313.0, 32.0, 99.0, 804.0, 115.0, 114.0, 506.0, 111.0, 215.0]
2025-05-13 14:52:15,594 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 43 minutes, 21 seconds)
2025-05-13 14:56:01,154 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:56:09,103 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1585.41687 ± 1032.945
2025-05-13 14:56:09,103 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2696.3606, 298.90558, 314.70593, 2095.4714, 912.10754, 2637.6658, 974.96014, 443.26913, 2745.6208, 2735.1013]
2025-05-13 14:56:09,103 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 173.0, 162.0, 798.0, 348.0, 1000.0, 412.0, 216.0, 1000.0, 1000.0]
2025-05-13 14:56:09,111 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 39 minutes, 24 seconds)
2025-05-13 15:00:07,618 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:00:16,215 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1741.75903 ± 838.782
2025-05-13 15:00:16,215 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1907.7651, 2738.6182, 2628.8645, 2428.1497, 2287.6064, 1013.9502, 681.1136, 397.47998, 2368.9082, 965.13257]
2025-05-13 15:00:16,215 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [706.0, 1000.0, 1000.0, 895.0, 813.0, 387.0, 239.0, 165.0, 894.0, 408.0]
2025-05-13 15:00:16,222 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 35 minutes, 43 seconds)
2025-05-13 15:03:43,651 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:03:51,383 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1628.61304 ± 916.031
2025-05-13 15:03:51,383 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [179.80269, 1742.7916, 1815.4861, 530.1959, 701.03625, 2045.2458, 1072.8975, 2627.6284, 2885.2148, 2685.8325]
2025-05-13 15:03:51,383 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [137.0, 666.0, 645.0, 206.0, 250.0, 683.0, 398.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:03:51,393 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 31 minutes, 12 seconds)
2025-05-13 15:07:44,872 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:07:52,019 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1499.80933 ± 943.236
2025-05-13 15:07:52,019 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [625.4002, 2751.27, 775.69794, 1959.2527, 1565.0504, 2538.7756, 2868.1826, 740.255, 1149.078, 25.130869]
2025-05-13 15:07:52,019 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [249.0, 1000.0, 286.0, 675.0, 588.0, 1000.0, 1000.0, 258.0, 429.0, 48.0]
2025-05-13 15:07:52,031 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 27 minutes, 12 seconds)
2025-05-13 15:11:43,670 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:11:51,953 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1718.82593 ± 837.207
2025-05-13 15:11:51,953 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1865.5789, 464.64795, 832.8608, 1640.6716, 2392.9756, 362.91122, 2192.7449, 2739.6294, 1979.3602, 2716.8794]
2025-05-13 15:11:51,953 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [665.0, 186.0, 360.0, 618.0, 849.0, 171.0, 806.0, 1000.0, 688.0, 986.0]
2025-05-13 15:11:51,961 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 23 minutes, 31 seconds)
2025-05-13 15:15:36,719 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:15:42,942 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1323.71875 ± 839.986
2025-05-13 15:15:42,942 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1278.223, 2441.0063, 1132.1467, 787.135, 2278.995, 2687.3035, 1332.7814, 451.44446, 829.7693, 18.382576]
2025-05-13 15:15:42,943 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [448.0, 912.0, 394.0, 316.0, 811.0, 1000.0, 491.0, 178.0, 280.0, 28.0]
2025-05-13 15:15:42,951 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 19 minutes, 33 seconds)
2025-05-13 15:19:33,135 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:19:40,084 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1369.28601 ± 1077.578
2025-05-13 15:19:40,084 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [484.1128, 23.661844, 130.27849, 918.1709, 2672.739, 475.0927, 1121.6179, 2565.7952, 2610.0422, 2691.3496]
2025-05-13 15:19:40,084 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [217.0, 37.0, 102.0, 337.0, 1000.0, 270.0, 404.0, 935.0, 1000.0, 1000.0]
2025-05-13 15:19:40,092 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 15 minutes, 31 seconds)
2025-05-13 15:23:24,129 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:23:31,066 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1390.65967 ± 1101.412
2025-05-13 15:23:31,067 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [568.6867, 535.10144, 411.82358, 2774.6433, 1061.9333, 327.44794, 2718.6326, 2719.2734, 2636.038, 153.01527]
2025-05-13 15:23:31,067 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [206.0, 199.0, 209.0, 1000.0, 398.0, 184.0, 1000.0, 1000.0, 1000.0, 88.0]
2025-05-13 15:23:31,078 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 11 minutes, 47 seconds)
2025-05-13 15:27:12,919 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:27:18,793 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1212.37708 ± 896.344
2025-05-13 15:27:18,793 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [508.63474, 141.27144, 1568.0312, 925.86096, 1499.6566, 2769.5132, 267.65182, 654.52155, 1016.1794, 2772.4495]
2025-05-13 15:27:18,793 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [194.0, 167.0, 574.0, 409.0, 520.0, 1000.0, 117.0, 243.0, 381.0, 1000.0]
2025-05-13 15:27:18,804 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 7 minutes, 46 seconds)
2025-05-13 15:31:19,701 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:31:25,975 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1319.59937 ± 953.774
2025-05-13 15:31:25,976 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2761.6875, 1066.3943, 345.32684, 1775.0259, 557.17584, 1263.8949, 454.14575, 2772.0107, 2162.5422, 37.7894]
2025-05-13 15:31:25,976 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [970.0, 391.0, 183.0, 631.0, 208.0, 453.0, 171.0, 1000.0, 801.0, 65.0]
2025-05-13 15:31:25,987 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 54 seconds)
2025-05-13 15:35:07,047 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:35:13,730 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1367.03247 ± 824.633
2025-05-13 15:35:13,731 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1469.1364, 1660.71, 2027.0674, 177.74326, 414.0726, 1531.82, 2304.4146, 2760.19, 892.9363, 432.2337]
2025-05-13 15:35:13,731 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [476.0, 659.0, 747.0, 151.0, 174.0, 587.0, 827.0, 1000.0, 397.0, 177.0]
2025-05-13 15:35:13,742 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1251 [DEBUG]: Training session finished
