2025-05-13 09:06:29,053 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-bpql-mda-mem4
2025-05-13 09:06:29,053 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-bpql-mda-mem4
2025-05-13 09:06:29,053 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x154a9fc31290>}
2025-05-13 09:06:29,053 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1111 [DEBUG]: using device: cuda
2025-05-13 09:06:29,058 baseline-bpql-mda-noisy-walker2d:91 [WARNING]: args.assumed_delay != args.horizon: 4 != 24
2025-05-13 09:06:29,058 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1133 [INFO]: Creating new trainer
2025-05-13 09:06:29,082 baseline-bpql-mda-noisy-walker2d:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-05-13 09:06:29,082 baseline-bpql-mda-noisy-walker2d:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-13 09:06:29,087 baseline-bpql-mda-noisy-walker2d:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(6, 384, batch_first=True)
)
2025-05-13 09:06:29,936 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1194 [DEBUG]: Starting training session...
2025-05-13 09:06:29,936 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 1/100
2025-05-13 09:09:55,960 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:09:57,718 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 72.95695 ± 60.887
2025-05-13 09:09:57,718 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [151.78035, 148.253, 136.35258, 81.48725, 24.745066, 62.974293, 110.35347, -32.936253, -0.4466666, 47.00646]
2025-05-13 09:09:57,718 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [146.0, 203.0, 202.0, 135.0, 79.0, 132.0, 200.0, 88.0, 32.0, 134.0]
2025-05-13 09:09:57,718 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (72.96) for latency MM1Queue_a033_s075
2025-05-13 09:09:57,728 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 5 hours, 42 minutes, 51 seconds)
2025-05-13 09:13:35,386 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:13:38,776 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 227.33057 ± 292.589
2025-05-13 09:13:38,777 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [49.008118, 7.4771743, 34.04543, 2.2127097, 404.28278, 365.49783, 984.8402, 308.96613, 60.54305, 56.432354]
2025-05-13 09:13:38,777 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [200.0, 17.0, 55.0, 13.0, 517.0, 289.0, 1000.0, 158.0, 156.0, 76.0]
2025-05-13 09:13:38,777 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (227.33) for latency MM1Queue_a033_s075
2025-05-13 09:13:38,781 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 5 hours, 50 minutes, 13 seconds)
2025-05-13 09:17:21,522 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:17:23,601 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 166.29703 ± 103.008
2025-05-13 09:17:23,601 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [0.66208446, 252.19534, 65.0865, 313.67734, 188.89793, 263.09058, 171.37703, 71.2352, 269.31427, 67.4338]
2025-05-13 09:17:23,601 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [12.0, 152.0, 307.0, 193.0, 132.0, 146.0, 101.0, 182.0, 156.0, 205.0]
2025-05-13 09:17:23,605 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 5 hours, 52 minutes, 15 seconds)
2025-05-13 09:20:57,449 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:20:59,972 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 119.06698 ± 87.452
2025-05-13 09:20:59,972 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [156.01616, 46.295444, 2.8496091, 149.98375, 115.19258, 43.416145, 220.29852, 197.7429, 1.055605, 257.81906]
2025-05-13 09:20:59,972 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [336.0, 219.0, 67.0, 88.0, 338.0, 282.0, 125.0, 296.0, 11.0, 138.0]
2025-05-13 09:20:59,978 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 5 hours, 48 minutes, 1 second)
2025-05-13 09:24:39,555 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:24:42,419 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 312.82556 ± 37.901
2025-05-13 09:24:42,419 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [343.4839, 333.02582, 248.23697, 352.63214, 272.71985, 350.86017, 288.99902, 359.71912, 302.15598, 276.4227]
2025-05-13 09:24:42,419 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [259.0, 262.0, 179.0, 245.0, 192.0, 276.0, 172.0, 239.0, 164.0, 195.0]
2025-05-13 09:24:42,419 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (312.83) for latency MM1Queue_a033_s075
2025-05-13 09:24:42,427 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 5 hours, 45 minutes, 57 seconds)
2025-05-13 09:28:20,039 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:28:23,793 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 463.49869 ± 240.264
2025-05-13 09:28:23,793 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [264.4753, 440.68085, 287.9888, 726.97095, 837.90546, 483.6558, 740.1426, 454.72366, -0.8200323, 399.26318]
2025-05-13 09:28:23,793 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [219.0, 255.0, 210.0, 445.0, 400.0, 283.0, 411.0, 375.0, 10.0, 250.0]
2025-05-13 09:28:23,793 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (463.50) for latency MM1Queue_a033_s075
2025-05-13 09:28:23,801 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 5 hours, 46 minutes, 34 seconds)
2025-05-13 09:32:01,253 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:32:04,104 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 435.17889 ± 35.807
2025-05-13 09:32:04,104 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [372.37198, 410.5949, 467.48404, 464.62454, 493.6877, 421.8632, 433.61786, 407.9427, 471.2209, 408.381]
2025-05-13 09:32:04,104 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [172.0, 225.0, 230.0, 255.0, 264.0, 232.0, 213.0, 187.0, 235.0, 223.0]
2025-05-13 09:32:04,111 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 5 hours, 42 minutes, 39 seconds)
2025-05-13 09:35:31,609 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:35:33,955 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 304.92084 ± 138.874
2025-05-13 09:35:33,955 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [210.43886, 535.61993, 516.3007, 187.4442, 163.89395, 248.14749, 401.04532, 110.73173, 336.2015, 339.38513]
2025-05-13 09:35:33,955 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [144.0, 254.0, 280.0, 134.0, 130.0, 135.0, 200.0, 121.0, 197.0, 222.0]
2025-05-13 09:35:33,965 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 5 hours, 34 minutes, 22 seconds)
2025-05-13 09:39:17,617 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:39:19,830 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 234.76782 ± 234.899
2025-05-13 09:39:19,830 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [426.14798, 100.50544, 565.1222, 42.81918, 29.26155, 551.5997, 530.52344, 35.685883, 30.952553, 35.060402]
2025-05-13 09:39:19,830 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [270.0, 134.0, 350.0, 72.0, 56.0, 308.0, 308.0, 55.0, 58.0, 63.0]
2025-05-13 09:39:19,837 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 5 hours, 33 minutes, 37 seconds)
2025-05-13 09:43:04,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:43:08,356 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 456.78864 ± 263.443
2025-05-13 09:43:08,356 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [522.8379, 682.5806, 75.79223, 42.42661, 399.5432, 423.47162, 496.48578, 576.6362, 995.8368, 352.27478]
2025-05-13 09:43:08,356 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [291.0, 425.0, 87.0, 68.0, 211.0, 223.0, 259.0, 351.0, 1000.0, 209.0]
2025-05-13 09:43:08,361 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 5 hours, 31 minutes, 46 seconds)
2025-05-13 09:46:46,529 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:46:50,425 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 547.46082 ± 278.060
2025-05-13 09:46:50,425 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [397.76022, 805.4994, 395.0798, 300.6915, 299.04013, 258.1167, 768.7338, 322.69208, 1030.4883, 896.50635]
2025-05-13 09:46:50,425 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [201.0, 456.0, 216.0, 163.0, 174.0, 137.0, 400.0, 183.0, 520.0, 519.0]
2025-05-13 09:46:50,425 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (547.46) for latency MM1Queue_a033_s075
2025-05-13 09:46:50,434 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 5 hours, 28 minutes, 18 seconds)
2025-05-13 09:50:28,690 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:50:31,710 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 432.55084 ± 54.203
2025-05-13 09:50:31,710 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [437.6754, 490.91425, 509.3478, 370.2332, 351.00446, 492.7487, 403.27237, 465.72833, 369.86755, 434.71613]
2025-05-13 09:50:31,710 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [267.0, 290.0, 273.0, 199.0, 172.0, 244.0, 207.0, 245.0, 194.0, 227.0]
2025-05-13 09:50:31,718 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 5 hours, 24 minutes, 53 seconds)
2025-05-13 09:54:12,768 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:54:16,249 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 540.43719 ± 197.418
2025-05-13 09:54:16,250 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [781.38586, 449.26834, 842.14545, 735.886, 500.36975, 503.74435, 306.69775, 607.1801, 179.43156, 498.26285]
2025-05-13 09:54:16,250 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [325.0, 219.0, 381.0, 354.0, 279.0, 257.0, 182.0, 289.0, 155.0, 242.0]
2025-05-13 09:54:16,257 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 5 hours, 25 minutes, 27 seconds)
2025-05-13 09:57:56,543 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:58:00,703 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 639.20544 ± 247.974
2025-05-13 09:58:00,703 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [812.83264, 363.29388, 692.68524, 512.55426, 336.9262, 1049.5454, 779.5766, 430.062, 987.7098, 426.86765]
2025-05-13 09:58:00,703 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [399.0, 213.0, 327.0, 324.0, 189.0, 478.0, 332.0, 222.0, 486.0, 227.0]
2025-05-13 09:58:00,704 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (639.21) for latency MM1Queue_a033_s075
2025-05-13 09:58:00,712 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 5 hours, 21 minutes, 19 seconds)
2025-05-13 10:01:44,134 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:01:47,020 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 443.29401 ± 142.653
2025-05-13 10:01:47,021 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [589.87024, 486.07846, 609.7477, 280.0455, 289.51416, 346.13995, 688.8022, 438.69275, 440.85834, 263.19113]
2025-05-13 10:01:47,021 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [307.0, 247.0, 299.0, 154.0, 164.0, 180.0, 325.0, 240.0, 230.0, 145.0]
2025-05-13 10:01:47,025 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 5 hours, 16 minutes, 57 seconds)
2025-05-13 10:05:29,947 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:05:34,849 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 737.63794 ± 539.400
2025-05-13 10:05:34,849 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2225.8816, 757.409, 489.9566, 1048.5708, 421.25586, 441.86725, 371.162, 333.60434, 514.24585, 772.42645]
2025-05-13 10:05:34,849 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 356.0, 277.0, 443.0, 274.0, 255.0, 224.0, 229.0, 262.0, 388.0]
2025-05-13 10:05:34,849 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (737.64) for latency MM1Queue_a033_s075
2025-05-13 10:05:34,857 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 5 hours, 14 minutes, 50 seconds)
2025-05-13 10:09:20,941 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:09:32,340 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2193.57178 ± 539.829
2025-05-13 10:09:32,341 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2614.8926, 2657.0405, 1596.6963, 1804.7595, 994.2052, 2689.6035, 2236.8616, 2147.2336, 2596.7837, 2597.642]
2025-05-13 10:09:32,341 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 677.0, 812.0, 393.0, 1000.0, 823.0, 800.0, 1000.0, 1000.0]
2025-05-13 10:09:32,341 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (2193.57) for latency MM1Queue_a033_s075
2025-05-13 10:09:32,348 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 5 hours, 15 minutes, 34 seconds)
2025-05-13 10:13:09,485 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:13:20,726 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2092.27905 ± 513.555
2025-05-13 10:13:20,726 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2546.266, 1833.0155, 2355.882, 2229.1235, 2057.771, 2242.1538, 1557.9116, 924.13696, 2353.6519, 2822.8784]
2025-05-13 10:13:20,727 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 763.0, 960.0, 760.0, 839.0, 878.0, 646.0, 444.0, 1000.0, 1000.0]
2025-05-13 10:13:20,734 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 5 hours, 12 minutes, 49 seconds)
2025-05-13 10:16:56,792 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:17:02,256 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 998.57129 ± 674.781
2025-05-13 10:17:02,256 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1104.4581, 421.26056, 128.27605, 2485.6453, 1735.5327, 173.58516, 994.5063, 1078.1344, 908.6403, 955.67413]
2025-05-13 10:17:02,256 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [434.0, 185.0, 120.0, 1000.0, 713.0, 98.0, 455.0, 420.0, 353.0, 377.0]
2025-05-13 10:17:02,264 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 5 hours, 8 minutes, 13 seconds)
2025-05-13 10:20:44,246 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:20:51,394 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1232.04529 ± 429.511
2025-05-13 10:20:51,394 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [584.14233, 761.6564, 933.6977, 1529.9548, 1756.2886, 1817.001, 884.73444, 1755.7317, 1138.6136, 1158.6327]
2025-05-13 10:20:51,394 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [290.0, 345.0, 387.0, 650.0, 735.0, 1000.0, 413.0, 679.0, 446.0, 474.0]
2025-05-13 10:20:51,402 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 5 hours, 5 minutes, 10 seconds)
2025-05-13 10:24:36,643 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:24:41,571 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1002.16602 ± 111.979
2025-05-13 10:24:41,571 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1059.5492, 760.99396, 972.9651, 932.20966, 1055.881, 1047.289, 957.4362, 1209.8864, 949.88684, 1075.5634]
2025-05-13 10:24:41,571 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [436.0, 285.0, 360.0, 412.0, 403.0, 426.0, 351.0, 442.0, 333.0, 386.0]
2025-05-13 10:24:41,576 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 5 hours, 1 minute, 58 seconds)
2025-05-13 10:28:20,939 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:28:25,580 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1003.44189 ± 296.819
2025-05-13 10:28:25,580 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [854.28394, 1145.4114, 1006.7948, 867.969, 773.53235, 1279.9851, 1358.1738, 768.9572, 484.82214, 1494.4893]
2025-05-13 10:28:25,580 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [304.0, 388.0, 379.0, 295.0, 274.0, 439.0, 462.0, 289.0, 190.0, 550.0]
2025-05-13 10:28:25,586 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 4 hours, 54 minutes, 38 seconds)
2025-05-13 10:32:04,819 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:32:10,081 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1059.17017 ± 310.899
2025-05-13 10:32:10,081 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1128.338, 878.24664, 871.1947, 1576.6254, 807.2495, 1266.5922, 592.2352, 1118.805, 808.4436, 1543.9716]
2025-05-13 10:32:10,081 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [422.0, 343.0, 291.0, 528.0, 348.0, 569.0, 241.0, 403.0, 312.0, 532.0]
2025-05-13 10:32:10,089 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 4 hours, 49 minutes, 52 seconds)
2025-05-13 10:35:56,186 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:36:01,344 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1025.58337 ± 468.078
2025-05-13 10:36:01,345 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1750.4105, 645.9772, 771.5529, 792.5694, 1491.9044, 196.4552, 1247.0287, 984.56696, 743.56226, 1631.8062]
2025-05-13 10:36:01,345 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [635.0, 264.0, 287.0, 349.0, 518.0, 120.0, 491.0, 371.0, 345.0, 587.0]
2025-05-13 10:36:01,355 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 4 hours, 48 minutes, 34 seconds)
2025-05-13 10:39:40,578 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:39:46,002 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1091.46814 ± 347.978
2025-05-13 10:39:46,002 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1050.0109, 766.0441, 1726.9753, 1482.2352, 1038.553, 837.6942, 666.71936, 1553.082, 825.5517, 967.8155]
2025-05-13 10:39:46,002 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [468.0, 287.0, 489.0, 575.0, 429.0, 406.0, 241.0, 461.0, 319.0, 417.0]
2025-05-13 10:39:46,011 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 43 minutes, 39 seconds)
2025-05-13 10:43:30,932 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:43:38,296 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1516.54614 ± 577.504
2025-05-13 10:43:38,296 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1569.0106, 1237.1536, 2584.9373, 1423.5146, 981.01965, 2481.5325, 1567.7109, 1491.1527, 1196.2557, 633.17334]
2025-05-13 10:43:38,296 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [531.0, 415.0, 1000.0, 519.0, 412.0, 844.0, 584.0, 489.0, 412.0, 291.0]
2025-05-13 10:43:38,304 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 4 hours, 40 minutes, 23 seconds)
2025-05-13 10:47:13,976 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:47:19,938 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1342.47534 ± 644.580
2025-05-13 10:47:19,938 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2395.1426, 1205.532, 921.91156, 2469.3174, 1187.2283, 1596.8684, 857.42554, 849.75494, 363.27124, 1578.3015]
2025-05-13 10:47:19,938 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [698.0, 387.0, 321.0, 837.0, 425.0, 516.0, 340.0, 290.0, 181.0, 577.0]
2025-05-13 10:47:19,945 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 4 hours, 36 minutes, 1 second)
2025-05-13 10:51:02,547 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:51:12,113 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2056.33911 ± 812.611
2025-05-13 10:51:12,113 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2174.369, 3019.0776, 1218.3163, 2937.4941, 2597.778, 364.3015, 2661.8433, 1326.9387, 2374.238, 1889.035]
2025-05-13 10:51:12,113 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [817.0, 912.0, 454.0, 972.0, 1000.0, 175.0, 860.0, 560.0, 749.0, 563.0]
2025-05-13 10:51:12,118 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 4 hours, 34 minutes, 5 seconds)
2025-05-13 10:54:54,010 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:55:03,436 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1799.36365 ± 794.640
2025-05-13 10:55:03,436 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1152.7604, 2590.3008, 268.37967, 2024.9497, 2463.0022, 1045.0101, 2933.587, 2250.4668, 2033.2129, 1231.9674]
2025-05-13 10:55:03,436 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [601.0, 1000.0, 291.0, 674.0, 868.0, 433.0, 1000.0, 832.0, 622.0, 638.0]
2025-05-13 10:55:03,444 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 30 minutes, 17 seconds)
2025-05-13 10:58:45,516 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:58:53,675 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1868.53967 ± 916.760
2025-05-13 10:58:53,675 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1433.7726, 3456.8428, 1907.0365, 1189.8582, 2239.2485, 973.6566, 828.27094, 3536.3118, 1214.7572, 1905.6398]
2025-05-13 10:58:53,675 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [589.0, 993.0, 631.0, 507.0, 687.0, 314.0, 315.0, 996.0, 457.0, 607.0]
2025-05-13 10:58:53,684 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 4 hours, 27 minutes, 47 seconds)
2025-05-13 11:02:36,399 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:02:42,387 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1386.74683 ± 471.272
2025-05-13 11:02:42,387 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1533.5691, 907.6912, 2555.4116, 1549.7023, 1133.5681, 1028.9235, 1164.0021, 1081.3623, 1797.8041, 1115.4348]
2025-05-13 11:02:42,387 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [499.0, 357.0, 794.0, 483.0, 393.0, 354.0, 397.0, 367.0, 541.0, 358.0]
2025-05-13 11:02:42,396 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 4 hours, 23 minutes, 8 seconds)
2025-05-13 11:06:30,103 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:06:35,647 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1231.35181 ± 855.571
2025-05-13 11:06:35,647 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1128.0372, 1876.4731, 13.692233, 916.13293, 897.9684, 1064.9902, 1838.1471, 564.24915, 3263.11, 750.7168]
2025-05-13 11:06:35,647 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [418.0, 554.0, 30.0, 288.0, 311.0, 387.0, 625.0, 246.0, 1000.0, 297.0]
2025-05-13 11:06:35,659 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 4 hours, 21 minutes, 57 seconds)
2025-05-13 11:10:09,949 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:10:16,906 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1723.50977 ± 919.282
2025-05-13 11:10:16,906 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2916.1484, 890.1522, 2256.8, 3090.1626, 674.421, 211.73997, 1507.4664, 2558.1777, 1532.1273, 1597.9027]
2025-05-13 11:10:16,906 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [869.0, 310.0, 637.0, 884.0, 253.0, 135.0, 449.0, 723.0, 444.0, 516.0]
2025-05-13 11:10:16,913 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 4 hours, 15 minutes, 40 seconds)
2025-05-13 11:14:14,597 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:14:19,514 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1164.72070 ± 1024.399
2025-05-13 11:14:19,514 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1241.1427, 176.77031, 913.42346, 3599.2236, 340.51904, 1846.1521, 1726.3962, 206.86957, 136.15192, 1460.5571]
2025-05-13 11:14:19,514 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [402.0, 93.0, 312.0, 938.0, 174.0, 592.0, 532.0, 160.0, 131.0, 410.0]
2025-05-13 11:14:19,521 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 4 hours, 14 minutes, 20 seconds)
2025-05-13 11:17:51,319 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:18:01,418 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2472.25732 ± 1002.291
2025-05-13 11:18:01,418 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1513.3749, 3689.7378, 2084.0002, 1164.2963, 1071.8048, 3501.152, 3182.2224, 1740.2539, 3235.8723, 3539.8557]
2025-05-13 11:18:01,418 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [484.0, 1000.0, 581.0, 386.0, 355.0, 993.0, 1000.0, 556.0, 1000.0, 1000.0]
2025-05-13 11:18:01,418 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (2472.26) for latency MM1Queue_a033_s075
2025-05-13 11:18:01,425 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 4 hours, 8 minutes, 40 seconds)
2025-05-13 11:21:44,279 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:21:51,222 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1626.17346 ± 1084.507
2025-05-13 11:21:51,222 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [724.94354, 1256.8353, 190.23796, 1202.4032, 2036.5822, 1377.7863, 2016.5239, 492.1412, 3564.806, 3399.4753]
2025-05-13 11:21:51,222 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [225.0, 393.0, 137.0, 333.0, 596.0, 455.0, 798.0, 246.0, 971.0, 1000.0]
2025-05-13 11:21:51,231 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 4 hours, 5 minutes, 5 seconds)
2025-05-13 11:25:32,111 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:25:41,610 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2288.18481 ± 695.823
2025-05-13 11:25:41,611 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2920.1724, 3207.847, 1979.7166, 1734.8477, 2151.509, 2212.5105, 1457.688, 1350.7158, 2313.123, 3553.7173]
2025-05-13 11:25:41,611 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 972.0, 565.0, 483.0, 661.0, 682.0, 445.0, 444.0, 717.0, 1000.0]
2025-05-13 11:25:41,619 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 4 hours, 39 seconds)
2025-05-13 11:29:23,124 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:29:32,044 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2182.47803 ± 837.925
2025-05-13 11:29:32,044 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3398.4307, 2279.4604, 908.18225, 1699.3956, 1851.8461, 2018.7388, 878.5928, 3256.2336, 2809.4954, 2724.404]
2025-05-13 11:29:32,044 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [991.0, 707.0, 337.0, 527.0, 567.0, 639.0, 328.0, 1000.0, 802.0, 717.0]
2025-05-13 11:29:32,052 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 58 minutes, 43 seconds)
2025-05-13 11:33:20,497 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:33:27,140 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1689.41895 ± 576.910
2025-05-13 11:33:27,140 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1513.1768, 1110.26, 1091.2008, 2713.0977, 2200.375, 1027.116, 2212.2234, 2124.0806, 1076.7421, 1825.917]
2025-05-13 11:33:27,140 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [496.0, 361.0, 351.0, 692.0, 606.0, 334.0, 575.0, 658.0, 359.0, 535.0]
2025-05-13 11:33:27,149 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 53 minutes, 21 seconds)
2025-05-13 11:37:12,311 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:37:20,614 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2124.89429 ± 1400.629
2025-05-13 11:37:20,614 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3384.648, 2293.4768, 161.90729, 1775.2172, 3845.7322, 2973.0415, 679.5856, 5.260786, 4064.1553, 2065.9202]
2025-05-13 11:37:20,614 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 624.0, 152.0, 582.0, 1000.0, 846.0, 249.0, 18.0, 1000.0, 637.0]
2025-05-13 11:37:20,623 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 51 minutes, 50 seconds)
2025-05-13 11:41:01,802 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:41:10,091 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2177.39160 ± 1059.944
2025-05-13 11:41:10,091 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3776.418, 1105.6056, 3493.2092, 2269.3591, 2078.4714, 1121.6132, 3710.4282, 1737.1863, 812.6916, 1668.9331]
2025-05-13 11:41:10,091 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 367.0, 1000.0, 638.0, 532.0, 359.0, 1000.0, 480.0, 294.0, 494.0]
2025-05-13 11:41:10,099 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 47 minutes, 54 seconds)
2025-05-13 11:44:45,333 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:44:52,393 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1901.50977 ± 952.476
2025-05-13 11:44:52,393 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3351.2495, 1102.6782, 1020.35846, 3888.3926, 815.2977, 1340.2814, 2103.9114, 1709.7094, 1722.5898, 1960.6284]
2025-05-13 11:44:52,393 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [812.0, 342.0, 337.0, 1000.0, 303.0, 403.0, 527.0, 480.0, 527.0, 528.0]
2025-05-13 11:44:52,405 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 42 minutes, 29 seconds)
2025-05-13 11:48:34,169 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:48:41,671 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2032.80737 ± 769.894
2025-05-13 11:48:41,671 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2158.0176, 3252.5503, 1424.1195, 2407.2817, 1628.881, 1903.3187, 3399.952, 1000.4528, 1139.4557, 2014.0458]
2025-05-13 11:48:41,671 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [543.0, 827.0, 418.0, 623.0, 487.0, 551.0, 1000.0, 335.0, 346.0, 558.0]
2025-05-13 11:48:41,681 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 38 minutes, 25 seconds)
2025-05-13 11:52:20,191 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:52:30,847 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3233.55908 ± 1408.357
2025-05-13 11:52:30,848 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [4426.695, 4381.7573, 2484.435, 1083.368, 3678.8914, 579.5477, 4469.013, 4409.48, 2490.902, 4331.5005]
2025-05-13 11:52:30,848 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 593.0, 393.0, 1000.0, 241.0, 972.0, 1000.0, 629.0, 1000.0]
2025-05-13 11:52:30,848 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (3233.56) for latency MM1Queue_a033_s075
2025-05-13 11:52:30,860 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 33 minutes, 29 seconds)
2025-05-13 11:55:57,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:56:07,309 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2810.00366 ± 1466.722
2025-05-13 11:56:07,309 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [4134.084, 3190.8013, 527.3219, 3531.841, 587.32385, 2999.8, 4185.405, 4038.3835, 832.30493, 4072.7722]
2025-05-13 11:56:07,309 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 817.0, 205.0, 817.0, 223.0, 769.0, 1000.0, 947.0, 294.0, 1000.0]
2025-05-13 11:56:07,320 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 26 minutes, 33 seconds)
2025-05-13 11:59:51,563 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:59:59,206 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2366.64990 ± 1283.449
2025-05-13 11:59:59,206 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2838.4565, 664.0299, 2430.8345, 2177.7874, 4521.473, 2511.288, 2121.2747, 4076.1548, 2338.3625, -13.163145]
2025-05-13 11:59:59,206 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [687.0, 275.0, 603.0, 514.0, 1000.0, 615.0, 542.0, 1000.0, 576.0, 16.0]
2025-05-13 11:59:59,212 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 23 minutes, 14 seconds)
2025-05-13 12:03:32,110 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:03:41,355 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2779.08667 ± 1351.995
2025-05-13 12:03:41,355 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [4127.307, 2219.7512, 4085.5007, 3762.3188, 3212.3882, 204.07094, 3987.5408, 2480.4453, 551.5757, 3159.9663]
2025-05-13 12:03:41,355 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 615.0, 1000.0, 870.0, 781.0, 164.0, 909.0, 599.0, 209.0, 734.0]
2025-05-13 12:03:41,365 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 19 minutes, 26 seconds)
2025-05-13 12:07:32,543 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:07:44,103 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3613.54443 ± 1077.250
2025-05-13 12:07:44,104 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [4157.62, 2110.4888, 4047.8562, 4417.1914, 4246.718, 4457.4272, 4318.79, 3247.8123, 1103.7245, 4027.817]
2025-05-13 12:07:44,104 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 543.0, 934.0, 1000.0, 1000.0, 1000.0, 1000.0, 755.0, 337.0, 1000.0]
2025-05-13 12:07:44,104 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (3613.54) for latency MM1Queue_a033_s075
2025-05-13 12:07:44,115 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 18 minutes, 1 second)
2025-05-13 12:11:10,102 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:11:19,135 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2826.22510 ± 990.520
2025-05-13 12:11:19,135 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3524.2278, 1317.1396, 3528.096, 3102.4023, 4570.3125, 2973.2092, 1026.6586, 2420.024, 2905.6409, 2894.5396]
2025-05-13 12:11:19,135 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [812.0, 401.0, 836.0, 777.0, 1000.0, 663.0, 358.0, 612.0, 682.0, 691.0]
2025-05-13 12:11:19,146 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 11 minutes, 48 seconds)
2025-05-13 12:14:49,591 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:15:00,634 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3504.03125 ± 1028.437
2025-05-13 12:15:00,634 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3488.6553, 3925.9966, 2578.097, 4601.1304, 1263.7664, 4537.7393, 3100.8806, 2773.421, 4402.4136, 4368.212]
2025-05-13 12:15:00,634 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [811.0, 888.0, 640.0, 1000.0, 353.0, 1000.0, 700.0, 651.0, 1000.0, 1000.0]
2025-05-13 12:15:00,643 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 8 minutes, 53 seconds)
2025-05-13 12:18:45,584 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:18:54,142 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2898.25244 ± 1342.911
2025-05-13 12:18:54,142 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2676.5286, 3106.1055, 5000.7686, 2945.488, 4693.584, 2406.5999, 4057.1218, 390.00037, 2196.1453, 1510.1838]
2025-05-13 12:18:54,142 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [610.0, 705.0, 1000.0, 658.0, 1000.0, 523.0, 860.0, 165.0, 510.0, 395.0]
2025-05-13 12:18:54,152 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 5 minutes, 22 seconds)
2025-05-13 12:22:26,357 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:22:37,051 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3410.88721 ± 990.229
2025-05-13 12:22:37,051 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3871.745, 2949.6204, 1574.0419, 4197.556, 1609.5812, 4529.577, 3625.93, 3690.638, 4033.4678, 4026.7156]
2025-05-13 12:22:37,051 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [833.0, 719.0, 406.0, 1000.0, 408.0, 994.0, 836.0, 761.0, 920.0, 1000.0]
2025-05-13 12:22:37,061 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 3 hours, 1 minute, 42 seconds)
2025-05-13 12:26:13,931 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:26:24,297 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3325.52979 ± 1220.838
2025-05-13 12:26:24,297 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1670.0491, 1518.1144, 2724.3599, 4253.219, 4737.776, 4336.702, 4446.0645, 2968.849, 2020.6216, 4579.543]
2025-05-13 12:26:24,297 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [453.0, 398.0, 654.0, 1000.0, 1000.0, 1000.0, 1000.0, 650.0, 497.0, 1000.0]
2025-05-13 12:26:24,307 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 55 minutes, 29 seconds)
2025-05-13 12:29:57,834 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:30:06,878 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2952.44092 ± 1445.865
2025-05-13 12:30:06,878 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [4406.0728, 3284.619, 4568.8794, 1565.4214, 2338.916, 1108.5, 2509.218, 4373.7383, 4685.5894, 683.4567]
2025-05-13 12:30:06,878 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [950.0, 711.0, 1000.0, 424.0, 569.0, 340.0, 586.0, 929.0, 1000.0, 226.0]
2025-05-13 12:30:06,888 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 52 minutes, 55 seconds)
2025-05-13 12:33:56,452 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:34:06,492 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3360.79565 ± 1091.449
2025-05-13 12:34:06,492 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [4451.329, 1301.9579, 3492.9766, 2359.366, 4193.6646, 3202.0918, 2772.411, 4813.185, 2451.03, 4569.947]
2025-05-13 12:34:06,492 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 368.0, 814.0, 548.0, 930.0, 691.0, 640.0, 1000.0, 591.0, 1000.0]
2025-05-13 12:34:06,504 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 51 minutes, 52 seconds)
2025-05-13 12:37:25,850 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:37:38,106 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3873.57349 ± 882.164
2025-05-13 12:37:38,106 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [4495.045, 4147.97, 4585.3657, 4341.371, 4357.4185, 4444.0547, 3490.4988, 2131.178, 4455.06, 2287.7751]
2025-05-13 12:37:38,106 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 956.0, 1000.0, 1000.0, 1000.0, 1000.0, 901.0, 615.0, 1000.0, 608.0]
2025-05-13 12:37:38,106 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (3873.57) for latency MM1Queue_a033_s075
2025-05-13 12:37:38,120 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 44 minutes, 50 seconds)
2025-05-13 12:41:15,393 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:41:25,767 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3587.39893 ± 1136.264
2025-05-13 12:41:25,768 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3454.847, 4935.7397, 1955.3706, 2200.8318, 4827.777, 4202.8306, 2103.0662, 4094.5908, 3131.7285, 4967.2046]
2025-05-13 12:41:25,768 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [740.0, 1000.0, 468.0, 504.0, 1000.0, 937.0, 497.0, 882.0, 668.0, 979.0]
2025-05-13 12:41:25,776 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 41 minutes, 46 seconds)
2025-05-13 12:45:13,660 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:45:24,558 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3663.03662 ± 1109.371
2025-05-13 12:45:24,558 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2355.8179, 3186.373, 2710.6243, 4447.332, 4748.552, 4315.432, 4377.179, 4466.7515, 1392.6674, 4629.637]
2025-05-13 12:45:24,558 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [549.0, 737.0, 676.0, 1000.0, 1000.0, 907.0, 939.0, 1000.0, 360.0, 1000.0]
2025-05-13 12:45:24,568 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 39 minutes, 38 seconds)
2025-05-13 12:49:08,927 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:49:22,586 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 4594.67480 ± 155.976
2025-05-13 12:49:22,586 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [4520.798, 4779.7686, 4748.5347, 4402.3145, 4789.0513, 4627.373, 4672.0024, 4595.137, 4290.1743, 4521.596]
2025-05-13 12:49:22,586 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [999.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:49:22,586 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (4594.67) for latency MM1Queue_a033_s075
2025-05-13 12:49:22,597 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 37 minutes, 56 seconds)
2025-05-13 12:52:52,588 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:53:04,387 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3757.49951 ± 1304.645
2025-05-13 12:53:04,387 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [4445.4443, 4331.2515, 131.11133, 3106.675, 3350.3481, 4422.8496, 4639.1055, 4575.6343, 4408.7563, 4163.8203]
2025-05-13 12:53:04,388 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 925.0, 118.0, 724.0, 767.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:53:04,401 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 31 minutes, 43 seconds)
2025-05-13 12:56:35,600 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:56:47,757 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 4064.43042 ± 570.161
2025-05-13 12:56:47,757 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [4442.936, 4222.5547, 4584.995, 3386.39, 3387.3152, 4710.173, 2994.8984, 3979.9211, 4455.3135, 4479.809]
2025-05-13 12:56:47,757 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 990.0, 772.0, 754.0, 1000.0, 676.0, 916.0, 1000.0, 1000.0]
2025-05-13 12:56:47,770 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 29 minutes, 27 seconds)
2025-05-13 13:00:28,651 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:00:35,687 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2188.27173 ± 1731.787
2025-05-13 13:00:35,688 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2562.8171, 4269.5356, 1139.3291, 131.16687, 4669.1494, 1491.9109, 2225.3787, 4750.664, 252.48978, 390.27255]
2025-05-13 13:00:35,688 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [582.0, 1000.0, 316.0, 107.0, 1000.0, 387.0, 514.0, 1000.0, 119.0, 157.0]
2025-05-13 13:00:35,699 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 25 minutes, 39 seconds)
2025-05-13 13:04:10,503 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:04:22,728 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 4038.20459 ± 1083.125
2025-05-13 13:04:22,728 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [4364.5884, 4447.249, 2359.5068, 4570.953, 4482.7734, 4717.8057, 1479.5135, 4664.875, 4562.028, 4732.7554]
2025-05-13 13:04:22,728 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 574.0, 1000.0, 1000.0, 1000.0, 386.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:04:22,742 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 20 minutes, 22 seconds)
2025-05-13 13:07:58,594 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:08:10,461 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 4106.13037 ± 1166.443
2025-05-13 13:08:10,461 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [4700.637, 4183.1846, 2650.3718, 1175.6202, 4913.7227, 4559.095, 4568.982, 4660.902, 5077.5625, 4571.2285]
2025-05-13 13:08:10,461 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 884.0, 659.0, 314.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:08:10,474 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 15 minutes, 20 seconds)
2025-05-13 13:11:49,703 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:12:02,094 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 4134.49902 ± 634.047
2025-05-13 13:12:02,094 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3056.5745, 2927.651, 4329.588, 3895.6875, 4450.426, 4626.8823, 4481.402, 4862.8784, 4684.4907, 4029.4116]
2025-05-13 13:12:02,094 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [676.0, 676.0, 1000.0, 918.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 877.0]
2025-05-13 13:12:02,108 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 12 minutes, 43 seconds)
2025-05-13 13:15:40,389 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:15:52,109 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 4100.81543 ± 1300.333
2025-05-13 13:15:52,109 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2734.7478, 4972.1157, 4387.6934, 4585.6562, 4689.1865, 4714.907, 4803.377, 646.9271, 4810.5312, 4663.013]
2025-05-13 13:15:52,109 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [630.0, 1000.0, 883.0, 959.0, 1000.0, 1000.0, 1000.0, 223.0, 1000.0, 1000.0]
2025-05-13 13:15:52,121 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 9 minutes, 41 seconds)
2025-05-13 13:19:19,385 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:19:28,606 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3068.90796 ± 1108.234
2025-05-13 13:19:28,606 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1628.2726, 3915.7146, 4526.4014, 1592.4358, 4574.297, 2625.7947, 3154.797, 2029.387, 2452.8213, 4189.158]
2025-05-13 13:19:28,606 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [395.0, 1000.0, 1000.0, 380.0, 1000.0, 578.0, 732.0, 472.0, 547.0, 839.0]
2025-05-13 13:19:28,616 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 4 minutes, 37 seconds)
2025-05-13 13:23:16,664 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:23:29,852 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 4556.49463 ± 580.654
2025-05-13 13:23:29,852 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [4833.8, 4957.5737, 4495.4478, 4819.659, 4717.405, 4456.024, 2872.5232, 4746.5713, 4826.1353, 4839.8037]
2025-05-13 13:23:29,852 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 617.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:23:29,864 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 2 hours, 2 minutes, 21 seconds)
2025-05-13 13:27:08,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:27:19,339 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3797.44580 ± 1244.080
2025-05-13 13:27:19,340 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1408.8561, 4392.602, 4157.9854, 4815.998, 4745.833, 4780.005, 4512.7485, 2465.921, 4706.575, 1987.9321]
2025-05-13 13:27:19,340 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [354.0, 1000.0, 849.0, 1000.0, 1000.0, 1000.0, 1000.0, 560.0, 1000.0, 471.0]
2025-05-13 13:27:19,352 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 58 minutes, 43 seconds)
2025-05-13 13:31:02,587 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:31:12,343 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3354.08008 ± 1831.310
2025-05-13 13:31:12,343 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [4839.1094, 4639.144, 1635.4836, 4982.0557, 1021.9633, 4593.523, 125.17242, 5014.833, 4744.273, 1945.2426]
2025-05-13 13:31:12,343 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 445.0, 1000.0, 286.0, 1000.0, 92.0, 1000.0, 1000.0, 470.0]
2025-05-13 13:31:12,352 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 55 minutes, 1 second)
2025-05-13 13:34:30,611 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:34:44,279 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 4691.46484 ± 183.604
2025-05-13 13:34:44,280 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [4919.6787, 4722.3774, 4907.3594, 4850.5957, 4541.8755, 4664.2256, 4577.273, 4840.3257, 4317.4287, 4573.507]
2025-05-13 13:34:44,280 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:34:44,280 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (4691.46) for latency MM1Queue_a033_s075
2025-05-13 13:34:44,292 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 49 minutes, 26 seconds)
2025-05-13 13:38:28,441 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:38:41,055 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 4306.33203 ± 711.626
2025-05-13 13:38:41,055 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [4443.203, 4682.496, 4624.8364, 2211.0337, 4527.856, 4656.1426, 4352.5474, 4268.291, 4687.4297, 4609.483]
2025-05-13 13:38:41,055 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 513.0, 1000.0, 1000.0, 912.0, 956.0, 1000.0, 1000.0]
2025-05-13 13:38:41,065 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 47 minutes, 33 seconds)
2025-05-13 13:42:29,344 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:42:41,342 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 4306.33301 ± 1308.429
2025-05-13 13:42:41,342 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [4687.8423, 5096.9194, 5078.0234, 5231.7173, 5120.7104, 952.25745, 4948.26, 3246.357, 3500.6626, 5200.583]
2025-05-13 13:42:41,342 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 269.0, 1000.0, 702.0, 747.0, 1000.0]
2025-05-13 13:42:41,355 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 43 minutes, 38 seconds)
2025-05-13 13:46:16,809 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:46:27,135 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3551.02930 ± 1290.939
2025-05-13 13:46:27,135 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [4834.942, 2921.2, 1027.591, 1962.9158, 4832.5815, 2952.596, 3148.4192, 4322.0645, 4952.668, 4555.3174]
2025-05-13 13:46:27,135 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 637.0, 284.0, 451.0, 991.0, 685.0, 694.0, 911.0, 1000.0, 1000.0]
2025-05-13 13:46:27,146 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 39 minutes, 28 seconds)
2025-05-13 13:49:52,298 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:50:03,419 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3969.84229 ± 1237.284
2025-05-13 13:50:03,419 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1839.8942, 4943.1123, 4955.291, 3123.2776, 2800.7761, 4898.877, 4822.179, 5148.5522, 4904.6685, 2261.791]
2025-05-13 13:50:03,419 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [443.0, 1000.0, 1000.0, 649.0, 595.0, 1000.0, 1000.0, 1000.0, 1000.0, 517.0]
2025-05-13 13:50:03,432 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 34 minutes, 15 seconds)
2025-05-13 13:53:51,752 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:54:04,835 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 4370.09863 ± 683.119
2025-05-13 13:54:04,835 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [4702.4336, 4358.308, 4695.3047, 4626.7803, 3986.813, 4819.5186, 2463.4766, 4417.5835, 4841.6577, 4789.1084]
2025-05-13 13:54:04,835 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 587.0, 927.0, 1000.0, 1000.0]
2025-05-13 13:54:04,847 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 32 minutes, 50 seconds)
2025-05-13 13:57:38,137 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:57:49,803 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3784.27271 ± 1457.444
2025-05-13 13:57:49,803 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [4403.265, 4343.361, 4503.4893, 4470.8877, 4496.5884, 4646.685, 4486.536, 611.85187, 4718.079, 1161.9786]
2025-05-13 13:57:49,803 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 225.0, 1000.0, 315.0]
2025-05-13 13:57:49,817 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 28 minutes, 4 seconds)
2025-05-13 14:01:14,978 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:01:26,480 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3895.29370 ± 776.386
2025-05-13 14:01:26,480 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [4613.883, 2921.5688, 2981.8647, 3861.0793, 4710.668, 2631.517, 4171.5015, 4680.909, 3642.8904, 4737.0566]
2025-05-13 14:01:26,480 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [999.0, 639.0, 676.0, 822.0, 1000.0, 582.0, 905.0, 1000.0, 843.0, 1000.0]
2025-05-13 14:01:26,497 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 22 minutes, 30 seconds)
2025-05-13 14:05:09,606 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:05:21,542 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 4094.17334 ± 1411.866
2025-05-13 14:05:21,542 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [807.58484, 4838.8823, 4803.322, 4972.564, 4783.4795, 4831.005, 4789.9937, 4534.6987, 4758.701, 1821.5007]
2025-05-13 14:05:21,542 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [277.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 446.0]
2025-05-13 14:05:21,555 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 19 minutes, 24 seconds)
2025-05-13 14:09:03,240 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:09:13,610 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3582.11670 ± 958.321
2025-05-13 14:09:13,610 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [4725.3525, 3246.167, 3585.2156, 3653.77, 4914.067, 4708.2563, 3355.6763, 2353.4753, 1809.2357, 3469.9495]
2025-05-13 14:09:13,610 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 751.0, 757.0, 776.0, 1000.0, 1000.0, 709.0, 521.0, 417.0, 755.0]
2025-05-13 14:09:13,623 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 16 minutes, 40 seconds)
2025-05-13 14:12:56,862 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:13:09,633 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 4367.17041 ± 844.996
2025-05-13 14:13:09,633 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [4738.411, 4872.918, 3279.146, 2280.825, 4703.6953, 4900.789, 4593.861, 4352.226, 5078.893, 4870.9395]
2025-05-13 14:13:09,633 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 814.0, 515.0, 1000.0, 1000.0, 934.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:13:09,646 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 12 minutes, 30 seconds)
2025-05-13 14:16:29,012 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:16:41,092 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 4255.93262 ± 1208.423
2025-05-13 14:16:41,093 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [4870.178, 1889.7975, 4735.024, 4924.769, 1799.6486, 4938.3066, 4861.578, 4691.3394, 4899.922, 4948.765]
2025-05-13 14:16:41,093 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 472.0, 1000.0, 1000.0, 433.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:16:41,103 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 7 minutes, 52 seconds)
2025-05-13 14:20:34,983 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:20:47,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 4369.71631 ± 1123.193
2025-05-13 14:20:47,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [5103.4414, 5062.5923, 4761.224, 5036.4556, 4694.14, 2956.378, 4950.589, 1513.7029, 4720.2603, 4898.3804]
2025-05-13 14:20:47,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 624.0, 1000.0, 374.0, 1000.0, 1000.0]
2025-05-13 14:20:47,081 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 5 minutes, 45 seconds)
2025-05-13 14:24:19,012 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:24:29,432 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3549.90820 ± 1323.448
2025-05-13 14:24:29,432 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2022.5338, 3078.0227, 4792.4443, 4368.8384, 771.1283, 4608.652, 4614.418, 4651.779, 2451.274, 4139.991]
2025-05-13 14:24:29,432 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [475.0, 688.0, 1000.0, 925.0, 234.0, 1000.0, 967.0, 1000.0, 529.0, 882.0]
2025-05-13 14:24:29,447 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 1 minute, 13 seconds)
2025-05-13 14:28:04,528 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:28:16,881 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 4304.96387 ± 1101.268
2025-05-13 14:28:16,881 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1881.1665, 4866.489, 4988.482, 4796.6753, 4942.605, 4963.5503, 4704.3887, 4893.4575, 4648.185, 2364.643]
2025-05-13 14:28:16,881 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [453.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 965.0, 1000.0, 1000.0, 531.0]
2025-05-13 14:28:16,896 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 57 minutes, 9 seconds)
2025-05-13 14:32:00,765 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:32:12,755 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 4282.86426 ± 1244.591
2025-05-13 14:32:12,755 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [4773.298, 4754.6084, 1751.1666, 4804.775, 1865.4714, 4948.8247, 5047.725, 5173.957, 4741.6255, 4967.1914]
2025-05-13 14:32:12,755 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 402.0, 1000.0, 426.0, 1000.0, 1000.0, 1000.0, 947.0, 1000.0]
2025-05-13 14:32:12,769 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 53 minutes, 20 seconds)
2025-05-13 14:35:39,591 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:35:50,660 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3864.51636 ± 1766.402
2025-05-13 14:35:50,661 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [4833.203, 5084.3716, 5020.524, 4735.228, 4826.0234, 5064.46, 307.6569, 3459.1558, 4729.2314, 585.3106]
2025-05-13 14:35:50,661 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 133.0, 753.0, 1000.0, 201.0]
2025-05-13 14:35:50,676 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 49 minutes, 48 seconds)
2025-05-13 14:39:40,012 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:39:52,659 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 4531.17725 ± 552.828
2025-05-13 14:39:52,659 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [4534.3203, 3114.7825, 4030.5935, 4915.659, 5127.173, 4580.3403, 4638.6704, 4650.12, 5021.214, 4698.8975]
2025-05-13 14:39:52,659 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 642.0, 832.0, 1000.0, 1000.0, 935.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:39:52,671 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 45 minutes, 49 seconds)
2025-05-13 14:43:22,331 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:43:34,703 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 4501.87598 ± 1053.423
2025-05-13 14:43:34,703 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2958.9812, 5116.2886, 5060.383, 5004.1147, 4977.594, 4921.033, 4807.1694, 4975.905, 1951.9457, 5245.3457]
2025-05-13 14:43:34,703 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [640.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 454.0, 1000.0]
2025-05-13 14:43:34,721 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 41 minutes, 59 seconds)
2025-05-13 14:47:04,538 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:47:18,191 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 4822.82373 ± 77.662
2025-05-13 14:47:18,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [4838.795, 4774.6245, 4851.2173, 4932.0234, 4743.551, 4748.8843, 4972.1133, 4849.23, 4726.5293, 4791.268]
2025-05-13 14:47:18,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:47:18,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (4822.82) for latency MM1Queue_a033_s075
2025-05-13 14:47:18,203 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 38 minutes, 2 seconds)
2025-05-13 14:51:03,401 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:51:15,168 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 4204.70020 ± 1253.838
2025-05-13 14:51:15,168 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [4890.8813, 4902.6035, 5199.64, 2064.0774, 4542.8027, 4785.439, 4734.8696, 4808.8657, 4712.344, 1405.4797]
2025-05-13 14:51:15,168 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 471.0, 912.0, 955.0, 1000.0, 1000.0, 1000.0, 367.0]
2025-05-13 14:51:15,183 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 34 minutes, 16 seconds)
2025-05-13 14:54:56,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:55:09,637 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 4380.28906 ± 562.289
2025-05-13 14:55:09,637 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2716.7532, 4663.612, 4322.634, 4536.352, 4543.749, 4553.5894, 4559.437, 4642.198, 4663.464, 4601.0996]
2025-05-13 14:55:09,637 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [640.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:55:09,651 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 30 minutes, 54 seconds)
2025-05-13 14:58:29,624 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:58:43,006 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 4620.67480 ± 707.138
2025-05-13 14:58:43,006 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [4533.149, 5025.303, 4742.8306, 5034.8403, 4720.935, 4912.375, 4842.1865, 2550.9158, 4794.1094, 5050.102]
2025-05-13 14:58:43,006 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 681.0, 1000.0, 1000.0]
2025-05-13 14:58:43,019 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 26 minutes, 22 seconds)
2025-05-13 15:02:26,555 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:02:36,638 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3411.58643 ± 1274.082
2025-05-13 15:02:36,638 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [4611.6357, 4752.11, 4594.89, 2316.8948, 2637.5103, 596.40247, 4703.694, 3627.1365, 3196.0466, 3079.5444]
2025-05-13 15:02:36,638 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 551.0, 572.0, 213.0, 1000.0, 792.0, 686.0, 687.0]
2025-05-13 15:02:36,650 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 22 minutes, 50 seconds)
2025-05-13 15:06:25,000 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:06:37,908 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 4609.45312 ± 628.562
2025-05-13 15:06:37,908 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [4165.8223, 4729.008, 4944.0444, 5038.9746, 4862.586, 4611.9077, 2881.9595, 5092.531, 4809.362, 4958.3315]
2025-05-13 15:06:37,908 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [856.0, 1000.0, 990.0, 1000.0, 1000.0, 1000.0, 598.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:06:37,921 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 19 minutes, 19 seconds)
2025-05-13 15:09:59,264 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:10:11,544 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 4309.24316 ± 692.783
2025-05-13 15:10:11,544 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [4714.525, 4682.52, 4753.3706, 3404.0156, 4782.6865, 2938.3015, 4759.1265, 3477.4639, 4886.8203, 4693.6016]
2025-05-13 15:10:11,544 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 734.0, 1000.0, 662.0, 1000.0, 738.0, 1000.0, 1000.0]
2025-05-13 15:10:11,560 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 15 minutes, 9 seconds)
2025-05-13 15:14:00,338 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:14:11,326 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3984.20581 ± 877.785
2025-05-13 15:14:11,326 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [4922.048, 3590.607, 2877.7747, 4950.2676, 4678.8965, 4380.83, 2319.161, 4802.205, 3993.0652, 3327.2021]
2025-05-13 15:14:11,326 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 737.0, 623.0, 1000.0, 1000.0, 881.0, 504.0, 1000.0, 788.0, 672.0]
2025-05-13 15:14:11,341 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 11 minutes, 25 seconds)
2025-05-13 15:17:40,307 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:17:52,871 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 4503.96777 ± 1381.777
2025-05-13 15:17:52,871 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [5020.906, 4834.975, 5133.1367, 367.35913, 4877.6406, 5071.1523, 5001.018, 4868.468, 4918.9336, 4946.087]
2025-05-13 15:17:52,871 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 154.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:17:52,887 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 7 minutes, 39 seconds)
2025-05-13 15:21:43,845 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:21:57,645 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 4825.75049 ± 126.275
2025-05-13 15:21:57,646 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [4853.188, 4924.203, 4707.502, 4829.336, 5050.904, 5004.762, 4730.131, 4730.969, 4779.7275, 4646.785]
2025-05-13 15:21:57,646 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:21:57,646 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (4825.75) for latency MM1Queue_a033_s075
2025-05-13 15:21:57,661 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 52 seconds)
2025-05-13 15:25:21,751 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:25:34,805 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 4487.46436 ± 358.934
2025-05-13 15:25:34,805 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [4565.8887, 4478.2876, 4599.5713, 4603.783, 4442.2446, 4723.9507, 4506.266, 4763.615, 3457.4836, 4733.5493]
2025-05-13 15:25:34,805 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 995.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 740.0, 1000.0]
2025-05-13 15:25:34,819 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1251 [DEBUG]: Training session finished
