2025-05-13 09:06:35,699 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-bpql-mda-mem16
2025-05-13 09:06:35,699 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-bpql-mda-mem16
2025-05-13 09:06:35,699 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x14f0e0f09e10>}
2025-05-13 09:06:35,699 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1111 [DEBUG]: using device: cuda
2025-05-13 09:06:35,705 baseline-bpql-mda-noisy-walker2d:91 [WARNING]: args.assumed_delay != args.horizon: 16 != 24
2025-05-13 09:06:35,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1133 [INFO]: Creating new trainer
2025-05-13 09:06:35,724 baseline-bpql-mda-noisy-walker2d:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-05-13 09:06:35,724 baseline-bpql-mda-noisy-walker2d:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-13 09:06:35,730 baseline-bpql-mda-noisy-walker2d:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(6, 384, batch_first=True)
)
2025-05-13 09:06:36,704 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1194 [DEBUG]: Starting training session...
2025-05-13 09:06:36,704 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 1/100
2025-05-13 09:10:12,101 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:10:13,296 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 132.68141 ± 36.157
2025-05-13 09:10:13,296 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [145.05354, 89.9662, 90.055786, 143.63423, 91.94518, 100.06495, 191.70073, 176.31671, 133.02782, 165.0489]
2025-05-13 09:10:13,296 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [79.0, 54.0, 58.0, 74.0, 57.0, 62.0, 99.0, 91.0, 71.0, 86.0]
2025-05-13 09:10:13,296 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (132.68) for latency MM1Queue_a033_s075
2025-05-13 09:10:13,305 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 5 hours, 57 minutes, 23 seconds)
2025-05-13 09:13:58,167 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:13:59,869 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 160.57239 ± 50.918
2025-05-13 09:13:59,869 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [187.39659, 93.14945, 171.67488, 134.85619, 116.140144, 218.98912, 141.93623, 139.69864, 273.21814, 128.66443]
2025-05-13 09:13:59,869 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [96.0, 60.0, 91.0, 72.0, 68.0, 113.0, 76.0, 76.0, 146.0, 253.0]
2025-05-13 09:13:59,869 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (160.57) for latency MM1Queue_a033_s075
2025-05-13 09:13:59,876 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 6 hours, 1 minute, 55 seconds)
2025-05-13 09:17:40,305 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:17:41,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 149.96989 ± 52.309
2025-05-13 09:17:41,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [104.784035, 119.86738, 138.22708, 106.78465, 177.055, 169.03683, 104.17126, 288.4056, 142.88223, 148.48505]
2025-05-13 09:17:41,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [67.0, 72.0, 85.0, 68.0, 93.0, 95.0, 66.0, 168.0, 81.0, 83.0]
2025-05-13 09:17:41,712 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 5 hours, 58 minutes, 21 seconds)
2025-05-13 09:21:24,229 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:21:26,040 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 190.15121 ± 61.717
2025-05-13 09:21:26,040 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [282.34085, 288.17468, 199.58514, 181.47322, 174.9526, 107.626564, 227.54962, 85.018684, 177.08772, 177.70299]
2025-05-13 09:21:26,040 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [143.0, 163.0, 103.0, 98.0, 84.0, 64.0, 111.0, 172.0, 104.0, 107.0]
2025-05-13 09:21:26,040 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (190.15) for latency MM1Queue_a033_s075
2025-05-13 09:21:26,049 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 5 hours, 55 minutes, 44 seconds)
2025-05-13 09:25:07,671 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:25:10,276 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 233.92935 ± 113.073
2025-05-13 09:25:10,276 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [168.78352, 229.54799, 164.36336, 441.079, 106.269196, 297.17224, 173.861, 89.543144, 408.2494, 260.42465]
2025-05-13 09:25:10,276 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [144.0, 147.0, 109.0, 292.0, 179.0, 212.0, 131.0, 115.0, 179.0, 144.0]
2025-05-13 09:25:10,276 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (233.93) for latency MM1Queue_a033_s075
2025-05-13 09:25:10,285 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 5 hours, 52 minutes, 38 seconds)
2025-05-13 09:28:51,171 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:28:55,275 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 403.76816 ± 165.575
2025-05-13 09:28:55,275 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [296.12674, 411.85895, 138.85745, 652.6292, 246.49828, 345.55914, 289.90402, 464.46326, 672.68744, 519.09705]
2025-05-13 09:28:55,275 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [152.0, 298.0, 180.0, 322.0, 220.0, 179.0, 149.0, 377.0, 330.0, 299.0]
2025-05-13 09:28:55,275 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (403.77) for latency MM1Queue_a033_s075
2025-05-13 09:28:55,283 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 5 hours, 51 minutes, 33 seconds)
2025-05-13 09:32:38,721 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:32:42,535 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 476.13101 ± 183.273
2025-05-13 09:32:42,535 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [758.6702, 428.92468, 387.36383, 280.1752, 297.1727, 693.0888, 180.80708, 577.5909, 508.46194, 649.05475]
2025-05-13 09:32:42,535 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [332.0, 206.0, 203.0, 158.0, 180.0, 308.0, 161.0, 324.0, 232.0, 297.0]
2025-05-13 09:32:42,535 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (476.13) for latency MM1Queue_a033_s075
2025-05-13 09:32:42,543 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 5 hours, 48 minutes, 1 second)
2025-05-13 09:36:25,560 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:36:28,727 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 374.29639 ± 179.222
2025-05-13 09:36:28,727 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [277.73456, 653.93195, 532.86316, 246.95221, 220.57428, 238.28668, 193.56883, 699.42084, 408.55283, 271.0786]
2025-05-13 09:36:28,728 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [151.0, 296.0, 213.0, 129.0, 147.0, 146.0, 253.0, 341.0, 158.0, 136.0]
2025-05-13 09:36:28,736 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 5 hours, 45 minutes, 37 seconds)
2025-05-13 09:40:06,887 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:40:09,252 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 287.93643 ± 140.025
2025-05-13 09:40:09,252 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [587.1744, 226.09494, 227.81873, 234.77701, 142.27359, 171.9237, 190.07935, 515.6639, 284.03278, 299.52597]
2025-05-13 09:40:09,252 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [244.0, 136.0, 139.0, 120.0, 95.0, 124.0, 136.0, 232.0, 135.0, 143.0]
2025-05-13 09:40:09,261 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 5 hours, 40 minutes, 42 seconds)
2025-05-13 09:43:52,357 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:43:55,404 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 439.90909 ± 141.011
2025-05-13 09:43:55,404 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [528.4226, 759.66296, 513.9662, 384.24664, 339.45657, 285.75735, 466.2719, 455.23206, 226.84035, 439.2342]
2025-05-13 09:43:55,404 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [212.0, 289.0, 206.0, 167.0, 237.0, 131.0, 180.0, 178.0, 132.0, 184.0]
2025-05-13 09:43:55,411 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 5 hours, 37 minutes, 32 seconds)
2025-05-13 09:47:36,380 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:47:39,488 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 463.56012 ± 123.664
2025-05-13 09:47:39,488 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [497.52106, 674.80255, 595.34625, 376.99307, 509.74878, 322.2994, 283.42422, 375.3345, 407.04837, 593.0828]
2025-05-13 09:47:39,488 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [210.0, 289.0, 228.0, 158.0, 198.0, 148.0, 136.0, 166.0, 168.0, 242.0]
2025-05-13 09:47:39,498 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 5 hours, 33 minutes, 31 seconds)
2025-05-13 09:51:22,561 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:51:26,444 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 505.02808 ± 230.809
2025-05-13 09:51:26,444 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [685.07227, 393.06076, 1029.2693, 344.1455, 286.86017, 735.22284, 295.9472, 501.0153, 479.86957, 299.81808]
2025-05-13 09:51:26,444 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [284.0, 183.0, 555.0, 168.0, 254.0, 261.0, 145.0, 219.0, 206.0, 144.0]
2025-05-13 09:51:26,444 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (505.03) for latency MM1Queue_a033_s075
2025-05-13 09:51:26,456 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 5 hours, 29 minutes, 40 seconds)
2025-05-13 09:55:08,940 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:55:11,653 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 400.54242 ± 140.406
2025-05-13 09:55:11,653 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [573.58374, 642.1096, 324.47076, 410.94037, 285.3187, 271.0682, 258.39703, 386.05188, 266.8723, 586.6114]
2025-05-13 09:55:11,654 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [222.0, 223.0, 163.0, 188.0, 137.0, 132.0, 129.0, 157.0, 129.0, 238.0]
2025-05-13 09:55:11,662 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 5 hours, 25 minutes, 38 seconds)
2025-05-13 09:58:51,363 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:58:54,882 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 541.27380 ± 250.994
2025-05-13 09:58:54,882 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [159.33972, 453.7979, 713.25665, 252.64256, 902.5734, 565.39264, 427.06854, 389.3599, 983.69507, 565.6117]
2025-05-13 09:58:54,882 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [102.0, 197.0, 309.0, 118.0, 308.0, 222.0, 189.0, 160.0, 357.0, 229.0]
2025-05-13 09:58:54,882 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (541.27) for latency MM1Queue_a033_s075
2025-05-13 09:58:54,890 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 5 hours, 22 minutes, 40 seconds)
2025-05-13 10:02:34,431 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:02:37,508 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 464.57178 ± 101.578
2025-05-13 10:02:37,508 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [508.84357, 334.40775, 461.99872, 613.609, 429.46707, 555.0425, 264.19833, 411.63443, 521.9906, 544.52563]
2025-05-13 10:02:37,508 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [216.0, 157.0, 171.0, 272.0, 188.0, 217.0, 135.0, 165.0, 205.0, 224.0]
2025-05-13 10:02:37,517 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 5 hours, 17 minutes, 55 seconds)
2025-05-13 10:06:21,313 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:06:24,928 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 590.20203 ± 228.041
2025-05-13 10:06:24,928 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [482.48062, 670.45703, 770.36755, 633.3415, 106.53339, 641.6823, 297.24094, 674.29926, 952.5801, 673.0383]
2025-05-13 10:06:24,928 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [178.0, 252.0, 284.0, 272.0, 63.0, 242.0, 139.0, 252.0, 331.0, 252.0]
2025-05-13 10:06:24,928 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (590.20) for latency MM1Queue_a033_s075
2025-05-13 10:06:24,936 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 5 hours, 15 minutes, 7 seconds)
2025-05-13 10:10:05,800 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:10:09,583 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 641.39099 ± 128.547
2025-05-13 10:10:09,583 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [587.3332, 575.1931, 618.779, 698.7931, 676.954, 608.6633, 467.90192, 583.8415, 612.22626, 984.22473]
2025-05-13 10:10:09,583 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [197.0, 210.0, 226.0, 242.0, 248.0, 242.0, 188.0, 221.0, 236.0, 345.0]
2025-05-13 10:10:09,583 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (641.39) for latency MM1Queue_a033_s075
2025-05-13 10:10:09,590 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 5 hours, 10 minutes, 44 seconds)
2025-05-13 10:13:56,960 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:14:01,110 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 557.34644 ± 328.857
2025-05-13 10:14:01,110 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [812.3052, 681.46216, 759.46625, 245.26657, 272.4568, 163.96213, 1155.387, 371.2436, 213.46095, 898.4534]
2025-05-13 10:14:01,110 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [360.0, 285.0, 345.0, 164.0, 175.0, 110.0, 503.0, 158.0, 122.0, 372.0]
2025-05-13 10:14:01,120 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 5 hours, 8 minutes, 43 seconds)
2025-05-13 10:17:43,605 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:17:48,457 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 767.03436 ± 201.423
2025-05-13 10:17:48,457 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [703.7582, 753.30536, 694.856, 604.27374, 833.13043, 1149.5969, 953.89624, 695.4475, 369.64813, 912.4314]
2025-05-13 10:17:48,457 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [300.0, 269.0, 265.0, 231.0, 282.0, 463.0, 398.0, 257.0, 219.0, 371.0]
2025-05-13 10:17:48,457 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (767.03) for latency MM1Queue_a033_s075
2025-05-13 10:17:48,463 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 5 hours, 6 minutes, 3 seconds)
2025-05-13 10:21:28,446 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:21:33,163 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 855.38000 ± 221.225
2025-05-13 10:21:33,163 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1155.5011, 1037.6078, 651.51355, 869.66486, 851.00385, 1203.2546, 577.3443, 578.296, 966.3882, 663.22516]
2025-05-13 10:21:33,163 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [349.0, 440.0, 253.0, 311.0, 271.0, 372.0, 209.0, 224.0, 294.0, 234.0]
2025-05-13 10:21:33,163 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (855.38) for latency MM1Queue_a033_s075
2025-05-13 10:21:33,172 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 5 hours, 2 minutes, 50 seconds)
2025-05-13 10:25:19,359 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:25:24,033 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 702.18127 ± 334.221
2025-05-13 10:25:24,033 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [131.05573, 1268.2725, 864.32513, 1092.4078, 533.9848, 847.37445, 568.96515, 840.4508, 256.56445, 618.412]
2025-05-13 10:25:24,033 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [100.0, 499.0, 325.0, 356.0, 270.0, 307.0, 211.0, 313.0, 288.0, 225.0]
2025-05-13 10:25:24,040 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 4 hours, 59 minutes, 57 seconds)
2025-05-13 10:29:03,438 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:29:09,395 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1074.45850 ± 306.409
2025-05-13 10:29:09,395 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [782.4806, 775.06104, 926.00653, 1155.3925, 820.32385, 1741.9814, 947.9311, 1062.4773, 1006.34, 1526.5905]
2025-05-13 10:29:09,395 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [296.0, 271.0, 329.0, 358.0, 288.0, 650.0, 319.0, 424.0, 334.0, 457.0]
2025-05-13 10:29:09,395 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (1074.46) for latency MM1Queue_a033_s075
2025-05-13 10:29:09,404 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 4 hours, 56 minutes, 21 seconds)
2025-05-13 10:32:51,742 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:32:57,176 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 827.11377 ± 495.553
2025-05-13 10:32:57,176 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1666.7708, 609.5427, 473.11215, 157.208, 642.6752, 1333.2936, 517.22186, 1245.805, 280.16815, 1345.3406]
2025-05-13 10:32:57,176 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [651.0, 280.0, 324.0, 118.0, 265.0, 460.0, 197.0, 389.0, 144.0, 518.0]
2025-05-13 10:32:57,185 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 4 hours, 51 minutes, 35 seconds)
2025-05-13 10:36:36,231 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:36:42,890 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1011.75635 ± 551.802
2025-05-13 10:36:42,890 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2044.0425, 781.1281, 824.4139, 816.55853, 171.64766, 256.62024, 1680.5743, 1018.68365, 1308.2128, 1215.682]
2025-05-13 10:36:42,890 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [678.0, 262.0, 346.0, 355.0, 185.0, 156.0, 876.0, 313.0, 402.0, 482.0]
2025-05-13 10:36:42,900 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 4 hours, 47 minutes, 23 seconds)
2025-05-13 10:40:28,219 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:40:36,266 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1505.89917 ± 567.028
2025-05-13 10:40:36,267 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1573.5728, 1703.7461, 1357.3463, 1381.7817, 1049.5925, 2080.49, 669.2191, 2504.3704, 713.1535, 2025.719]
2025-05-13 10:40:36,267 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [487.0, 513.0, 426.0, 462.0, 387.0, 683.0, 267.0, 793.0, 252.0, 615.0]
2025-05-13 10:40:36,267 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (1505.90) for latency MM1Queue_a033_s075
2025-05-13 10:40:36,278 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 45 minutes, 46 seconds)
2025-05-13 10:44:23,067 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:44:29,570 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1228.03638 ± 178.338
2025-05-13 10:44:29,570 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [999.4626, 1414.1047, 1121.2305, 1020.7301, 1277.864, 1634.69, 1179.8136, 1194.954, 1167.0255, 1270.488]
2025-05-13 10:44:29,570 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [331.0, 423.0, 348.0, 323.0, 430.0, 518.0, 404.0, 394.0, 412.0, 422.0]
2025-05-13 10:44:29,577 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 4 hours, 42 minutes, 33 seconds)
2025-05-13 10:48:12,394 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:48:17,470 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 900.17908 ± 541.040
2025-05-13 10:48:17,470 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1243.5012, 633.5296, 2134.0767, 815.4162, 1406.7346, 350.92224, 1016.26385, 266.89835, 483.3722, 651.07654]
2025-05-13 10:48:17,470 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [397.0, 245.0, 667.0, 282.0, 440.0, 157.0, 357.0, 129.0, 221.0, 234.0]
2025-05-13 10:48:17,479 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 4 hours, 39 minutes, 21 seconds)
2025-05-13 10:51:54,339 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:52:02,276 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1452.24988 ± 626.251
2025-05-13 10:52:02,276 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2106.1553, 1304.4993, 1054.9404, 248.5025, 1138.3563, 2646.403, 1811.3392, 1409.7068, 1721.5676, 1081.0292]
2025-05-13 10:52:02,276 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [655.0, 483.0, 326.0, 230.0, 389.0, 823.0, 600.0, 491.0, 566.0, 352.0]
2025-05-13 10:52:02,283 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 4 hours, 34 minutes, 49 seconds)
2025-05-13 10:55:51,289 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:55:57,216 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1089.55920 ± 677.057
2025-05-13 10:55:57,216 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [944.74146, 829.9993, 700.725, 2900.0127, 1299.3617, 131.40279, 908.18066, 1211.8386, 1116.6665, 852.66425]
2025-05-13 10:55:57,216 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [330.0, 271.0, 240.0, 925.0, 439.0, 76.0, 321.0, 415.0, 356.0, 314.0]
2025-05-13 10:55:57,225 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 33 minutes, 11 seconds)
2025-05-13 10:59:37,888 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:59:44,734 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1314.26221 ± 441.245
2025-05-13 10:59:44,734 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [864.3548, 1135.1495, 2089.732, 2184.134, 1246.8036, 1308.916, 770.7345, 1153.7178, 1194.6348, 1194.4443]
2025-05-13 10:59:44,734 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [333.0, 366.0, 610.0, 646.0, 401.0, 404.0, 282.0, 341.0, 386.0, 450.0]
2025-05-13 10:59:44,741 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 4 hours, 27 minutes, 58 seconds)
2025-05-13 11:03:31,365 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:03:37,817 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1195.35132 ± 524.495
2025-05-13 11:03:37,818 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [667.421, 688.33984, 2067.776, 1704.0249, 1049.371, 1585.8745, 294.5488, 1563.6179, 1356.9633, 975.5765]
2025-05-13 11:03:37,818 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [232.0, 237.0, 644.0, 556.0, 417.0, 486.0, 148.0, 509.0, 441.0, 358.0]
2025-05-13 11:03:37,826 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 4 hours, 24 minutes, 5 seconds)
2025-05-13 11:07:12,229 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:07:19,181 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1369.05542 ± 470.712
2025-05-13 11:07:19,181 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2585.899, 1404.0787, 1239.5587, 1022.54816, 699.7868, 1626.5231, 1281.4135, 1438.218, 1123.6694, 1268.8585]
2025-05-13 11:07:19,181 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [758.0, 463.0, 426.0, 368.0, 250.0, 478.0, 389.0, 468.0, 357.0, 375.0]
2025-05-13 11:07:19,191 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 4 hours, 18 minutes, 47 seconds)
2025-05-13 11:10:58,923 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:11:05,748 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1262.22864 ± 824.988
2025-05-13 11:11:05,748 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [399.03778, 1145.0372, 552.96027, 1375.4531, 3254.7869, 654.9467, 1043.414, 1407.061, 657.1436, 2132.445]
2025-05-13 11:11:05,748 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [162.0, 374.0, 217.0, 430.0, 1000.0, 234.0, 326.0, 456.0, 247.0, 724.0]
2025-05-13 11:11:05,760 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 4 hours, 15 minutes, 22 seconds)
2025-05-13 11:14:54,090 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:15:04,864 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1931.57153 ± 863.439
2025-05-13 11:15:04,864 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1033.2524, 3042.3625, 1925.6735, 1324.0094, 602.942, 2578.6396, 3166.3481, 965.5544, 2148.2998, 2528.6328]
2025-05-13 11:15:04,864 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [369.0, 1000.0, 673.0, 439.0, 231.0, 842.0, 1000.0, 341.0, 681.0, 883.0]
2025-05-13 11:15:04,864 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (1931.57) for latency MM1Queue_a033_s075
2025-05-13 11:15:04,873 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 4 hours, 12 minutes, 28 seconds)
2025-05-13 11:18:47,080 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:18:57,547 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2028.96216 ± 876.692
2025-05-13 11:18:57,547 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1192.0668, 3067.7744, 1052.4381, 3084.2, 1855.8091, 1469.679, 3481.9185, 1227.717, 1327.5902, 2530.4268]
2025-05-13 11:18:57,547 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [356.0, 1000.0, 362.0, 994.0, 586.0, 485.0, 1000.0, 372.0, 462.0, 719.0]
2025-05-13 11:18:57,547 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (2028.96) for latency MM1Queue_a033_s075
2025-05-13 11:18:57,559 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 4 hours, 9 minutes, 46 seconds)
2025-05-13 11:22:41,718 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:22:54,325 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2310.36060 ± 875.970
2025-05-13 11:22:54,325 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3382.232, 2200.3328, 3023.325, 1194.7865, 2884.3926, 2928.9663, 1670.5665, 2212.055, 3051.342, 555.6079]
2025-05-13 11:22:54,325 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 736.0, 1000.0, 431.0, 1000.0, 979.0, 511.0, 726.0, 1000.0, 265.0]
2025-05-13 11:22:54,325 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (2310.36) for latency MM1Queue_a033_s075
2025-05-13 11:22:54,334 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 4 hours, 6 minutes, 43 seconds)
2025-05-13 11:26:42,205 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:26:51,011 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1806.37573 ± 829.308
2025-05-13 11:26:51,011 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2455.7363, 1503.5652, 1213.7306, 2324.3325, 2196.2397, 3310.486, 2418.383, 1264.1821, 403.31827, 973.78375]
2025-05-13 11:26:51,011 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [723.0, 433.0, 392.0, 619.0, 602.0, 1000.0, 733.0, 415.0, 173.0, 336.0]
2025-05-13 11:26:51,020 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 4 hours, 6 minutes, 5 seconds)
2025-05-13 11:30:24,823 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:30:32,511 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1448.81226 ± 883.475
2025-05-13 11:30:32,511 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2136.7332, 1515.7032, 1646.9103, 2097.9604, 2788.8044, 2445.7988, 167.9815, 343.61142, 603.505, 741.11444]
2025-05-13 11:30:32,511 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [617.0, 454.0, 545.0, 669.0, 908.0, 732.0, 98.0, 165.0, 283.0, 258.0]
2025-05-13 11:30:32,523 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 4 hours, 1 minute, 7 seconds)
2025-05-13 11:34:13,283 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:34:20,700 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1422.09106 ± 1067.656
2025-05-13 11:34:20,701 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [751.0812, 2277.295, 1016.18726, 2195.9612, 1162.0247, 367.97455, 254.32722, 126.00824, 2916.257, 3153.7935]
2025-05-13 11:34:20,701 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [262.0, 630.0, 346.0, 584.0, 415.0, 148.0, 208.0, 74.0, 845.0, 1000.0]
2025-05-13 11:34:20,710 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 55 minutes, 1 second)
2025-05-13 11:38:05,325 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:38:13,815 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1661.49097 ± 833.267
2025-05-13 11:38:13,815 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2076.9602, 2694.9543, 1130.3156, 206.80884, 3071.4075, 851.3509, 1484.3594, 1630.9476, 2289.531, 1178.2753]
2025-05-13 11:38:13,815 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [584.0, 800.0, 348.0, 106.0, 896.0, 283.0, 476.0, 543.0, 742.0, 375.0]
2025-05-13 11:38:13,823 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 51 minutes, 15 seconds)
2025-05-13 11:41:57,980 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:42:08,017 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1924.52954 ± 801.172
2025-05-13 11:42:08,017 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [977.11615, 1910.5448, 1627.2761, 1919.3469, 1021.0662, 3437.582, 2387.0193, 953.3291, 2009.1918, 3002.823]
2025-05-13 11:42:08,017 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [540.0, 582.0, 483.0, 572.0, 331.0, 987.0, 728.0, 338.0, 576.0, 952.0]
2025-05-13 11:42:08,028 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 46 minutes, 53 seconds)
2025-05-13 11:45:49,817 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:46:03,830 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2702.08911 ± 926.560
2025-05-13 11:46:03,830 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3069.9463, 3337.292, 3066.0833, 1529.5132, 3013.71, 3155.033, 3230.855, 3057.8882, 343.1003, 3217.4707]
2025-05-13 11:46:03,830 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 439.0, 1000.0, 991.0, 1000.0, 890.0, 155.0, 1000.0]
2025-05-13 11:46:03,830 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (2702.09) for latency MM1Queue_a033_s075
2025-05-13 11:46:03,840 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 42 minutes, 52 seconds)
2025-05-13 11:49:46,072 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:49:54,463 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1691.31030 ± 716.430
2025-05-13 11:49:54,463 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3467.3638, 1649.6581, 1770.9939, 1203.7173, 2497.055, 1000.1545, 1497.7198, 1425.5559, 1288.744, 1112.1412]
2025-05-13 11:49:54,463 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 468.0, 579.0, 382.0, 754.0, 321.0, 458.0, 470.0, 384.0, 380.0]
2025-05-13 11:49:54,473 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 40 minutes, 46 seconds)
2025-05-13 11:53:45,494 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:54:01,515 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2990.95972 ± 278.469
2025-05-13 11:54:01,515 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3109.0767, 3393.908, 3248.601, 2535.0479, 2876.8496, 2995.893, 2684.6133, 2676.9006, 3059.0168, 3329.6921]
2025-05-13 11:54:01,515 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 855.0, 1000.0, 1000.0, 867.0, 816.0, 959.0, 1000.0]
2025-05-13 11:54:01,515 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (2990.96) for latency MM1Queue_a033_s075
2025-05-13 11:54:01,526 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 40 minutes, 25 seconds)
2025-05-13 11:57:36,874 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:57:45,210 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1642.36450 ± 1156.811
2025-05-13 11:57:45,210 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [284.32736, 832.34674, 980.7683, 1138.3438, 1979.1837, 3505.3508, 1176.0226, 2682.7058, 317.7264, 3526.869]
2025-05-13 11:57:45,210 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [156.0, 264.0, 336.0, 357.0, 645.0, 1000.0, 360.0, 759.0, 185.0, 1000.0]
2025-05-13 11:57:45,224 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 34 minutes, 45 seconds)
2025-05-13 12:01:28,051 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:01:41,837 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2721.36816 ± 688.378
2025-05-13 12:01:41,837 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2603.0635, 3578.2559, 2626.3418, 3577.4875, 2994.4392, 1863.7305, 3289.6519, 1600.6007, 3169.9343, 1910.1781]
2025-05-13 12:01:41,837 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 945.0, 538.0, 912.0, 457.0, 899.0, 578.0]
2025-05-13 12:01:41,845 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 31 minutes, 17 seconds)
2025-05-13 12:05:19,290 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:05:30,943 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2423.64600 ± 891.432
2025-05-13 12:05:30,943 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1616.2948, 3374.805, 2565.6726, 1753.4468, 1638.0933, 3657.5642, 3295.0842, 3448.9053, 1598.6704, 1287.9268]
2025-05-13 12:05:30,943 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [467.0, 1000.0, 685.0, 529.0, 528.0, 1000.0, 1000.0, 1000.0, 469.0, 422.0]
2025-05-13 12:05:30,953 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 26 minutes, 11 seconds)
2025-05-13 12:09:11,411 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:09:18,840 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1354.64417 ± 837.500
2025-05-13 12:09:18,840 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [876.108, 900.8218, 1082.1926, 184.68889, 871.012, 2284.4614, 1152.4689, 1129.609, 1767.2863, 3297.7935]
2025-05-13 12:09:18,840 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [342.0, 319.0, 381.0, 113.0, 288.0, 770.0, 436.0, 380.0, 602.0, 966.0]
2025-05-13 12:09:18,854 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 21 minutes, 49 seconds)
2025-05-13 12:12:55,983 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:13:09,684 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2697.98218 ± 876.504
2025-05-13 12:13:09,684 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2314.0728, 3708.2231, 2691.5205, 1322.244, 3378.1091, 3504.6167, 3007.2888, 941.94763, 2864.6543, 3247.1445]
2025-05-13 12:13:09,684 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [634.0, 1000.0, 1000.0, 418.0, 1000.0, 1000.0, 787.0, 305.0, 1000.0, 1000.0]
2025-05-13 12:13:09,694 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 15 minutes, 11 seconds)
2025-05-13 12:16:55,998 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:17:05,391 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1788.00806 ± 1091.511
2025-05-13 12:17:05,391 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2974.1592, 3191.442, 289.62183, 762.4395, 253.50027, 1928.7598, 2248.6238, 3254.5823, 1890.6309, 1086.3219]
2025-05-13 12:17:05,391 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 903.0, 124.0, 362.0, 106.0, 635.0, 654.0, 1000.0, 606.0, 377.0]
2025-05-13 12:17:05,405 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 13 minutes, 21 seconds)
2025-05-13 12:20:45,253 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:20:56,844 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2573.66431 ± 1078.206
2025-05-13 12:20:56,844 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1842.3362, 1925.9059, 3665.6096, 685.87256, 2711.769, 3913.6199, 3397.4014, 3716.734, 1129.7867, 2747.6072]
2025-05-13 12:20:56,844 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [498.0, 563.0, 1000.0, 246.0, 821.0, 1000.0, 875.0, 1000.0, 343.0, 730.0]
2025-05-13 12:20:56,859 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 8 minutes, 39 seconds)
2025-05-13 12:24:48,766 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:25:02,959 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2956.75537 ± 604.584
2025-05-13 12:25:02,959 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3428.699, 3013.0903, 3514.7444, 3338.1501, 2237.7798, 1787.8804, 3436.545, 2340.552, 3637.8564, 2832.2573]
2025-05-13 12:25:02,959 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 840.0, 1000.0, 1000.0, 672.0, 498.0, 1000.0, 632.0, 1000.0, 774.0]
2025-05-13 12:25:02,971 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 3 hours, 7 minutes, 31 seconds)
2025-05-13 12:28:30,527 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:28:41,546 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2237.81763 ± 640.590
2025-05-13 12:28:41,546 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2508.1218, 2340.624, 1721.938, 1782.7054, 1707.2081, 3134.7314, 3414.002, 2641.9116, 1516.2313, 1610.7039]
2025-05-13 12:28:41,546 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [739.0, 704.0, 561.0, 573.0, 484.0, 880.0, 1000.0, 771.0, 514.0, 437.0]
2025-05-13 12:28:41,559 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 3 hours, 2 minutes, 9 seconds)
2025-05-13 12:32:25,863 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:32:35,316 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1927.50269 ± 882.622
2025-05-13 12:32:35,316 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1599.572, 1129.334, 1884.4542, 2592.594, 767.75964, 2660.4895, 2721.0435, 1301.6255, 1003.9902, 3614.1655]
2025-05-13 12:32:35,316 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [479.0, 344.0, 544.0, 738.0, 297.0, 795.0, 782.0, 402.0, 348.0, 1000.0]
2025-05-13 12:32:35,326 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 58 minutes, 43 seconds)
2025-05-13 12:36:09,687 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:36:24,017 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2970.18311 ± 1011.945
2025-05-13 12:36:24,017 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2868.85, 3438.984, 3382.5383, 3567.068, 1530.7174, 3660.7607, 3709.6775, 3534.7837, 554.52704, 3453.925]
2025-05-13 12:36:24,017 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [806.0, 1000.0, 1000.0, 1000.0, 481.0, 1000.0, 1000.0, 1000.0, 198.0, 1000.0]
2025-05-13 12:36:24,028 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 53 minutes, 47 seconds)
2025-05-13 12:40:02,301 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:40:17,114 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3051.03174 ± 778.797
2025-05-13 12:40:17,114 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2982.1562, 3492.6091, 3439.1306, 3522.0652, 794.65076, 3475.7283, 3032.7278, 3227.1814, 3057.1423, 3486.9246]
2025-05-13 12:40:17,114 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [891.0, 1000.0, 1000.0, 1000.0, 298.0, 1000.0, 912.0, 1000.0, 838.0, 1000.0]
2025-05-13 12:40:17,114 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (3051.03) for latency MM1Queue_a033_s075
2025-05-13 12:40:17,125 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 50 minutes, 10 seconds)
2025-05-13 12:44:11,149 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:44:23,380 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2512.47583 ± 1043.966
2025-05-13 12:44:23,380 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3451.6543, 1675.2313, 3664.2117, 2008.568, 3546.0764, 897.53424, 3436.298, 3481.3396, 1214.8973, 1748.9482]
2025-05-13 12:44:23,380 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 527.0, 1000.0, 571.0, 1000.0, 300.0, 1000.0, 1000.0, 521.0, 501.0]
2025-05-13 12:44:23,393 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 46 minutes, 19 seconds)
2025-05-13 12:47:49,138 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:47:59,578 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2068.79175 ± 1334.819
2025-05-13 12:47:59,578 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [366.7028, 128.5898, 3212.0588, 1145.5076, 2029.7006, 3606.507, 514.2248, 3438.6357, 2883.7563, 3362.2349]
2025-05-13 12:47:59,578 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [191.0, 99.0, 1000.0, 391.0, 615.0, 1000.0, 256.0, 1000.0, 816.0, 979.0]
2025-05-13 12:47:59,590 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 42 minutes, 7 seconds)
2025-05-13 12:51:34,916 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:51:44,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2153.43213 ± 597.367
2025-05-13 12:51:44,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1834.6791, 2011.7788, 1101.5808, 1905.2003, 2806.5383, 3252.9248, 1577.9476, 2594.128, 1992.6229, 2456.9202]
2025-05-13 12:51:44,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [486.0, 533.0, 328.0, 573.0, 703.0, 829.0, 422.0, 691.0, 528.0, 653.0]
2025-05-13 12:51:44,206 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 37 minutes)
2025-05-13 12:55:34,448 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:55:48,977 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3035.12646 ± 745.970
2025-05-13 12:55:48,977 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2435.3677, 3595.7234, 3473.9954, 3286.8157, 3471.069, 3451.0737, 2903.9666, 3672.2385, 2987.0862, 1073.928]
2025-05-13 12:55:48,977 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [739.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 756.0, 1000.0, 820.0, 387.0]
2025-05-13 12:55:48,989 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 35 minutes, 19 seconds)
2025-05-13 12:59:27,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:59:39,624 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2668.07251 ± 744.679
2025-05-13 12:59:39,624 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3444.1824, 3621.284, 2311.343, 3466.3257, 2215.554, 2286.6733, 1763.4634, 2054.0022, 1834.8582, 3683.0386]
2025-05-13 12:59:39,624 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 648.0, 1000.0, 635.0, 615.0, 537.0, 603.0, 525.0, 1000.0]
2025-05-13 12:59:39,635 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 31 minutes, 7 seconds)
2025-05-13 13:03:18,649 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:03:32,573 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3075.77930 ± 846.146
2025-05-13 13:03:32,573 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3590.2815, 3880.8672, 3755.1865, 3605.4282, 2364.985, 2924.339, 1370.3563, 3813.3132, 3526.5396, 1926.4955]
2025-05-13 13:03:32,573 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 993.0, 1000.0, 599.0, 766.0, 399.0, 1000.0, 1000.0, 527.0]
2025-05-13 13:03:32,573 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (3075.78) for latency MM1Queue_a033_s075
2025-05-13 13:03:32,589 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 25 minutes, 33 seconds)
2025-05-13 13:07:12,509 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:07:23,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2516.56079 ± 785.278
2025-05-13 13:07:23,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2373.3643, 2847.2385, 2770.9329, 2173.6658, 3958.2947, 2184.3848, 3710.5461, 2172.3513, 1447.5479, 1527.2837]
2025-05-13 13:07:23,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [677.0, 745.0, 794.0, 590.0, 1000.0, 587.0, 1000.0, 580.0, 464.0, 441.0]
2025-05-13 13:07:23,911 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 23 minutes, 35 seconds)
2025-05-13 13:11:03,278 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:11:11,696 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 1869.21655 ± 975.716
2025-05-13 13:11:11,697 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [934.7716, 1426.6814, 1658.3129, 1573.3971, 3906.9714, 1781.3153, 1995.418, 1421.4646, 3395.1326, 598.70154]
2025-05-13 13:11:11,697 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [318.0, 396.0, 449.0, 430.0, 1000.0, 477.0, 529.0, 401.0, 1000.0, 196.0]
2025-05-13 13:11:11,706 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 20 minutes, 6 seconds)
2025-05-13 13:14:48,531 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:15:00,794 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2478.74219 ± 1080.527
2025-05-13 13:15:00,794 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3433.4417, 3164.6487, 3489.375, 1524.0992, 1830.8064, 3360.827, 2775.9702, 3541.2996, 317.87195, 1349.0848]
2025-05-13 13:15:00,794 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 949.0, 1000.0, 482.0, 525.0, 1000.0, 783.0, 1000.0, 211.0, 439.0]
2025-05-13 13:15:00,805 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 14 minutes, 22 seconds)
2025-05-13 13:18:46,098 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:19:00,254 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2950.87573 ± 955.273
2025-05-13 13:19:00,254 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3483.4749, 3407.8335, 2236.6624, 3391.104, 3558.2625, 355.43274, 3553.4724, 3085.351, 2831.727, 3605.4368]
2025-05-13 13:19:00,254 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 590.0, 1000.0, 1000.0, 169.0, 1000.0, 900.0, 767.0, 1000.0]
2025-05-13 13:19:00,268 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 11 minutes, 32 seconds)
2025-05-13 13:22:50,293 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:23:04,669 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2955.43335 ± 795.079
2025-05-13 13:23:04,669 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3545.706, 3476.7026, 3554.2415, 3337.4106, 853.86316, 3193.269, 3377.346, 2528.8838, 2431.9456, 3254.9673]
2025-05-13 13:23:04,670 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 299.0, 1000.0, 1000.0, 705.0, 669.0, 1000.0]
2025-05-13 13:23:04,683 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 8 minutes, 55 seconds)
2025-05-13 13:26:30,835 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:26:41,732 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2341.10913 ± 807.050
2025-05-13 13:26:41,732 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1911.1111, 1855.6913, 3836.4146, 2167.051, 1802.8173, 2069.415, 2614.099, 1012.29004, 3598.7842, 2543.4202]
2025-05-13 13:26:41,732 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [589.0, 528.0, 1000.0, 599.0, 504.0, 584.0, 754.0, 304.0, 1000.0, 722.0]
2025-05-13 13:26:41,745 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 2 hours, 3 minutes, 30 seconds)
2025-05-13 13:30:33,143 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:30:47,606 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3114.58569 ± 1101.298
2025-05-13 13:30:47,606 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3395.0386, 3725.7869, 3437.7424, 3689.558, 1812.6976, 3881.0754, 3397.4905, 282.83228, 3656.3892, 3867.2456]
2025-05-13 13:30:47,606 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 956.0, 1000.0, 533.0, 1000.0, 1000.0, 139.0, 1000.0, 1000.0]
2025-05-13 13:30:47,606 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (3114.59) for latency MM1Queue_a033_s075
2025-05-13 13:30:47,619 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 2 hours, 1 minute, 30 seconds)
2025-05-13 13:34:20,711 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:34:31,995 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2482.71338 ± 1157.182
2025-05-13 13:34:31,995 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3899.0664, 741.631, 1701.0906, 1941.0804, 3740.1885, 3841.3162, 2852.852, 3445.858, 1725.169, 938.8825]
2025-05-13 13:34:31,995 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 260.0, 473.0, 570.0, 1000.0, 1000.0, 733.0, 1000.0, 443.0, 283.0]
2025-05-13 13:34:32,008 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 57 minutes, 7 seconds)
2025-05-13 13:38:13,907 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:38:29,236 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3064.30249 ± 469.423
2025-05-13 13:38:29,236 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2218.0828, 3574.4673, 3273.0547, 2934.3752, 2154.7393, 3232.408, 3398.588, 3170.2178, 3470.5884, 3216.5042]
2025-05-13 13:38:29,236 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [701.0, 1000.0, 1000.0, 856.0, 629.0, 1000.0, 1000.0, 900.0, 1000.0, 1000.0]
2025-05-13 13:38:29,249 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 53 minutes)
2025-05-13 13:42:23,748 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:42:38,140 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3190.16675 ± 1052.730
2025-05-13 13:42:38,140 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3427.3213, 3733.8447, 1191.9934, 3622.3652, 3913.6917, 3752.3318, 3810.0706, 1015.4631, 3571.9985, 3862.587]
2025-05-13 13:42:38,140 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 357.0, 1000.0, 1000.0, 1000.0, 1000.0, 355.0, 1000.0, 1000.0]
2025-05-13 13:42:38,140 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (3190.17) for latency MM1Queue_a033_s075
2025-05-13 13:42:38,156 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 49 minutes, 31 seconds)
2025-05-13 13:46:20,771 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:46:37,716 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3667.60815 ± 142.480
2025-05-13 13:46:37,716 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3613.395, 3636.3298, 3765.6104, 3337.1985, 3532.357, 3749.6936, 3854.8682, 3733.346, 3659.8484, 3793.4368]
2025-05-13 13:46:37,716 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 996.0, 1000.0, 998.0, 1000.0, 1000.0, 946.0, 1000.0, 1000.0]
2025-05-13 13:46:37,716 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1226 [INFO]: New best (3667.61) for latency MM1Queue_a033_s075
2025-05-13 13:46:37,732 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 47 minutes, 38 seconds)
2025-05-13 13:50:19,135 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:50:33,133 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3100.13525 ± 961.016
2025-05-13 13:50:33,133 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3639.4282, 3642.1746, 3604.1902, 1306.9255, 2207.0508, 3945.6091, 1515.6187, 3844.1133, 3603.4177, 3692.8237]
2025-05-13 13:50:33,133 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 372.0, 623.0, 1000.0, 429.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:50:33,144 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 42 minutes, 44 seconds)
2025-05-13 13:54:21,620 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:54:34,849 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2840.31152 ± 1117.516
2025-05-13 13:54:34,849 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3665.8474, 2245.0876, 3607.4287, 557.3363, 3833.321, 3113.3984, 3704.5588, 3117.8137, 3531.7212, 1026.6014]
2025-05-13 13:54:34,849 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 615.0, 1000.0, 195.0, 1000.0, 872.0, 1000.0, 951.0, 1000.0, 311.0]
2025-05-13 13:54:34,863 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 40 minutes, 14 seconds)
2025-05-13 13:58:25,346 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:58:40,538 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3279.82349 ± 724.215
2025-05-13 13:58:40,538 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2169.6343, 3574.064, 1574.4711, 3358.8433, 3792.367, 3681.616, 3672.4565, 3672.953, 3611.9468, 3689.8792]
2025-05-13 13:58:40,538 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [625.0, 1000.0, 491.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:58:40,551 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 36 minutes, 54 seconds)
2025-05-13 14:02:39,054 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:02:52,878 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3087.41479 ± 729.747
2025-05-13 14:02:52,878 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3695.067, 3328.3743, 3158.7075, 3729.0464, 3716.0586, 1656.9757, 2051.0386, 2517.458, 3886.6091, 3134.8118]
2025-05-13 14:02:52,878 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 886.0, 874.0, 1000.0, 1000.0, 463.0, 534.0, 666.0, 1000.0, 813.0]
2025-05-13 14:02:52,893 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 33 minutes, 7 seconds)
2025-05-13 14:06:58,581 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:07:13,200 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3345.60498 ± 742.266
2025-05-13 14:07:13,200 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3940.1052, 3940.1218, 3568.564, 2979.3826, 4109.4307, 3846.8984, 2073.5813, 3253.2275, 3796.9412, 1947.797]
2025-05-13 14:07:13,200 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [973.0, 1000.0, 970.0, 782.0, 1000.0, 1000.0, 556.0, 858.0, 1000.0, 512.0]
2025-05-13 14:07:13,216 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 30 minutes, 36 seconds)
2025-05-13 14:11:08,108 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:11:22,064 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3128.92163 ± 873.921
2025-05-13 14:11:22,064 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3560.977, 1534.9001, 3715.547, 3322.1216, 1668.1133, 3598.61, 3931.1443, 2346.0623, 3785.5881, 3826.153]
2025-05-13 14:11:22,064 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [922.0, 429.0, 1000.0, 1000.0, 462.0, 1000.0, 1000.0, 632.0, 1000.0, 1000.0]
2025-05-13 14:11:22,079 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 27 minutes, 25 seconds)
2025-05-13 14:14:57,079 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:15:11,503 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3225.72632 ± 627.618
2025-05-13 14:15:11,503 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3787.7346, 3160.2434, 1837.7655, 3375.2122, 3842.2705, 2805.9207, 3643.2861, 3690.8298, 2479.8772, 3634.1248]
2025-05-13 14:15:11,504 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 816.0, 503.0, 1000.0, 1000.0, 673.0, 1000.0, 1000.0, 690.0, 1000.0]
2025-05-13 14:15:11,517 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 22 minutes, 26 seconds)
2025-05-13 14:19:08,929 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:19:23,305 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3129.37085 ± 887.431
2025-05-13 14:19:23,305 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1938.3103, 3370.235, 3629.9995, 3718.8135, 3740.9963, 3486.3604, 1070.4388, 2735.377, 3780.213, 3822.9656]
2025-05-13 14:19:23,305 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [527.0, 863.0, 1000.0, 1000.0, 1000.0, 1000.0, 326.0, 831.0, 1000.0, 1000.0]
2025-05-13 14:19:23,321 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 18 minutes, 42 seconds)
2025-05-13 14:23:25,951 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:23:39,616 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2880.67041 ± 1120.906
2025-05-13 14:23:39,616 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2516.1343, 2126.5447, 3485.9585, 3761.9375, 96.21008, 3737.8533, 2172.5193, 3740.3862, 3593.7324, 3575.4282]
2025-05-13 14:23:39,616 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [710.0, 610.0, 1000.0, 1000.0, 59.0, 1000.0, 659.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:23:39,628 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 14 minutes, 48 seconds)
2025-05-13 14:27:26,131 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:27:41,866 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3404.70435 ± 537.506
2025-05-13 14:27:41,866 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3786.7874, 3748.968, 3522.8113, 3574.7961, 3661.7532, 3726.096, 1859.3904, 3309.699, 3512.727, 3344.0146]
2025-05-13 14:27:41,866 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 568.0, 898.0, 1000.0, 903.0]
2025-05-13 14:27:41,880 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 9 minutes, 37 seconds)
2025-05-13 14:31:30,797 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:31:45,027 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2897.29346 ± 783.646
2025-05-13 14:31:45,027 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [1979.6879, 2725.4744, 1523.6221, 3427.391, 1789.6454, 3417.6296, 3586.353, 3594.2173, 3460.8079, 3468.106]
2025-05-13 14:31:45,027 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [631.0, 739.0, 514.0, 1000.0, 556.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:31:45,042 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 5 minutes, 13 seconds)
2025-05-13 14:35:37,595 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:35:54,058 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3664.39648 ± 163.573
2025-05-13 14:35:54,058 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3842.7998, 3738.2468, 3700.31, 3750.4766, 3941.8306, 3430.0823, 3425.5283, 3495.7578, 3613.8477, 3705.0833]
2025-05-13 14:35:54,058 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 882.0, 1000.0, 936.0, 1000.0, 1000.0]
2025-05-13 14:35:54,071 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 1 hour, 2 minutes, 7 seconds)
2025-05-13 14:39:51,015 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:40:05,868 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3410.74365 ± 670.565
2025-05-13 14:40:05,868 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3951.132, 3849.998, 4017.463, 3861.2603, 2849.1807, 2372.621, 3794.5244, 3457.4932, 2104.662, 3849.104]
2025-05-13 14:40:05,868 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 717.0, 599.0, 1000.0, 922.0, 576.0, 1000.0]
2025-05-13 14:40:05,880 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 57 minutes, 59 seconds)
2025-05-13 14:43:52,547 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:44:06,887 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3222.44336 ± 814.215
2025-05-13 14:44:06,888 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2847.2427, 3755.0015, 3852.0415, 2064.9653, 3996.0532, 1753.3809, 2456.7102, 3769.8337, 3836.5273, 3892.6755]
2025-05-13 14:44:06,888 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [789.0, 1000.0, 1000.0, 568.0, 1000.0, 528.0, 692.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:44:06,902 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 53 minutes, 10 seconds)
2025-05-13 14:48:11,032 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:48:25,456 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3234.67798 ± 805.610
2025-05-13 14:48:25,456 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2470.413, 2868.3, 3844.083, 1286.8079, 4049.495, 3486.88, 3535.228, 3854.8062, 3854.7708, 3095.9958]
2025-05-13 14:48:25,456 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [680.0, 774.0, 1000.0, 397.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 812.0]
2025-05-13 14:48:25,471 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 49 minutes, 44 seconds)
2025-05-13 14:52:07,780 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:52:22,053 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3209.29053 ± 765.447
2025-05-13 14:52:22,053 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3805.6523, 2413.5752, 3709.1038, 4008.7825, 3491.5398, 3501.2075, 3280.5046, 3964.2815, 2199.4424, 1718.8136]
2025-05-13 14:52:22,053 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 672.0, 1000.0, 1000.0, 1000.0, 880.0, 844.0, 1000.0, 600.0, 473.0]
2025-05-13 14:52:22,068 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 45 minutes, 21 seconds)
2025-05-13 14:56:05,823 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:56:20,693 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3405.80127 ± 713.939
2025-05-13 14:56:20,694 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3748.1128, 3852.7046, 4083.882, 3990.3118, 3619.761, 3894.792, 2372.9907, 2481.9316, 3859.7432, 2153.7812]
2025-05-13 14:56:20,694 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 988.0, 1000.0, 931.0, 1000.0, 594.0, 701.0, 1000.0, 583.0]
2025-05-13 14:56:20,707 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 40 minutes, 53 seconds)
2025-05-13 15:00:11,582 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:00:25,343 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3191.97876 ± 1128.388
2025-05-13 15:00:25,343 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3824.6304, 2939.5918, 3958.9854, 94.05467, 3854.8013, 3566.3555, 3744.0862, 2479.0745, 3490.7083, 3967.5007]
2025-05-13 15:00:25,343 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 768.0, 1000.0, 68.0, 1000.0, 859.0, 1000.0, 611.0, 852.0, 1000.0]
2025-05-13 15:00:25,359 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 36 minutes, 35 seconds)
2025-05-13 15:04:27,543 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:04:42,227 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3196.94531 ± 1052.541
2025-05-13 15:04:42,227 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3869.5137, 3678.2695, 3625.6367, 3484.665, 3577.3167, 3642.1123, 3234.9053, 3647.7688, 103.06947, 3106.1963]
2025-05-13 15:04:42,227 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 831.0, 1000.0, 63.0, 787.0]
2025-05-13 15:04:42,240 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 32 minutes, 56 seconds)
2025-05-13 15:08:18,137 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:08:30,727 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2745.81494 ± 1455.564
2025-05-13 15:08:30,727 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3506.8547, 3827.9402, 3994.8867, 962.953, 3753.6042, 3692.1104, 3739.4084, 324.80707, 3285.968, 369.61295]
2025-05-13 15:08:30,727 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 285.0, 1000.0, 1000.0, 1000.0, 135.0, 942.0, 153.0]
2025-05-13 15:08:30,741 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 28 minutes, 7 seconds)
2025-05-13 15:12:10,392 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:12:23,575 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3034.12842 ± 1164.343
2025-05-13 15:12:23,576 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3888.7263, 3116.5784, 1696.5692, 3433.6438, 3819.353, 2730.2837, 172.35207, 4060.7417, 3731.9016, 3691.1338]
2025-05-13 15:12:23,576 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 784.0, 447.0, 867.0, 1000.0, 713.0, 119.0, 1000.0, 1000.0, 901.0]
2025-05-13 15:12:23,589 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 24 minutes, 1 second)
2025-05-13 15:16:09,854 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:16:22,964 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3123.77393 ± 1109.148
2025-05-13 15:16:22,964 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3056.7695, 1944.1555, 4021.4553, 4047.9001, 3799.6372, 914.4642, 4083.4114, 3989.291, 1725.2773, 3655.3765]
2025-05-13 15:16:22,964 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [801.0, 509.0, 1000.0, 1000.0, 1000.0, 273.0, 1000.0, 1000.0, 478.0, 852.0]
2025-05-13 15:16:22,980 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 20 minutes, 2 seconds)
2025-05-13 15:20:00,566 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:20:15,347 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3293.22607 ± 810.022
2025-05-13 15:20:15,347 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3848.5315, 3841.6672, 2384.877, 1459.9021, 3742.7993, 3956.4688, 3722.8682, 3611.9282, 2544.0564, 3819.1611]
2025-05-13 15:20:15,347 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 616.0, 452.0, 1000.0, 1000.0, 1000.0, 1000.0, 682.0, 1000.0]
2025-05-13 15:20:15,361 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 15 minutes, 52 seconds)
2025-05-13 15:24:16,272 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:24:30,188 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3200.91113 ± 794.328
2025-05-13 15:24:30,188 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2951.7812, 2929.2625, 2354.2307, 2821.1155, 4022.3376, 3980.497, 1549.5103, 3935.4575, 3417.5464, 4047.3708]
2025-05-13 15:24:30,188 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [733.0, 762.0, 655.0, 725.0, 1000.0, 1000.0, 456.0, 1000.0, 895.0, 1000.0]
2025-05-13 15:24:30,201 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 11 minutes, 52 seconds)
2025-05-13 15:28:13,979 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:28:28,044 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3180.68506 ± 1249.729
2025-05-13 15:28:28,045 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [260.18225, 3384.6384, 3554.3118, 4076.9465, 4116.687, 3973.9355, 3812.9048, 1259.0879, 3686.8867, 3681.267]
2025-05-13 15:28:28,045 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [144.0, 886.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 365.0, 930.0, 1000.0]
2025-05-13 15:28:28,059 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 7 minutes, 58 seconds)
2025-05-13 15:32:30,800 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:32:42,099 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 2496.00610 ± 841.877
2025-05-13 15:32:42,099 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [2219.965, 2247.2478, 2666.6042, 2736.6296, 2574.2927, 1161.8739, 1737.0782, 3568.524, 1842.2333, 4205.6133]
2025-05-13 15:32:42,099 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [594.0, 648.0, 721.0, 735.0, 718.0, 363.0, 468.0, 1000.0, 501.0, 1000.0]
2025-05-13 15:32:42,115 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 4 minutes, 3 seconds)
2025-05-13 15:36:22,089 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:36:37,204 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1221 [DEBUG]: Total Reward: 3457.33740 ± 734.679
2025-05-13 15:36:37,204 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1222 [DEBUG]: All rewards: [3853.0781, 3806.5054, 3957.95, 2038.8416, 3991.356, 2199.8037, 3909.795, 4021.0698, 2917.8977, 3877.0796]
2025-05-13 15:36:37,205 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 555.0, 1000.0, 605.0, 1000.0, 1000.0, 768.0, 1000.0]
2025-05-13 15:36:37,220 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1251 [DEBUG]: Training session finished
