2026-01-23 01:59:34,300 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-walker2d/DatasetOffice-bpql-mda-mem2
2026-01-23 01:59:34,300 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-walker2d/DatasetOffice-bpql-mda-mem2
2026-01-23 01:59:34,300 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x145d45467690>}
2026-01-23 01:59:34,300 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1159 [DEBUG]: using device: cuda
2026-01-23 01:59:34,442 baseline-bpql-mda-noisy-walker2d:91 [WARNING]: args.assumed_delay != args.horizon: 2 != 32
2026-01-23 01:59:34,442 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1181 [INFO]: Creating new trainer
2026-01-23 01:59:34,460 baseline-bpql-mda-noisy-walker2d:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2026-01-23 01:59:34,460 baseline-bpql-mda-noisy-walker2d:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-23 01:59:34,465 baseline-bpql-mda-noisy-walker2d:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(6, 384, batch_first=True)
)
2026-01-23 01:59:35,309 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1242 [DEBUG]: Starting training session...
2026-01-23 01:59:35,310 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 1/100
2026-01-23 02:03:00,823 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:03:01,864 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 62.23919 ± 12.408
2026-01-23 02:03:01,864 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [69.76216, 51.237823, 66.40165, 59.532387, 56.637043, 57.935352, 55.030853, 95.81732, 56.3804, 53.656834]
2026-01-23 02:03:01,864 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [97.0, 63.0, 83.0, 70.0, 68.0, 70.0, 67.0, 120.0, 66.0, 65.0]
2026-01-23 02:03:01,864 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (62.24) for latency DatasetOffice
2026-01-23 02:03:01,867 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 2/100 (estimated time remaining: 5 hours, 40 minutes, 49 seconds)
2026-01-23 02:06:46,387 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:06:51,460 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 419.86386 ± 221.357
2026-01-23 02:06:51,460 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [296.5058, 279.45877, 245.5864, 271.83188, 388.97134, 876.4652, 832.2616, 370.5901, 330.22702, 306.7406]
2026-01-23 02:06:51,460 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [173.0, 157.0, 131.0, 153.0, 264.0, 1000.0, 1000.0, 250.0, 206.0, 181.0]
2026-01-23 02:06:51,460 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (419.86) for latency DatasetOffice
2026-01-23 02:06:51,464 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 3/100 (estimated time remaining: 5 hours, 56 minutes, 11 seconds)
2026-01-23 02:10:23,212 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:10:27,797 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 393.15363 ± 249.540
2026-01-23 02:10:27,797 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [572.2754, 359.46774, 295.38882, 389.0208, 360.9527, 34.33147, 393.2041, 260.5391, 1031.4911, 234.86462]
2026-01-23 02:10:27,797 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [445.0, 233.0, 176.0, 263.0, 452.0, 78.0, 254.0, 141.0, 991.0, 251.0]
2026-01-23 02:10:27,800 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 4/100 (estimated time remaining: 5 hours, 51 minutes, 37 seconds)
2026-01-23 02:14:01,197 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:14:06,334 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 351.78455 ± 216.874
2026-01-23 02:14:06,334 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [419.35825, 333.7561, 795.7001, 262.5849, 175.193, 537.30524, 58.488472, 48.51007, 440.42508, 446.52435]
2026-01-23 02:14:06,335 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [333.0, 234.0, 850.0, 171.0, 386.0, 549.0, 201.0, 244.0, 409.0, 346.0]
2026-01-23 02:14:06,337 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 5/100 (estimated time remaining: 5 hours, 48 minutes, 24 seconds)
2026-01-23 02:17:40,621 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:17:42,154 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 225.00285 ± 31.221
2026-01-23 02:17:42,154 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [232.65288, 190.04417, 197.82436, 271.54147, 245.84444, 203.64796, 186.99861, 200.32475, 267.2026, 253.9473]
2026-01-23 02:17:42,154 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [114.0, 102.0, 101.0, 127.0, 113.0, 106.0, 99.0, 104.0, 124.0, 150.0]
2026-01-23 02:17:42,160 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 6/100 (estimated time remaining: 5 hours, 44 minutes, 10 seconds)
2026-01-23 02:21:18,375 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:21:20,452 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 300.76517 ± 24.148
2026-01-23 02:21:20,453 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [282.28802, 275.71902, 295.42557, 305.91803, 306.10883, 324.591, 332.54633, 332.92, 297.83044, 254.30446]
2026-01-23 02:21:20,453 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [129.0, 132.0, 148.0, 144.0, 170.0, 154.0, 157.0, 207.0, 152.0, 131.0]
2026-01-23 02:21:20,458 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 7/100 (estimated time remaining: 5 hours, 44 minutes, 13 seconds)
2026-01-23 02:24:52,517 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:24:57,976 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 554.97034 ± 277.920
2026-01-23 02:24:57,976 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1165.4591, 974.0669, 551.90625, 467.64917, 589.9683, 409.20337, 364.00684, 242.20155, 451.54428, 333.69745]
2026-01-23 02:24:57,976 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 727.0, 304.0, 229.0, 301.0, 231.0, 322.0, 202.0, 266.0, 300.0]
2026-01-23 02:24:57,976 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (554.97) for latency DatasetOffice
2026-01-23 02:24:57,982 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 8/100 (estimated time remaining: 5 hours, 36 minutes, 49 seconds)
2026-01-23 02:28:34,232 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:28:36,284 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 304.01160 ± 192.783
2026-01-23 02:28:36,284 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [416.76556, 478.15164, 475.06952, 3.7919533, 134.26341, 36.89081, 122.95414, 479.13208, 496.41455, 396.68243]
2026-01-23 02:28:36,284 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [191.0, 216.0, 216.0, 13.0, 79.0, 50.0, 127.0, 212.0, 230.0, 189.0]
2026-01-23 02:28:36,291 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 9/100 (estimated time remaining: 5 hours, 33 minutes, 48 seconds)
2026-01-23 02:32:09,928 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:32:12,712 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 453.54913 ± 34.168
2026-01-23 02:32:12,712 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [391.07672, 438.84146, 450.8791, 421.48444, 439.58865, 474.34338, 473.97244, 438.67285, 515.58295, 491.04932]
2026-01-23 02:32:12,712 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [185.0, 192.0, 213.0, 179.0, 185.0, 205.0, 226.0, 197.0, 251.0, 233.0]
2026-01-23 02:32:12,715 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 10/100 (estimated time remaining: 5 hours, 29 minutes, 32 seconds)
2026-01-23 02:35:46,396 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:35:48,988 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 359.52115 ± 24.667
2026-01-23 02:35:48,988 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [363.48492, 356.5709, 340.53693, 343.19226, 383.37045, 352.78, 362.91962, 364.3082, 315.32065, 412.72775]
2026-01-23 02:35:48,988 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [178.0, 195.0, 176.0, 183.0, 205.0, 186.0, 197.0, 202.0, 155.0, 226.0]
2026-01-23 02:35:48,993 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 11/100 (estimated time remaining: 5 hours, 26 minutes, 3 seconds)
2026-01-23 02:39:23,302 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:39:25,799 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 372.54156 ± 56.307
2026-01-23 02:39:25,799 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [362.38245, 370.98108, 332.97296, 318.18967, 350.27374, 455.87015, 340.63068, 333.89844, 357.65024, 502.56613]
2026-01-23 02:39:25,799 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [179.0, 189.0, 183.0, 171.0, 185.0, 204.0, 178.0, 174.0, 176.0, 211.0]
2026-01-23 02:39:25,802 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 12/100 (estimated time remaining: 5 hours, 21 minutes, 59 seconds)
2026-01-23 02:43:01,639 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:43:04,450 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 370.44635 ± 33.692
2026-01-23 02:43:04,450 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [352.7033, 374.26172, 371.99933, 348.9357, 340.08044, 444.30252, 345.79028, 342.4557, 361.45096, 422.4835]
2026-01-23 02:43:04,450 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [193.0, 200.0, 206.0, 190.0, 214.0, 275.0, 193.0, 186.0, 202.0, 219.0]
2026-01-23 02:43:04,454 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 13/100 (estimated time remaining: 5 hours, 18 minutes, 41 seconds)
2026-01-23 02:46:40,686 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:46:43,511 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 389.35614 ± 59.535
2026-01-23 02:46:43,511 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [309.1305, 337.56458, 347.2531, 347.6758, 402.17618, 381.73477, 375.28943, 517.996, 412.5024, 462.23846]
2026-01-23 02:46:43,511 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [178.0, 206.0, 197.0, 198.0, 213.0, 196.0, 206.0, 257.0, 207.0, 215.0]
2026-01-23 02:46:43,514 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 14/100 (estimated time remaining: 5 hours, 15 minutes, 17 seconds)
2026-01-23 02:50:18,714 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:50:21,349 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 380.01233 ± 68.864
2026-01-23 02:50:21,349 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [363.21344, 355.6431, 349.77972, 312.17505, 520.4055, 372.6626, 406.8607, 332.14066, 299.06223, 488.18015]
2026-01-23 02:50:21,349 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [198.0, 195.0, 193.0, 176.0, 255.0, 185.0, 190.0, 179.0, 160.0, 219.0]
2026-01-23 02:50:21,353 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 15/100 (estimated time remaining: 5 hours, 12 minutes, 4 seconds)
2026-01-23 02:53:54,579 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:53:57,441 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 386.10074 ± 67.172
2026-01-23 02:53:57,441 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [499.6374, 371.39554, 374.6455, 399.81738, 342.73944, 518.7257, 363.12085, 302.90076, 368.82703, 319.19763]
2026-01-23 02:53:57,441 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [235.0, 203.0, 231.0, 215.0, 179.0, 310.0, 180.0, 162.0, 219.0, 183.0]
2026-01-23 02:53:57,447 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 16/100 (estimated time remaining: 5 hours, 8 minutes, 23 seconds)
2026-01-23 02:57:30,825 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:57:34,768 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 658.49451 ± 325.111
2026-01-23 02:57:34,768 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [767.87714, 597.9371, 1323.4298, 376.35712, 429.36752, 920.08386, 372.69104, 383.29935, 1050.8695, 363.03214]
2026-01-23 02:57:34,768 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [327.0, 263.0, 634.0, 181.0, 199.0, 340.0, 166.0, 188.0, 453.0, 176.0]
2026-01-23 02:57:34,768 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (658.49) for latency DatasetOffice
2026-01-23 02:57:34,774 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 17/100 (estimated time remaining: 5 hours, 4 minutes, 54 seconds)
2026-01-23 03:01:08,206 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:01:11,556 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 561.17297 ± 146.011
2026-01-23 03:01:11,557 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [676.0508, 418.17725, 742.1446, 362.21866, 650.4531, 510.0043, 630.7632, 466.83365, 780.45404, 374.63034]
2026-01-23 03:01:11,557 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [281.0, 223.0, 306.0, 183.0, 276.0, 219.0, 255.0, 215.0, 313.0, 198.0]
2026-01-23 03:01:11,562 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 18/100 (estimated time remaining: 5 hours, 46 seconds)
2026-01-23 03:04:50,596 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:04:54,968 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 720.74921 ± 154.646
2026-01-23 03:04:54,968 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [544.9555, 680.5949, 982.98315, 596.555, 920.6526, 731.5379, 816.6484, 559.89124, 836.83514, 536.8386]
2026-01-23 03:04:54,968 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [261.0, 320.0, 412.0, 276.0, 387.0, 338.0, 362.0, 272.0, 342.0, 228.0]
2026-01-23 03:04:54,968 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (720.75) for latency DatasetOffice
2026-01-23 03:04:54,972 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 58 minutes, 19 seconds)
2026-01-23 03:08:27,122 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:08:30,185 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 549.57373 ± 85.317
2026-01-23 03:08:30,185 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [547.63666, 343.079, 554.86176, 546.3292, 571.12744, 534.59344, 714.4051, 527.7621, 586.9524, 568.9905]
2026-01-23 03:08:30,185 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [241.0, 153.0, 227.0, 204.0, 234.0, 211.0, 273.0, 198.0, 253.0, 243.0]
2026-01-23 03:08:30,189 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 53 minutes, 59 seconds)
2026-01-23 03:12:08,434 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:12:11,164 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 483.00409 ± 78.873
2026-01-23 03:12:11,164 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [511.8117, 508.99753, 262.416, 517.98645, 427.4004, 512.3054, 530.7308, 517.60333, 503.07526, 537.7144]
2026-01-23 03:12:11,164 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [210.0, 201.0, 131.0, 217.0, 174.0, 209.0, 233.0, 211.0, 189.0, 228.0]
2026-01-23 03:12:11,168 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 21/100 (estimated time remaining: 4 hours, 51 minutes, 39 seconds)
2026-01-23 03:15:44,768 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:15:48,073 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 546.10046 ± 175.856
2026-01-23 03:15:48,073 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [555.0282, 564.6414, 572.0176, 538.4127, 521.17816, 546.85895, 919.66516, 550.455, 136.45964, 556.2881]
2026-01-23 03:15:48,073 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [225.0, 245.0, 227.0, 224.0, 194.0, 222.0, 376.0, 209.0, 223.0, 299.0]
2026-01-23 03:15:48,078 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 22/100 (estimated time remaining: 4 hours, 47 minutes, 54 seconds)
2026-01-23 03:19:20,959 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:19:24,651 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 650.23175 ± 139.155
2026-01-23 03:19:24,651 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [550.90625, 574.33264, 544.8684, 600.6932, 608.353, 699.37, 532.2827, 848.60535, 574.42664, 968.47925]
2026-01-23 03:19:24,651 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [238.0, 239.0, 239.0, 242.0, 269.0, 265.0, 233.0, 338.0, 262.0, 366.0]
2026-01-23 03:19:24,656 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 23/100 (estimated time remaining: 4 hours, 44 minutes, 12 seconds)
2026-01-23 03:22:59,857 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:23:05,637 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1179.98315 ± 156.629
2026-01-23 03:23:05,637 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1022.99774, 1277.3368, 978.8645, 998.6884, 1400.4606, 1242.2228, 1209.4253, 1423.1388, 1220.8053, 1025.8912]
2026-01-23 03:23:05,637 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [382.0, 442.0, 392.0, 370.0, 491.0, 425.0, 434.0, 542.0, 405.0, 370.0]
2026-01-23 03:23:05,637 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (1179.98) for latency DatasetOffice
2026-01-23 03:23:05,643 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 24/100 (estimated time remaining: 4 hours, 39 minutes, 56 seconds)
2026-01-23 03:26:44,991 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:26:49,855 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 932.14520 ± 396.773
2026-01-23 03:26:49,856 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1153.6613, 923.91473, 408.53436, 1160.9635, 1178.1003, 5.4527063, 946.0528, 1378.0447, 1214.9247, 951.8039]
2026-01-23 03:26:49,856 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [415.0, 338.0, 218.0, 436.0, 420.0, 15.0, 350.0, 479.0, 445.0, 391.0]
2026-01-23 03:26:49,860 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 25/100 (estimated time remaining: 4 hours, 38 minutes, 35 seconds)
2026-01-23 03:30:20,481 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:30:30,327 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1860.03149 ± 592.044
2026-01-23 03:30:30,328 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [2237.244, 938.74054, 1235.588, 2606.8633, 2353.7075, 1155.0381, 2162.7332, 2652.0571, 1620.4185, 1637.9253]
2026-01-23 03:30:30,328 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [799.0, 364.0, 427.0, 1000.0, 1000.0, 396.0, 735.0, 1000.0, 616.0, 625.0]
2026-01-23 03:30:30,328 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (1860.03) for latency DatasetOffice
2026-01-23 03:30:30,333 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 34 minutes, 47 seconds)
2026-01-23 03:34:07,992 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:34:11,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 594.08356 ± 283.624
2026-01-23 03:34:11,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [17.612875, 266.83176, 931.51965, 592.3284, 788.6385, 872.2122, 325.0101, 670.2687, 828.18896, 648.22473]
2026-01-23 03:34:11,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [29.0, 153.0, 334.0, 194.0, 292.0, 315.0, 173.0, 265.0, 303.0, 226.0]
2026-01-23 03:34:11,091 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 27/100 (estimated time remaining: 4 hours, 32 minutes, 4 seconds)
2026-01-23 03:37:50,591 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:37:54,888 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 835.34900 ± 146.788
2026-01-23 03:37:54,888 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [864.2208, 762.5329, 740.67896, 697.3941, 1075.6956, 832.32745, 709.9114, 634.9557, 995.20245, 1040.5703]
2026-01-23 03:37:54,888 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [319.0, 317.0, 303.0, 283.0, 373.0, 309.0, 294.0, 267.0, 348.0, 382.0]
2026-01-23 03:37:54,893 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 28/100 (estimated time remaining: 4 hours, 30 minutes, 9 seconds)
2026-01-23 03:41:32,378 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:41:36,054 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 742.45032 ± 348.537
2026-01-23 03:41:36,055 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [925.2998, 1120.1964, 370.36826, -3.6466224, 493.75714, 1214.3796, 755.67316, 974.663, 808.5099, 765.3031]
2026-01-23 03:41:36,055 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [336.0, 350.0, 190.0, 8.0, 238.0, 406.0, 296.0, 328.0, 255.0, 277.0]
2026-01-23 03:41:36,060 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 29/100 (estimated time remaining: 4 hours, 26 minutes, 30 seconds)
2026-01-23 03:45:11,062 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:45:17,898 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1320.82898 ± 926.511
2026-01-23 03:45:17,898 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1145.1128, 673.3373, 381.7297, 322.1039, 1263.1113, 2825.705, 225.02884, 2824.5964, 2012.8696, 1534.6967]
2026-01-23 03:45:17,898 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [395.0, 215.0, 137.0, 128.0, 416.0, 1000.0, 241.0, 1000.0, 765.0, 529.0]
2026-01-23 03:45:17,904 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 22 minutes, 14 seconds)
2026-01-23 03:48:51,276 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:48:56,155 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1057.57104 ± 164.747
2026-01-23 03:48:56,155 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [924.455, 928.1317, 1027.895, 1083.1995, 1372.1472, 1306.4364, 848.23944, 901.3545, 1047.6038, 1136.247]
2026-01-23 03:48:56,155 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [320.0, 332.0, 353.0, 366.0, 426.0, 436.0, 298.0, 306.0, 353.0, 375.0]
2026-01-23 03:48:56,160 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 31/100 (estimated time remaining: 4 hours, 18 minutes, 1 second)
2026-01-23 03:52:30,081 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:52:35,340 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1133.16541 ± 44.661
2026-01-23 03:52:35,340 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1188.0526, 1202.3353, 1154.2631, 1099.9369, 1180.9835, 1097.1636, 1150.1741, 1083.8157, 1079.87, 1095.0598]
2026-01-23 03:52:35,340 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [400.0, 404.0, 404.0, 363.0, 397.0, 365.0, 390.0, 366.0, 366.0, 372.0]
2026-01-23 03:52:35,345 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 32/100 (estimated time remaining: 4 hours, 13 minutes, 58 seconds)
2026-01-23 03:56:08,666 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:56:13,824 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1084.30994 ± 445.942
2026-01-23 03:56:13,824 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [16.690168, 1153.3281, 1083.3811, 1989.637, 1115.9845, 1133.7097, 1047.0515, 1212.2148, 1034.208, 1056.8949]
2026-01-23 03:56:13,824 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [27.0, 445.0, 377.0, 691.0, 371.0, 394.0, 366.0, 399.0, 374.0, 362.0]
2026-01-23 03:56:13,829 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 33/100 (estimated time remaining: 4 hours, 9 minutes, 5 seconds)
2026-01-23 03:59:46,584 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:59:54,955 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1607.94470 ± 391.632
2026-01-23 03:59:54,955 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1547.8882, 1039.7021, 1197.4656, 1985.3699, 1220.9452, 1829.001, 2238.4727, 1923.3119, 1859.7834, 1237.5073]
2026-01-23 03:59:54,955 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [689.0, 375.0, 408.0, 631.0, 426.0, 1000.0, 773.0, 661.0, 579.0, 435.0]
2026-01-23 03:59:54,960 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 34/100 (estimated time remaining: 4 hours, 5 minutes, 25 seconds)
2026-01-23 04:03:29,410 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:03:35,588 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1284.85095 ± 445.202
2026-01-23 04:03:35,588 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [155.4116, 1364.3091, 1386.7009, 1310.6418, 1049.3502, 1531.8635, 1373.145, 1298.2717, 2033.8468, 1344.9686]
2026-01-23 04:03:35,588 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [76.0, 463.0, 482.0, 456.0, 384.0, 508.0, 463.0, 461.0, 672.0, 499.0]
2026-01-23 04:03:35,593 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 35/100 (estimated time remaining: 4 hours, 1 minute, 29 seconds)
2026-01-23 04:07:14,103 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:07:17,421 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 661.53552 ± 668.088
2026-01-23 04:07:17,421 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1306.075, 1319.282, 1371.0753, 1735.8162, 667.12836, 22.240477, -0.23011938, -39.25799, 2.218413, 231.0075]
2026-01-23 04:07:17,421 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [437.0, 450.0, 451.0, 547.0, 261.0, 31.0, 11.0, 105.0, 14.0, 113.0]
2026-01-23 04:07:17,429 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 58 minutes, 36 seconds)
2026-01-23 04:10:56,593 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:11:02,457 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1252.47913 ± 315.503
2026-01-23 04:11:02,457 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1392.8906, 1373.3307, 312.81943, 1282.0023, 1359.2183, 1310.3451, 1376.5365, 1323.3917, 1385.7194, 1408.5364]
2026-01-23 04:11:02,457 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [464.0, 454.0, 180.0, 428.0, 452.0, 439.0, 490.0, 445.0, 453.0, 461.0]
2026-01-23 04:11:02,462 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 56 minutes, 11 seconds)
2026-01-23 04:14:37,950 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:14:44,211 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1438.16577 ± 619.965
2026-01-23 04:14:44,212 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [2321.3984, 2016.0382, 1319.7268, 1520.187, 1422.8007, 1534.7555, 335.4562, 1846.5564, 329.83435, 1734.9045]
2026-01-23 04:14:44,212 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [717.0, 631.0, 420.0, 499.0, 446.0, 459.0, 130.0, 577.0, 133.0, 563.0]
2026-01-23 04:14:44,216 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 53 minutes, 10 seconds)
2026-01-23 04:18:24,817 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:18:32,938 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1661.78003 ± 812.869
2026-01-23 04:18:32,938 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1799.1627, 1825.0597, 2672.2334, 26.443216, 1312.4592, 1404.6057, 3110.3438, 1594.9828, 1923.0804, 949.43036]
2026-01-23 04:18:32,938 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [612.0, 591.0, 1000.0, 35.0, 450.0, 426.0, 1000.0, 541.0, 685.0, 375.0]
2026-01-23 04:18:32,943 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 51 minutes, 2 seconds)
2026-01-23 04:21:59,619 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:22:08,309 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1949.87622 ± 1176.427
2026-01-23 04:22:08,309 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3189.361, 308.66956, 2313.9014, 215.96645, 2671.6255, 3294.9229, 597.43304, 2468.0881, 1245.2406, 3193.5537]
2026-01-23 04:22:08,309 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 128.0, 694.0, 151.0, 837.0, 1000.0, 191.0, 759.0, 378.0, 937.0]
2026-01-23 04:22:08,309 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (1949.88) for latency DatasetOffice
2026-01-23 04:22:08,316 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 46 minutes, 15 seconds)
2026-01-23 04:25:43,442 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:25:49,307 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1351.02185 ± 500.802
2026-01-23 04:25:49,307 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1403.7067, 1536.6794, 1603.1981, 2077.7717, 1379.7572, 721.5114, 1311.3383, 1683.5642, 204.863, 1587.828]
2026-01-23 04:25:49,307 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [465.0, 447.0, 517.0, 651.0, 436.0, 254.0, 388.0, 490.0, 128.0, 491.0]
2026-01-23 04:25:49,315 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 42 minutes, 22 seconds)
2026-01-23 04:29:38,494 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:29:45,209 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1516.78186 ± 566.229
2026-01-23 04:29:45,209 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1080.0516, 1350.5326, 1563.3245, 2472.691, 1080.3331, 1393.9303, 1350.1342, 1111.7338, 2721.2568, 1043.8315]
2026-01-23 04:29:45,209 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [373.0, 420.0, 456.0, 737.0, 345.0, 431.0, 456.0, 377.0, 957.0, 331.0]
2026-01-23 04:29:45,217 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 40 minutes, 48 seconds)
2026-01-23 04:33:04,564 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:33:08,175 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 786.43915 ± 707.461
2026-01-23 04:33:08,175 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [595.15704, 531.6654, 18.399717, 395.06036, 107.54381, 1943.2401, 357.58923, 1937.9927, 1609.121, 368.62164]
2026-01-23 04:33:08,175 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [220.0, 220.0, 32.0, 173.0, 62.0, 569.0, 132.0, 566.0, 534.0, 132.0]
2026-01-23 04:33:08,182 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 33 minutes, 25 seconds)
2026-01-23 04:36:42,055 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:36:50,969 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2077.75000 ± 1083.594
2026-01-23 04:36:50,969 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1536.8882, 3155.1638, 393.8002, 3292.8176, 357.44025, 3295.208, 1893.5575, 1855.3118, 3239.2434, 1758.0691]
2026-01-23 04:36:50,969 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [433.0, 1000.0, 152.0, 1000.0, 178.0, 1000.0, 607.0, 570.0, 1000.0, 542.0]
2026-01-23 04:36:50,969 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (2077.75) for latency DatasetOffice
2026-01-23 04:36:50,976 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 28 minutes, 37 seconds)
2026-01-23 04:40:28,768 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:40:34,845 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1404.14722 ± 565.821
2026-01-23 04:40:34,845 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [2311.9146, 1439.4406, 870.74475, 1856.5579, 1818.5125, 901.0319, 2089.4468, 859.04065, 574.432, 1320.3512]
2026-01-23 04:40:34,845 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [702.0, 465.0, 275.0, 509.0, 561.0, 336.0, 655.0, 319.0, 215.0, 403.0]
2026-01-23 04:40:34,851 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 26 minutes, 33 seconds)
2026-01-23 04:44:07,178 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:44:12,845 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1349.77161 ± 868.374
2026-01-23 04:44:12,845 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1752.403, 2487.0754, 347.79898, 1474.0759, 1231.7094, 2455.8696, 583.53314, 374.6439, 2464.467, 326.1398]
2026-01-23 04:44:12,845 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [548.0, 724.0, 164.0, 424.0, 410.0, 663.0, 222.0, 158.0, 779.0, 123.0]
2026-01-23 04:44:12,851 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 22 minutes, 18 seconds)
2026-01-23 04:47:44,928 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:47:52,738 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1862.22327 ± 924.525
2026-01-23 04:47:52,738 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [353.82718, 2818.501, 2653.1758, 1144.5715, 3133.4866, 1887.2809, 1531.9409, 777.3338, 1407.5028, 2914.6125]
2026-01-23 04:47:52,738 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [139.0, 800.0, 802.0, 378.0, 1000.0, 555.0, 477.0, 245.0, 419.0, 826.0]
2026-01-23 04:47:52,744 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 15 minutes, 45 seconds)
2026-01-23 04:51:22,661 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:51:31,818 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2137.66260 ± 720.456
2026-01-23 04:51:31,819 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [855.4349, 2039.3993, 2412.1133, 2379.4966, 3082.5896, 2497.097, 2742.9287, 2837.9348, 1280.0663, 1249.5668]
2026-01-23 04:51:31,819 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [299.0, 625.0, 752.0, 726.0, 1000.0, 738.0, 822.0, 874.0, 419.0, 409.0]
2026-01-23 04:51:31,819 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (2137.66) for latency DatasetOffice
2026-01-23 04:51:31,825 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 14 minutes, 58 seconds)
2026-01-23 04:55:02,828 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:55:13,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2513.06372 ± 956.305
2026-01-23 04:55:13,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3303.5042, 3407.6194, 1412.1353, 3429.274, 1254.2826, 1402.6058, 3534.6057, 3315.075, 1440.5165, 2631.0186]
2026-01-23 04:55:13,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 433.0, 1000.0, 382.0, 429.0, 1000.0, 1000.0, 444.0, 808.0]
2026-01-23 04:55:13,133 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (2513.06) for latency DatasetOffice
2026-01-23 04:55:13,138 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 11 minutes, 2 seconds)
2026-01-23 04:58:46,042 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:58:56,180 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2376.13599 ± 813.242
2026-01-23 04:58:56,180 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [2316.8416, 2954.5217, 1335.293, 3446.9307, 987.9973, 3015.5828, 2745.2764, 2099.179, 3293.3086, 1566.4294]
2026-01-23 04:58:56,180 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [689.0, 914.0, 411.0, 1000.0, 330.0, 881.0, 892.0, 686.0, 1000.0, 536.0]
2026-01-23 04:58:56,186 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 7 minutes, 13 seconds)
2026-01-23 05:02:31,662 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:02:42,899 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2647.07080 ± 716.420
2026-01-23 05:02:42,899 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [2052.7786, 3292.0269, 1031.411, 3254.0447, 2714.4966, 2095.9646, 3112.3909, 2376.3435, 3203.7866, 3337.4634]
2026-01-23 05:02:42,899 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [668.0, 1000.0, 337.0, 1000.0, 825.0, 628.0, 1000.0, 713.0, 1000.0, 1000.0]
2026-01-23 05:02:42,899 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (2647.07) for latency DatasetOffice
2026-01-23 05:02:42,906 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 5 minutes)
2026-01-23 05:06:29,387 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:06:35,321 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1328.67029 ± 1361.868
2026-01-23 05:06:35,321 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [961.3818, 2623.0422, 3208.3225, 2426.0588, 21.777723, 45.32142, 527.09015, 6.242003, 12.581531, 3454.885]
2026-01-23 05:06:35,321 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [355.0, 826.0, 1000.0, 784.0, 27.0, 47.0, 189.0, 16.0, 46.0, 1000.0]
2026-01-23 05:06:35,328 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 3 minutes, 21 seconds)
2026-01-23 05:10:05,464 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:10:16,636 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2627.74536 ± 753.474
2026-01-23 05:10:16,636 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1533.8712, 3323.3818, 1618.8657, 2569.064, 3188.4688, 3166.2097, 3231.1763, 1398.2068, 3088.8384, 3159.3708]
2026-01-23 05:10:16,637 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [430.0, 1000.0, 461.0, 793.0, 1000.0, 1000.0, 1000.0, 459.0, 1000.0, 1000.0]
2026-01-23 05:10:16,643 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 59 minutes, 58 seconds)
2026-01-23 05:13:37,436 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:13:48,695 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2645.92822 ± 606.309
2026-01-23 05:13:48,696 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1887.1494, 3452.9033, 2974.1882, 2382.5984, 3282.5198, 2019.903, 3324.4172, 2186.0754, 3084.884, 1864.6415]
2026-01-23 05:13:48,696 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [576.0, 1000.0, 952.0, 746.0, 1000.0, 619.0, 1000.0, 727.0, 958.0, 529.0]
2026-01-23 05:13:48,704 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 54 minutes, 46 seconds)
2026-01-23 05:17:17,344 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:17:27,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2119.19312 ± 1292.258
2026-01-23 05:17:27,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3015.4534, 2988.609, 3221.18, 2016.1475, 3176.0312, 3150.8208, 2994.8472, 206.96083, 93.21635, 328.66345]
2026-01-23 05:17:27,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 672.0, 1000.0, 1000.0, 1000.0, 115.0, 74.0, 146.0]
2026-01-23 05:17:27,140 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 50 minutes, 20 seconds)
2026-01-23 05:21:03,272 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:21:14,050 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2681.92041 ± 747.936
2026-01-23 05:21:14,050 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3149.0505, 2678.756, 3161.3782, 2244.0398, 3385.1091, 1861.8279, 3616.6243, 3423.864, 1988.6587, 1309.8961]
2026-01-23 05:21:14,050 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [928.0, 750.0, 1000.0, 720.0, 1000.0, 522.0, 1000.0, 1000.0, 525.0, 429.0]
2026-01-23 05:21:14,050 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (2681.92) for latency DatasetOffice
2026-01-23 05:21:14,058 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 46 minutes, 40 seconds)
2026-01-23 05:24:51,674 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:25:04,077 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3067.49414 ± 469.189
2026-01-23 05:25:04,077 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3301.8267, 2275.4485, 2635.3657, 3539.036, 3292.808, 3269.117, 3242.5007, 2232.121, 3505.0352, 3381.6846]
2026-01-23 05:25:04,077 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 673.0, 764.0, 1000.0, 1000.0, 1000.0, 948.0, 599.0, 1000.0, 1000.0]
2026-01-23 05:25:04,077 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (3067.49) for latency DatasetOffice
2026-01-23 05:25:04,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 42 minutes, 37 seconds)
2026-01-23 05:28:44,950 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:28:52,531 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1829.49182 ± 1096.099
2026-01-23 05:28:52,531 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [2506.355, 1698.0017, 3184.8635, 3336.4116, 2973.883, 1495.3827, 1681.8195, 429.89096, 975.2252, 13.086674]
2026-01-23 05:28:52,531 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [691.0, 503.0, 1000.0, 1000.0, 874.0, 480.0, 516.0, 170.0, 333.0, 23.0]
2026-01-23 05:28:52,538 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 39 minutes, 56 seconds)
2026-01-23 05:32:13,254 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:32:22,923 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2449.79639 ± 942.745
2026-01-23 05:32:22,924 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1270.8302, 1740.6074, 3520.559, 3164.9463, 1259.0552, 3577.1047, 1499.4352, 1925.7915, 3520.2964, 3019.336]
2026-01-23 05:32:22,924 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [371.0, 484.0, 1000.0, 850.0, 366.0, 1000.0, 451.0, 542.0, 1000.0, 867.0]
2026-01-23 05:32:22,930 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 35 minutes, 59 seconds)
2026-01-23 05:35:48,344 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:36:00,769 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2991.42310 ± 598.947
2026-01-23 05:36:00,769 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3166.1543, 3094.1897, 3487.202, 1889.1663, 3275.6663, 3234.6611, 3338.5867, 1736.9236, 3341.1572, 3350.5244]
2026-01-23 05:36:00,769 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [942.0, 921.0, 1000.0, 571.0, 953.0, 1000.0, 1000.0, 501.0, 1000.0, 1000.0]
2026-01-23 05:36:00,777 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 32 minutes, 11 seconds)
2026-01-23 05:39:49,712 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:39:58,515 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2089.37109 ± 1227.228
2026-01-23 05:39:58,515 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1641.6588, 1178.7001, 29.684847, 459.62, 2468.7827, 3409.8572, 3339.9482, 3495.1577, 1579.0577, 3291.2412]
2026-01-23 05:39:58,515 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [468.0, 403.0, 35.0, 193.0, 803.0, 1000.0, 1000.0, 1000.0, 449.0, 1000.0]
2026-01-23 05:39:58,521 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 29 minutes, 55 seconds)
2026-01-23 05:43:29,405 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:43:41,524 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3066.01538 ± 720.371
2026-01-23 05:43:41,525 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3520.8123, 3636.241, 3544.6643, 3597.7446, 3365.949, 1858.2943, 1903.6388, 3470.3757, 2169.1682, 3593.265]
2026-01-23 05:43:41,525 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 526.0, 557.0, 1000.0, 609.0, 1000.0]
2026-01-23 05:43:41,533 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 25 minutes, 16 seconds)
2026-01-23 05:47:02,234 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:47:14,869 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3060.08813 ± 737.025
2026-01-23 05:47:14,869 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3285.5781, 3515.468, 3276.706, 3435.6833, 3272.7847, 1023.4304, 3534.3938, 2471.896, 3411.7468, 3373.1934]
2026-01-23 05:47:14,869 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 311.0, 1000.0, 723.0, 1000.0, 1000.0]
2026-01-23 05:47:14,877 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 19 minutes, 37 seconds)
2026-01-23 05:50:46,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:50:57,327 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2679.56519 ± 1050.437
2026-01-23 05:50:57,327 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [2042.817, 1088.3925, 410.68985, 3445.6086, 3351.3179, 3226.033, 3184.8335, 3430.345, 3262.7178, 3352.8982]
2026-01-23 05:50:57,327 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [627.0, 391.0, 163.0, 1000.0, 1000.0, 1000.0, 880.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:50:57,336 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 17 minutes, 26 seconds)
2026-01-23 05:54:46,987 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:54:58,623 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2933.78076 ± 1028.440
2026-01-23 05:54:58,623 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3014.3523, 3376.3599, 989.5942, 3513.7217, 3599.473, 820.1563, 3372.5525, 3561.3804, 3628.564, 3461.6519]
2026-01-23 05:54:58,623 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [862.0, 1000.0, 323.0, 878.0, 1000.0, 267.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:54:58,631 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 16 minutes, 32 seconds)
2026-01-23 05:58:16,972 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:58:28,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2995.85034 ± 743.150
2026-01-23 05:58:28,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3281.0432, 1609.9053, 3540.7405, 3283.6714, 3537.2056, 1792.3815, 2343.9985, 3305.1792, 3817.852, 3446.5266]
2026-01-23 05:58:28,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 465.0, 1000.0, 1000.0, 1000.0, 510.0, 668.0, 915.0, 1000.0, 1000.0]
2026-01-23 05:58:28,902 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 9 minutes, 32 seconds)
2026-01-23 06:02:18,399 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:02:31,659 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3308.76904 ± 384.015
2026-01-23 06:02:31,659 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3599.447, 3403.0786, 3398.297, 2317.0159, 3515.411, 2879.4172, 3607.0845, 3485.8748, 3501.492, 3380.5732]
2026-01-23 06:02:31,659 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 681.0, 1000.0, 806.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:02:31,659 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (3308.77) for latency DatasetOffice
2026-01-23 06:02:31,666 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 8 minutes, 4 seconds)
2026-01-23 06:05:49,339 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:06:02,085 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3101.71729 ± 621.602
2026-01-23 06:06:02,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3133.1077, 2908.0947, 3350.6418, 3164.925, 1317.2202, 3310.3452, 3451.1323, 3418.589, 3582.6755, 3380.4404]
2026-01-23 06:06:02,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [812.0, 871.0, 1000.0, 1000.0, 392.0, 1000.0, 1000.0, 1000.0, 1000.0, 952.0]
2026-01-23 06:06:02,093 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 3 minutes, 59 seconds)
2026-01-23 06:09:49,287 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:09:57,736 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2173.38013 ± 989.764
2026-01-23 06:09:57,736 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [164.42444, 3451.4155, 3586.376, 2028.7378, 3312.8826, 1643.1082, 2076.431, 2122.7432, 1576.6941, 1770.9907]
2026-01-23 06:09:57,736 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [84.0, 1000.0, 1000.0, 548.0, 935.0, 488.0, 544.0, 617.0, 510.0, 499.0]
2026-01-23 06:09:57,744 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 69/100 (estimated time remaining: 2 hours, 1 minute, 38 seconds)
2026-01-23 06:13:22,550 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:13:35,363 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3226.07495 ± 403.156
2026-01-23 06:13:35,363 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3322.1274, 3486.5173, 3281.3513, 2608.3203, 3788.819, 3482.3772, 2808.6814, 3489.7705, 3461.143, 2531.6426]
2026-01-23 06:13:35,363 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 874.0, 754.0, 1000.0, 1000.0, 800.0, 1000.0, 1000.0, 748.0]
2026-01-23 06:13:35,371 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 55 minutes, 23 seconds)
2026-01-23 06:17:08,404 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:17:21,268 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3174.04468 ± 607.268
2026-01-23 06:17:21,269 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3431.072, 3157.0962, 3182.044, 3162.4924, 3244.7974, 3593.9788, 3515.973, 1418.5228, 3569.7424, 3464.7278]
2026-01-23 06:17:21,269 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 956.0, 1000.0, 1000.0, 1000.0, 435.0, 1000.0, 1000.0]
2026-01-23 06:17:21,277 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 53 minutes, 14 seconds)
2026-01-23 06:20:53,829 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:21:06,917 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3403.06250 ± 304.012
2026-01-23 06:21:06,918 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3551.419, 2697.6804, 3616.9915, 3616.845, 3420.997, 2964.8987, 3689.8794, 3417.2954, 3588.704, 3465.9158]
2026-01-23 06:21:06,918 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 756.0, 1000.0, 1000.0, 1000.0, 847.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:21:06,918 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (3403.06) for latency DatasetOffice
2026-01-23 06:21:06,927 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 47 minutes, 48 seconds)
2026-01-23 06:24:38,367 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:24:47,844 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2543.16113 ± 1352.223
2026-01-23 06:24:47,844 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3911.1892, 3603.4785, 2203.276, 3615.489, 693.1002, 3945.6802, 3737.963, 2416.244, 507.76495, 797.4261]
2026-01-23 06:24:47,844 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 594.0, 1000.0, 249.0, 1000.0, 1000.0, 692.0, 199.0, 346.0]
2026-01-23 06:24:47,852 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 45 minutes, 4 seconds)
2026-01-23 06:28:13,868 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:28:25,617 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3144.36768 ± 566.029
2026-01-23 06:28:25,617 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [2263.339, 2814.3408, 3573.6584, 3659.2402, 3630.6187, 3673.231, 2503.7144, 3622.1943, 3372.792, 2330.5486]
2026-01-23 06:28:25,617 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [715.0, 731.0, 1000.0, 1000.0, 1000.0, 1000.0, 711.0, 1000.0, 1000.0, 626.0]
2026-01-23 06:28:25,626 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 39 minutes, 42 seconds)
2026-01-23 06:31:53,661 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:32:01,848 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2122.67944 ± 1283.049
2026-01-23 06:32:01,848 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1935.7372, 277.00903, 2848.1465, 839.27716, 3354.9907, 956.7754, 572.29944, 3487.2693, 3351.7283, 3603.562]
2026-01-23 06:32:01,848 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [522.0, 115.0, 824.0, 266.0, 1000.0, 294.0, 193.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:32:01,856 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 35 minutes, 53 seconds)
2026-01-23 06:35:39,532 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:35:50,814 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3113.66650 ± 896.709
2026-01-23 06:35:50,814 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [2435.0142, 3845.4915, 1517.7328, 3668.5686, 3582.739, 3609.0974, 3702.4175, 1469.072, 3866.1936, 3440.3389]
2026-01-23 06:35:50,814 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [669.0, 1000.0, 434.0, 1000.0, 1000.0, 1000.0, 1000.0, 402.0, 1000.0, 1000.0]
2026-01-23 06:35:50,822 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 32 minutes, 27 seconds)
2026-01-23 06:39:25,397 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:39:36,550 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3100.67456 ± 771.868
2026-01-23 06:39:36,550 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [2383.781, 3777.4595, 3406.6577, 3604.5808, 2947.9702, 3387.0806, 1999.4517, 3906.0845, 3911.9456, 1681.7339]
2026-01-23 06:39:36,550 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [678.0, 1000.0, 861.0, 1000.0, 744.0, 1000.0, 551.0, 1000.0, 1000.0, 506.0]
2026-01-23 06:39:36,561 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 28 minutes, 46 seconds)
2026-01-23 06:43:02,287 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:43:14,578 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3300.80127 ± 901.722
2026-01-23 06:43:14,578 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3458.736, 3512.927, 3462.4368, 3677.5557, 3485.733, 630.9152, 3523.4763, 3689.7585, 3962.3462, 3604.1265]
2026-01-23 06:43:14,578 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 219.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:43:14,590 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 24 minutes, 50 seconds)
2026-01-23 06:46:44,367 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:46:55,648 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3146.78564 ± 863.007
2026-01-23 06:46:55,648 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3615.5881, 1853.8552, 1811.8325, 3611.4138, 3825.9468, 3735.6543, 3672.331, 1830.4996, 3735.2024, 3775.5305]
2026-01-23 06:46:55,648 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 497.0, 480.0, 1000.0, 1000.0, 1000.0, 1000.0, 527.0, 1000.0, 1000.0]
2026-01-23 06:46:55,657 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 21 minutes, 24 seconds)
2026-01-23 06:50:13,737 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:50:22,758 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2392.38330 ± 1325.758
2026-01-23 06:50:22,758 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [21.981112, 3605.2485, 3600.1238, 3509.6072, 2933.555, 2702.5283, 3775.0962, 2252.4133, 1026.6644, 496.61484]
2026-01-23 06:50:22,758 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [26.0, 1000.0, 1000.0, 1000.0, 867.0, 688.0, 1000.0, 709.0, 364.0, 181.0]
2026-01-23 06:50:22,766 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 17 minutes, 3 seconds)
2026-01-23 06:53:59,059 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:54:11,600 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3374.38159 ± 500.445
2026-01-23 06:54:11,600 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3676.0151, 3491.318, 3315.5403, 3631.9866, 3637.4348, 3622.8306, 3663.4214, 3734.7522, 2935.0112, 2035.5045]
2026-01-23 06:54:11,600 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 825.0, 549.0]
2026-01-23 06:54:11,608 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 13 minutes, 23 seconds)
2026-01-23 06:57:47,295 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:57:58,818 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3188.80103 ± 1113.618
2026-01-23 06:57:58,818 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3673.2466, 3692.3452, 3375.2864, 2211.8684, 3628.6616, 3744.8955, 152.78795, 3735.8035, 4017.5823, 3655.5325]
2026-01-23 06:57:58,818 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 586.0, 1000.0, 1000.0, 78.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:57:58,827 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 9 minutes, 48 seconds)
2026-01-23 07:01:31,447 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:01:44,827 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3604.03467 ± 87.284
2026-01-23 07:01:44,827 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3596.0066, 3691.609, 3514.3022, 3677.7632, 3555.3557, 3601.8376, 3782.1978, 3537.8337, 3607.6653, 3475.7744]
2026-01-23 07:01:44,828 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:01:44,828 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (3604.03) for latency DatasetOffice
2026-01-23 07:01:44,837 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 6 minutes, 36 seconds)
2026-01-23 07:05:03,825 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:05:14,738 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2884.90283 ± 849.032
2026-01-23 07:05:14,738 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3646.5688, 3508.6572, 3404.1697, 3490.6523, 3790.6145, 1892.6155, 1864.523, 2269.7712, 3507.4663, 1473.9897]
2026-01-23 07:05:14,738 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 543.0, 484.0, 618.0, 1000.0, 430.0]
2026-01-23 07:05:14,747 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 2 minutes, 16 seconds)
2026-01-23 07:08:45,055 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:08:54,514 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2540.70898 ± 1470.314
2026-01-23 07:08:54,514 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3622.2854, 3758.391, 3637.9001, 3709.2317, 567.0252, 3629.4702, 3581.6067, 2288.6016, 3.5638392, 609.01434]
2026-01-23 07:08:54,514 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 193.0, 1000.0, 1000.0, 615.0, 13.0, 228.0]
2026-01-23 07:08:54,524 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 85/100 (estimated time remaining: 59 minutes, 17 seconds)
2026-01-23 07:12:23,779 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:12:34,978 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3055.39697 ± 1116.991
2026-01-23 07:12:34,978 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1846.4148, 2794.1284, 3515.9275, 3868.9954, 3528.5125, 3804.5972, 3830.1396, 207.06862, 3465.2913, 3692.8962]
2026-01-23 07:12:34,978 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [589.0, 834.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 100.0, 1000.0, 1000.0]
2026-01-23 07:12:34,990 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 86/100 (estimated time remaining: 55 minutes, 10 seconds)
2026-01-23 07:16:03,155 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:16:14,814 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3244.91260 ± 607.824
2026-01-23 07:16:14,814 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3613.7622, 3594.4016, 2349.909, 3490.4768, 2087.571, 3890.4429, 2723.7314, 3816.1099, 3173.2612, 3709.456]
2026-01-23 07:16:14,814 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 588.0, 1000.0, 577.0, 1000.0, 752.0, 1000.0, 869.0, 1000.0]
2026-01-23 07:16:14,824 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 87/100 (estimated time remaining: 51 minutes, 8 seconds)
2026-01-23 07:19:52,972 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:20:00,545 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2007.68713 ± 1322.583
2026-01-23 07:20:00,545 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1260.5192, 1685.3616, 2738.371, 1689.972, 3629.7527, 3849.1545, 3463.2068, 1740.9286, 23.429907, -3.8252647]
2026-01-23 07:20:00,545 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [372.0, 481.0, 796.0, 578.0, 1000.0, 1000.0, 1000.0, 508.0, 28.0, 8.0]
2026-01-23 07:20:00,555 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 88/100 (estimated time remaining: 47 minutes, 28 seconds)
2026-01-23 07:23:40,287 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:23:52,666 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3449.51318 ± 816.897
2026-01-23 07:23:52,667 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1029.01, 3862.5144, 3665.9023, 3500.4363, 3580.4238, 3713.906, 3686.7031, 3913.6191, 3651.7761, 3890.8438]
2026-01-23 07:23:52,667 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [311.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:23:52,677 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 89/100 (estimated time remaining: 44 minutes, 43 seconds)
2026-01-23 07:27:09,930 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:27:22,049 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3459.61133 ± 653.700
2026-01-23 07:27:22,049 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3704.5547, 3765.305, 3785.6206, 3939.2766, 3577.7307, 1880.8374, 3772.4326, 3926.7595, 2516.557, 3727.038]
2026-01-23 07:27:22,049 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 942.0, 505.0, 1000.0, 1000.0, 662.0, 1000.0]
2026-01-23 07:27:22,059 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 90/100 (estimated time remaining: 40 minutes, 36 seconds)
2026-01-23 07:30:55,328 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:31:05,149 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2554.75977 ± 1306.389
2026-01-23 07:31:05,150 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3603.5725, 2152.5908, 3621.0188, 3781.6523, 1947.8673, 3328.509, 14.730519, 446.0991, 3092.9126, 3558.646]
2026-01-23 07:31:05,150 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 566.0, 1000.0, 1000.0, 539.0, 1000.0, 30.0, 186.0, 1000.0, 1000.0]
2026-01-23 07:31:05,160 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 91/100 (estimated time remaining: 37 minutes)
2026-01-23 07:34:40,595 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:34:53,948 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3553.36255 ± 120.334
2026-01-23 07:34:53,948 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3412.925, 3562.8376, 3528.823, 3547.2869, 3696.5813, 3498.8179, 3688.2678, 3755.7786, 3363.1243, 3479.181]
2026-01-23 07:34:53,948 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:34:53,960 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 92/100 (estimated time remaining: 33 minutes, 34 seconds)
2026-01-23 07:38:24,744 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:38:37,171 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3357.50391 ± 808.001
2026-01-23 07:38:37,171 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3604.348, 3672.9988, 3668.0737, 968.50616, 3821.988, 3701.451, 3257.9292, 3669.4917, 3620.9136, 3589.338]
2026-01-23 07:38:37,171 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 313.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:38:37,181 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 93/100 (estimated time remaining: 29 minutes, 46 seconds)
2026-01-23 07:42:10,105 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:42:22,825 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3560.57886 ± 625.008
2026-01-23 07:42:22,825 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3756.459, 1701.8309, 3731.6333, 3707.0344, 3668.4778, 3736.243, 3783.7847, 3940.2866, 3891.375, 3688.6646]
2026-01-23 07:42:22,825 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 449.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:42:22,838 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 94/100 (estimated time remaining: 25 minutes, 54 seconds)
2026-01-23 07:45:46,251 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:45:57,985 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3201.70459 ± 1143.974
2026-01-23 07:45:57,985 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3855.6064, 3447.8562, 3852.308, 3592.4387, 3629.2576, 3619.705, 3874.0947, 3763.8997, 2376.3901, 5.4888062]
2026-01-23 07:45:57,985 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 672.0, 15.0]
2026-01-23 07:45:57,995 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 95/100 (estimated time remaining: 22 minutes, 19 seconds)
2026-01-23 07:49:36,385 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:49:46,760 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2893.00952 ± 1140.451
2026-01-23 07:49:46,760 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1692.556, 3823.5566, 3745.4136, 2413.9355, 731.1871, 3608.9963, 3691.0808, 3807.0398, 1480.0846, 3936.2456]
2026-01-23 07:49:46,760 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [489.0, 1000.0, 1000.0, 670.0, 245.0, 1000.0, 1000.0, 1000.0, 420.0, 1000.0]
2026-01-23 07:49:46,772 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 96/100 (estimated time remaining: 18 minutes, 41 seconds)
2026-01-23 07:53:23,704 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:53:37,217 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3897.15161 ± 165.703
2026-01-23 07:53:37,217 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3882.4368, 4011.0703, 4058.3994, 4135.3145, 3899.1987, 4035.0244, 3556.7485, 3888.1575, 3721.827, 3783.34]
2026-01-23 07:53:37,217 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:53:37,217 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (3897.15) for latency DatasetOffice
2026-01-23 07:53:37,229 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 97/100 (estimated time remaining: 14 minutes, 58 seconds)
2026-01-23 07:56:56,836 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:57:07,529 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2990.71582 ± 1201.439
2026-01-23 07:57:07,529 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3783.1348, 3009.608, 660.2586, 3058.5623, 3724.4014, 3640.39, 3751.439, 3822.497, 3806.0215, 650.8473]
2026-01-23 07:57:07,529 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 876.0, 257.0, 805.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 219.0]
2026-01-23 07:57:07,539 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 98/100 (estimated time remaining: 11 minutes, 6 seconds)
2026-01-23 08:00:41,885 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:00:45,763 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 940.97736 ± 1403.739
2026-01-23 08:00:45,763 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [2263.8855, 7.066625, 22.250202, -3.386017, 14.33403, 5.881946, 16.195919, 246.56258, 3419.7827, 3417.2004]
2026-01-23 08:00:45,763 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [633.0, 15.0, 34.0, 8.0, 48.0, 41.0, 24.0, 127.0, 1000.0, 1000.0]
2026-01-23 08:00:45,773 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 99/100 (estimated time remaining: 7 minutes, 21 seconds)
2026-01-23 08:04:11,035 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:04:23,678 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3483.13794 ± 668.979
2026-01-23 08:04:23,678 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1516.1753, 3573.2993, 3548.9023, 3815.2715, 3652.1836, 3894.3757, 3475.814, 3705.1392, 3861.9077, 3788.309]
2026-01-23 08:04:23,678 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [448.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:04:23,689 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 41 seconds)
2026-01-23 08:07:50,980 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:08:03,850 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3735.43677 ± 374.166
2026-01-23 08:08:03,850 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4137.467, 3783.7026, 3894.3828, 2663.6912, 3756.3691, 3905.6008, 3760.9539, 3902.221, 3798.9907, 3750.989]
2026-01-23 08:08:03,850 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 712.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:08:03,861 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1299 [DEBUG]: Training session finished
