2026-01-22 23:01:53,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-walker2d/DatasetOffice-mbpac_memdelay
2026-01-22 23:01:53,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-walker2d/DatasetOffice-mbpac_memdelay
2026-01-22 23:01:53,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x14c246e42b10>}
2026-01-22 23:01:53,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1159 [DEBUG]: using device: cuda
2026-01-22 23:01:53,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1181 [INFO]: Creating new trainer
2026-01-22 23:01:53,794 baseline-mbpac-noisy-walker2d:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2026-01-22 23:01:53,794 baseline-mbpac-noisy-walker2d:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-22 23:01:53,801 baseline-mbpac-noisy-walker2d:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=6, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2026-01-22 23:01:54,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1242 [DEBUG]: Starting training session...
2026-01-22 23:01:54,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 1/100
2026-01-22 23:14:09,548 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-22 23:14:09,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:14:52,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 185.62924 ± 63.057
2026-01-22 23:14:52,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [260.11038, 171.81924, 136.89023, 219.65668, 318.27182, 154.51263, 83.201515, 182.1854, 181.53197, 148.1125]
2026-01-22 23:14:52,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [149.0, 100.0, 86.0, 117.0, 235.0, 96.0, 89.0, 104.0, 122.0, 88.0]
2026-01-22 23:14:52,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1274 [INFO]: New best (185.63) for latency DatasetOffice
2026-01-22 23:14:52,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 2/100 (estimated time remaining: 21 hours, 23 minutes, 45 seconds)
2026-01-22 23:27:26,270 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-22 23:27:26,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:28:16,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 41.60749 ± 57.085
2026-01-22 23:28:16,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [39.38596, 8.105982, -0.56982154, -3.7541628, 77.87887, -14.856711, 41.33015, 35.04729, 193.50818, 39.99918]
2026-01-22 23:28:16,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [175.0, 31.0, 135.0, 131.0, 215.0, 102.0, 129.0, 135.0, 144.0, 194.0]
2026-01-22 23:28:16,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 3/100 (estimated time remaining: 21 hours, 31 minutes, 40 seconds)
2026-01-22 23:41:08,712 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-22 23:41:08,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:43:42,517 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 293.52307 ± 178.113
2026-01-22 23:43:42,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [417.66074, 175.77098, 111.31554, 361.92413, 261.52945, 45.004807, 475.441, 275.9656, 151.42883, 659.1897]
2026-01-22 23:43:42,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [874.0, 149.0, 290.0, 299.0, 256.0, 254.0, 362.0, 579.0, 365.0, 849.0]
2026-01-22 23:43:42,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1274 [INFO]: New best (293.52) for latency DatasetOffice
2026-01-22 23:43:42,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 4/100 (estimated time remaining: 22 hours, 31 minutes, 34 seconds)
2026-01-22 23:55:44,909 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-22 23:55:44,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:56:53,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 193.03857 ± 99.630
2026-01-22 23:56:53,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [266.45975, 257.58398, 267.67288, 214.67256, 345.1829, 130.95198, 60.521915, 19.000122, 118.38005, 249.95964]
2026-01-22 23:56:53,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [171.0, 161.0, 178.0, 172.0, 449.0, 124.0, 143.0, 287.0, 108.0, 167.0]
2026-01-22 23:56:53,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 5/100 (estimated time remaining: 21 hours, 59 minutes, 47 seconds)
2026-01-23 00:09:05,333 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 00:09:05,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:09:50,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 206.67500 ± 28.538
2026-01-23 00:09:50,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [205.02538, 207.14641, 185.11783, 278.523, 178.33794, 214.37752, 217.3097, 206.48575, 206.97884, 167.44756]
2026-01-23 00:09:50,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [130.0, 136.0, 130.0, 143.0, 115.0, 135.0, 115.0, 140.0, 119.0, 136.0]
2026-01-23 00:09:50,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 6/100 (estimated time remaining: 21 hours, 30 minutes, 48 seconds)
2026-01-23 00:22:05,131 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 00:22:05,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:23:32,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 292.25647 ± 105.093
2026-01-23 00:23:32,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [379.57224, 335.37866, 288.08694, 304.2396, 377.8746, 388.3592, 119.35121, 333.82977, 329.87778, 65.994545]
2026-01-23 00:23:32,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [375.0, 171.0, 147.0, 229.0, 244.0, 255.0, 149.0, 176.0, 579.0, 178.0]
2026-01-23 00:23:32,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 7/100 (estimated time remaining: 21 hours, 30 minutes, 56 seconds)
2026-01-23 00:35:43,246 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 00:35:43,248 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:36:42,851 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 290.05957 ± 98.596
2026-01-23 00:36:42,851 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [266.93863, 232.41849, 120.08876, 195.75348, 310.7178, 387.7081, 434.96252, 238.72797, 275.67477, 437.6051]
2026-01-23 00:36:42,851 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [164.0, 140.0, 98.0, 118.0, 172.0, 203.0, 206.0, 180.0, 147.0, 284.0]
2026-01-23 00:36:42,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 8/100 (estimated time remaining: 21 hours, 13 minutes, 5 seconds)
2026-01-23 00:49:02,012 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 00:49:02,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:49:54,677 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 278.47086 ± 198.873
2026-01-23 00:49:54,677 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [191.68323, 360.6741, 214.16849, 152.20871, 243.48373, 6.151381, 806.2039, 206.23766, 280.35342, 323.5439]
2026-01-23 00:49:54,677 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [115.0, 236.0, 126.0, 130.0, 114.0, 14.0, 329.0, 135.0, 130.0, 157.0]
2026-01-23 00:49:54,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 9/100 (estimated time remaining: 20 hours, 18 minutes, 7 seconds)
2026-01-23 01:02:08,290 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 01:02:08,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:03:01,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 284.23871 ± 129.159
2026-01-23 01:03:01,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [446.38446, 21.428188, 311.6272, 372.01007, 304.36105, 358.8678, 318.40857, 80.20812, 235.8103, 393.2816]
2026-01-23 01:03:01,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [227.0, 43.0, 145.0, 173.0, 136.0, 200.0, 187.0, 93.0, 142.0, 151.0]
2026-01-23 01:03:01,247 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 10/100 (estimated time remaining: 20 hours, 3 minutes, 24 seconds)
2026-01-23 01:15:34,390 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 01:15:34,392 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:16:41,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 329.27759 ± 130.035
2026-01-23 01:16:41,587 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [379.96356, 337.06592, 152.3074, 317.49023, 330.64282, 265.35098, 340.82797, 182.32822, 327.0915, 659.7072]
2026-01-23 01:16:41,587 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [205.0, 179.0, 112.0, 153.0, 143.0, 131.0, 217.0, 205.0, 162.0, 413.0]
2026-01-23 01:16:41,587 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1274 [INFO]: New best (329.28) for latency DatasetOffice
2026-01-23 01:16:41,598 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 11/100 (estimated time remaining: 20 hours, 3 minutes, 16 seconds)
2026-01-23 01:28:54,008 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 01:28:54,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:29:35,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 234.55569 ± 53.192
2026-01-23 01:29:35,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [200.4304, 300.29395, 161.3018, 174.56404, 202.03496, 265.94278, 269.4758, 324.587, 256.84714, 190.07906]
2026-01-23 01:29:35,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [118.0, 164.0, 90.0, 85.0, 108.0, 128.0, 142.0, 154.0, 126.0, 91.0]
2026-01-23 01:29:35,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 12/100 (estimated time remaining: 19 hours, 35 minutes, 45 seconds)
2026-01-23 01:41:57,758 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 01:41:57,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:43:21,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 405.44785 ± 143.850
2026-01-23 01:43:21,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [439.96124, 330.095, 640.32355, 361.3115, 643.4043, 320.35864, 452.7305, 137.38977, 335.90622, 392.99777]
2026-01-23 01:43:21,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [232.0, 174.0, 407.0, 171.0, 338.0, 237.0, 211.0, 194.0, 180.0, 243.0]
2026-01-23 01:43:21,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1274 [INFO]: New best (405.45) for latency DatasetOffice
2026-01-23 01:43:21,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 13/100 (estimated time remaining: 19 hours, 32 minutes, 56 seconds)
2026-01-23 01:55:33,035 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 01:55:33,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:56:28,624 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 291.80722 ± 139.506
2026-01-23 01:56:28,624 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [302.04364, 154.62363, 231.90883, 272.98514, 223.46585, 397.90732, 168.35541, 264.0592, 662.1922, 240.53105]
2026-01-23 01:56:28,624 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [158.0, 106.0, 121.0, 147.0, 117.0, 214.0, 118.0, 138.0, 318.0, 131.0]
2026-01-23 01:56:28,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 14/100 (estimated time remaining: 19 hours, 18 minutes, 14 seconds)
2026-01-23 02:08:48,312 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 02:08:48,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:10:11,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 421.75665 ± 220.615
2026-01-23 02:10:11,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [345.65778, 380.34528, 254.21596, 337.03662, 302.61475, 481.25348, 1022.89557, 322.01917, 224.0332, 547.49445]
2026-01-23 02:10:11,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [210.0, 191.0, 148.0, 192.0, 171.0, 192.0, 624.0, 221.0, 114.0, 275.0]
2026-01-23 02:10:11,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1274 [INFO]: New best (421.76) for latency DatasetOffice
2026-01-23 02:10:11,276 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 15/100 (estimated time remaining: 19 hours, 15 minutes, 16 seconds)
2026-01-23 02:22:20,175 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 02:22:20,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:23:29,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 419.15924 ± 103.688
2026-01-23 02:23:29,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [300.3484, 658.45294, 360.06143, 395.9383, 394.94626, 318.32562, 477.33646, 427.97562, 335.05695, 523.1505]
2026-01-23 02:23:29,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [137.0, 303.0, 170.0, 202.0, 176.0, 146.0, 221.0, 209.0, 190.0, 245.0]
2026-01-23 02:23:29,254 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 16/100 (estimated time remaining: 18 hours, 55 minutes, 30 seconds)
2026-01-23 02:35:37,845 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 02:35:37,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:36:28,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 320.20505 ± 87.118
2026-01-23 02:36:28,369 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [414.71463, 300.0343, 275.969, 417.3383, 270.91833, 151.33188, 461.01837, 271.0727, 351.2476, 288.40555]
2026-01-23 02:36:28,369 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [172.0, 134.0, 129.0, 197.0, 137.0, 85.0, 172.0, 134.0, 172.0, 151.0]
2026-01-23 02:36:28,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 17/100 (estimated time remaining: 18 hours, 43 minutes, 31 seconds)
2026-01-23 02:48:28,661 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 02:48:28,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:49:36,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 425.95908 ± 218.809
2026-01-23 02:49:36,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [629.5168, 538.6162, 768.29193, 416.89276, 178.86487, 499.9982, 271.85727, 310.3933, 20.62013, 624.5393]
2026-01-23 02:49:36,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [232.0, 257.0, 323.0, 184.0, 132.0, 233.0, 137.0, 137.0, 32.0, 302.0]
2026-01-23 02:49:36,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1274 [INFO]: New best (425.96) for latency DatasetOffice
2026-01-23 02:49:36,219 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 18/100 (estimated time remaining: 18 hours, 19 minutes, 39 seconds)
2026-01-23 03:01:49,250 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 03:01:49,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:02:56,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 490.41977 ± 145.034
2026-01-23 03:02:56,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [447.2551, 456.39377, 533.4063, 458.48938, 842.86884, 555.4995, 249.71494, 392.56876, 419.16208, 548.8388]
2026-01-23 03:02:56,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [186.0, 153.0, 206.0, 198.0, 331.0, 200.0, 107.0, 173.0, 155.0, 236.0]
2026-01-23 03:02:56,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1274 [INFO]: New best (490.42) for latency DatasetOffice
2026-01-23 03:02:56,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 19/100 (estimated time remaining: 18 hours, 9 minutes, 58 seconds)
2026-01-23 03:14:57,756 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 03:14:57,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:16:13,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 499.47861 ± 150.033
2026-01-23 03:16:13,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [656.43463, 630.1112, 387.5157, 458.05, 193.21704, 562.3787, 340.36356, 697.5677, 487.80432, 581.3431]
2026-01-23 03:16:13,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [274.0, 257.0, 197.0, 178.0, 122.0, 235.0, 186.0, 311.0, 196.0, 236.0]
2026-01-23 03:16:13,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1274 [INFO]: New best (499.48) for latency DatasetOffice
2026-01-23 03:16:13,589 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 20/100 (estimated time remaining: 17 hours, 49 minutes, 49 seconds)
2026-01-23 03:28:22,881 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 03:28:22,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:30:14,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 838.58313 ± 382.980
2026-01-23 03:30:14,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [864.63025, 1308.2904, 583.3289, 1157.3987, 395.50247, 1624.4069, 561.51953, 812.8204, 476.41196, 601.52155]
2026-01-23 03:30:14,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [321.0, 474.0, 212.0, 391.0, 210.0, 578.0, 273.0, 361.0, 194.0, 260.0]
2026-01-23 03:30:14,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1274 [INFO]: New best (838.58) for latency DatasetOffice
2026-01-23 03:30:14,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 21/100 (estimated time remaining: 17 hours, 48 minutes, 5 seconds)
2026-01-23 03:42:56,578 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 03:42:56,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:45:59,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1436.82617 ± 966.581
2026-01-23 03:45:59,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [396.02505, 919.1638, 2617.2998, 2703.3499, 1688.0204, 126.81572, 3012.6484, 749.2208, 1055.5393, 1100.1799]
2026-01-23 03:45:59,183 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [170.0, 340.0, 982.0, 983.0, 602.0, 79.0, 1000.0, 273.0, 375.0, 441.0]
2026-01-23 03:45:59,183 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1274 [INFO]: New best (1436.83) for latency DatasetOffice
2026-01-23 03:45:59,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 22/100 (estimated time remaining: 18 hours, 18 minutes, 18 seconds)
2026-01-23 03:57:40,439 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 03:57:40,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:01:09,173 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1604.29309 ± 1046.343
2026-01-23 04:01:09,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [837.78625, 330.4074, 2536.5693, 166.0297, 2746.6812, 1824.1183, 2044.5264, 2586.6262, 2761.7483, 208.43823]
2026-01-23 04:01:09,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [314.0, 146.0, 1000.0, 92.0, 1000.0, 638.0, 698.0, 1000.0, 1000.0, 114.0]
2026-01-23 04:01:09,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1274 [INFO]: New best (1604.29) for latency DatasetOffice
2026-01-23 04:01:09,180 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 23/100 (estimated time remaining: 18 hours, 36 minutes, 10 seconds)
2026-01-23 04:13:46,826 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 04:13:46,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:17:19,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1967.09741 ± 1492.301
2026-01-23 04:17:19,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [3477.3625, 126.06565, 2180.8645, 3421.4402, 374.2248, 121.81159, 3320.4182, 3182.709, 3326.3767, 139.70224]
2026-01-23 04:17:19,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 79.0, 664.0, 1000.0, 194.0, 81.0, 1000.0, 1000.0, 1000.0, 80.0]
2026-01-23 04:17:19,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1274 [INFO]: New best (1967.10) for latency DatasetOffice
2026-01-23 04:17:19,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 24/100 (estimated time remaining: 19 hours, 5 minutes, 27 seconds)
2026-01-23 04:29:40,330 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 04:29:40,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:34:55,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3304.04224 ± 958.796
2026-01-23 04:34:55,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [3818.8364, 3692.135, 3550.5576, 3479.418, 628.64703, 3563.2644, 2653.3547, 3913.5725, 3981.112, 3759.5251]
2026-01-23 04:34:55,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 240.0, 1000.0, 839.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:34:55,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1274 [INFO]: New best (3304.04) for latency DatasetOffice
2026-01-23 04:34:55,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 25/100 (estimated time remaining: 19 hours, 56 minutes, 20 seconds)
2026-01-23 04:46:53,356 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 04:46:53,360 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:52:06,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3399.42920 ± 1062.673
2026-01-23 04:52:06,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [3663.0483, 230.53021, 3899.8723, 3733.8535, 3826.8433, 3742.0815, 3614.395, 3726.6082, 3580.7158, 3976.3428]
2026-01-23 04:52:06,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 115.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:52:06,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1274 [INFO]: New best (3399.43) for latency DatasetOffice
2026-01-23 04:52:06,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 26/100 (estimated time remaining: 20 hours, 27 minutes, 53 seconds)
2026-01-23 05:03:16,521 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 05:03:16,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:08:26,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3427.62183 ± 1119.258
2026-01-23 05:08:26,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [3686.2134, 4012.9492, 3418.5303, 3786.147, 3803.0293, 3902.9604, 101.009155, 3795.3704, 3919.705, 3850.3044]
2026-01-23 05:08:26,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 69.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:08:26,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1274 [INFO]: New best (3427.62) for latency DatasetOffice
2026-01-23 05:08:26,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 27/100 (estimated time remaining: 20 hours, 20 minutes, 23 seconds)
2026-01-23 05:20:34,973 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 05:20:34,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:26:18,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3731.85400 ± 70.657
2026-01-23 05:26:18,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [3773.847, 3832.8442, 3674.3333, 3787.4814, 3695.841, 3767.9395, 3608.1646, 3806.7612, 3729.1482, 3642.1787]
2026-01-23 05:26:18,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 986.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:26:18,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1274 [INFO]: New best (3731.85) for latency DatasetOffice
2026-01-23 05:26:18,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 28/100 (estimated time remaining: 20 hours, 43 minutes, 9 seconds)
2026-01-23 05:38:37,374 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 05:38:37,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:44:19,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3592.33447 ± 109.655
2026-01-23 05:44:19,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [3714.5186, 3411.1162, 3582.737, 3551.5166, 3597.5935, 3530.7153, 3499.7708, 3793.4695, 3530.8691, 3711.0364]
2026-01-23 05:44:19,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:44:20,000 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 29/100 (estimated time remaining: 20 hours, 53 minutes)
2026-01-23 05:56:30,030 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 05:56:30,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:00:52,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2865.67773 ± 1626.941
2026-01-23 06:00:52,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [3738.053, 88.32915, 450.65338, 3655.9543, 4264.318, 4000.6492, 3704.1658, 4053.6794, 4030.9895, 669.98474]
2026-01-23 06:00:52,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 70.0, 187.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 268.0]
2026-01-23 06:00:52,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 30/100 (estimated time remaining: 20 hours, 20 minutes, 19 seconds)
2026-01-23 06:12:34,305 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 06:12:34,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:17:02,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3143.57227 ± 1568.860
2026-01-23 06:17:02,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [1622.2664, 673.7304, 4143.4644, 158.0654, 4214.9814, 3637.3115, 4156.478, 4368.9805, 4181.384, 4279.062]
2026-01-23 06:17:02,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [466.0, 256.0, 1000.0, 92.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:17:02,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 31/100 (estimated time remaining: 19 hours, 49 minutes, 15 seconds)
2026-01-23 06:29:08,732 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 06:29:08,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:34:00,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3192.96899 ± 1231.702
2026-01-23 06:34:00,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [3847.1292, 3922.9365, 904.4468, 3963.0933, 3726.3154, 3859.274, 3712.8843, 3671.4795, 3745.7327, 576.3988]
2026-01-23 06:34:00,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 324.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 237.0]
2026-01-23 06:34:00,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 32/100 (estimated time remaining: 19 hours, 40 minutes, 43 seconds)
2026-01-23 06:46:30,024 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 06:46:30,028 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:50:26,360 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2742.20361 ± 1723.279
2026-01-23 06:50:26,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [1063.455, 736.6763, 295.46283, 4404.2505, 578.76575, 4152.5596, 4339.9917, 4149.049, 4323.902, 3377.9211]
2026-01-23 06:50:26,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [315.0, 297.0, 169.0, 1000.0, 272.0, 1000.0, 1000.0, 1000.0, 1000.0, 837.0]
2026-01-23 06:50:26,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 33/100 (estimated time remaining: 19 hours, 4 minutes, 17 seconds)
2026-01-23 07:02:29,470 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 07:02:29,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:08:13,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 4210.87207 ± 82.872
2026-01-23 07:08:13,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [4264.768, 4344.1035, 4264.841, 4264.9854, 4143.0254, 4261.9316, 4062.8145, 4104.112, 4225.2764, 4172.865]
2026-01-23 07:08:13,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:08:13,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1274 [INFO]: New best (4210.87) for latency DatasetOffice
2026-01-23 07:08:13,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 34/100 (estimated time remaining: 18 hours, 44 minutes, 14 seconds)
2026-01-23 07:20:30,088 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 07:20:30,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:26:11,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 4278.27588 ± 327.218
2026-01-23 07:26:11,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [4441.869, 4408.786, 4417.8447, 4096.214, 4183.887, 4551.251, 4537.812, 3393.9502, 4481.7417, 4269.397]
2026-01-23 07:26:11,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 780.0, 1000.0, 1000.0]
2026-01-23 07:26:11,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1274 [INFO]: New best (4278.28) for latency DatasetOffice
2026-01-23 07:26:11,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 35/100 (estimated time remaining: 18 hours, 46 minutes, 13 seconds)
2026-01-23 07:38:51,334 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 07:38:51,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:44:40,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 4478.77832 ± 85.475
2026-01-23 07:44:40,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [4646.647, 4511.031, 4344.192, 4535.492, 4410.17, 4457.2236, 4564.5557, 4412.306, 4407.03, 4499.1353]
2026-01-23 07:44:40,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:44:40,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1274 [INFO]: New best (4478.78) for latency DatasetOffice
2026-01-23 07:44:40,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 36/100 (estimated time remaining: 18 hours, 59 minutes, 13 seconds)
2026-01-23 07:57:09,507 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 07:57:09,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:02:22,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 4090.40088 ± 1180.865
2026-01-23 08:02:22,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [4710.781, 4742.818, 4741.75, 4444.4146, 4671.788, 4827.137, 1019.6421, 2709.9043, 4481.0024, 4554.771]
2026-01-23 08:02:22,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 319.0, 608.0, 1000.0, 1000.0]
2026-01-23 08:02:22,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 37/100 (estimated time remaining: 18 hours, 51 minutes, 8 seconds)
2026-01-23 08:14:10,822 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 08:14:10,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:19:36,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 4349.58496 ± 1218.775
2026-01-23 08:19:36,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [4795.5234, 4952.382, 4909.65, 720.3972, 4808.783, 4506.7886, 4528.6963, 4616.2324, 4897.1367, 4760.2563]
2026-01-23 08:19:36,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 241.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:19:36,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 38/100 (estimated time remaining: 18 hours, 43 minutes, 35 seconds)
2026-01-23 08:31:20,145 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 08:31:20,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:37:06,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 4700.81250 ± 227.360
2026-01-23 08:37:06,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [4866.8423, 4469.8096, 4319.0845, 4912.477, 4313.552, 4907.4136, 4788.448, 4873.274, 4738.0137, 4819.2124]
2026-01-23 08:37:06,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 884.0, 1000.0, 887.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:37:06,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1274 [INFO]: New best (4700.81) for latency DatasetOffice
2026-01-23 08:37:06,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 39/100 (estimated time remaining: 18 hours, 22 minutes, 4 seconds)
2026-01-23 08:50:00,826 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 08:50:00,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:55:39,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 4590.96631 ± 662.868
2026-01-23 08:55:39,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [4790.094, 4892.1836, 4829.715, 4879.0957, 2622.7417, 4676.972, 4614.2446, 4842.5327, 4809.9155, 4952.174]
2026-01-23 08:55:39,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 586.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:55:39,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 40/100 (estimated time remaining: 18 hours, 11 minutes, 23 seconds)
2026-01-23 09:07:10,023 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 09:07:10,029 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:12:49,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 4689.78809 ± 410.449
2026-01-23 09:12:49,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [4597.0713, 4827.039, 4971.439, 3511.597, 4637.2256, 4806.904, 4815.6177, 4856.781, 4876.8677, 4997.3447]
2026-01-23 09:12:49,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 777.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 09:12:49,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 41/100 (estimated time remaining: 17 hours, 37 minutes, 44 seconds)
2026-01-23 09:25:57,048 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 09:25:57,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:31:45,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 4854.73877 ± 50.944
2026-01-23 09:31:45,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [4745.1733, 4899.964, 4855.9067, 4831.534, 4909.667, 4932.7437, 4814.9077, 4841.366, 4873.4478, 4842.6772]
2026-01-23 09:31:45,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 09:31:45,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1274 [INFO]: New best (4854.74) for latency DatasetOffice
2026-01-23 09:31:45,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 42/100 (estimated time remaining: 17 hours, 34 minutes, 41 seconds)
2026-01-23 09:43:48,068 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 09:43:48,072 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:47:50,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3320.47388 ± 2097.886
2026-01-23 09:47:50,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [4975.172, 4904.8887, 5057.892, 5032.544, 4963.8594, 164.16585, 194.16821, 4980.0107, 2076.0168, 856.0215]
2026-01-23 09:47:50,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 94.0, 105.0, 1000.0, 479.0, 266.0]
2026-01-23 09:47:50,230 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 43/100 (estimated time remaining: 17 hours, 3 minutes, 23 seconds)
2026-01-23 09:59:42,135 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 09:59:42,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:05:29,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 4872.29590 ± 75.603
2026-01-23 10:05:29,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [4919.3926, 4916.0225, 4857.421, 4859.462, 4874.9053, 4757.541, 4751.7227, 4987.0015, 4971.3813, 4828.106]
2026-01-23 10:05:29,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 10:05:29,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1274 [INFO]: New best (4872.30) for latency DatasetOffice
2026-01-23 10:05:29,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 44/100 (estimated time remaining: 16 hours, 47 minutes, 36 seconds)
2026-01-23 10:17:56,956 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 10:17:56,960 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:23:35,035 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 4685.26562 ± 459.262
2026-01-23 10:23:35,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [4947.26, 4990.139, 4751.5503, 4812.2397, 4912.684, 4791.789, 4704.298, 4930.166, 3342.3284, 4670.204]
2026-01-23 10:23:35,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 974.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 735.0, 1000.0]
2026-01-23 10:23:35,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 45/100 (estimated time remaining: 16 hours, 24 minutes, 51 seconds)
2026-01-23 10:35:54,836 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 10:35:54,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:41:06,824 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 4251.67578 ± 934.827
2026-01-23 10:41:06,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [4952.0547, 4927.333, 3509.4727, 2530.121, 2593.5615, 4850.6, 4862.2266, 4725.6606, 4844.3096, 4721.4185]
2026-01-23 10:41:06,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 750.0, 568.0, 671.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 10:41:06,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 46/100 (estimated time remaining: 16 hours, 11 minutes, 8 seconds)
2026-01-23 10:53:12,342 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 10:53:12,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:58:10,877 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 4160.34424 ± 1020.230
2026-01-23 10:58:10,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [4934.4907, 5267.0767, 3624.4333, 2317.4119, 5203.1787, 2927.5537, 5036.239, 5099.688, 3530.8293, 3662.5444]
2026-01-23 10:58:10,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 796.0, 544.0, 1000.0, 634.0, 1000.0, 1000.0, 741.0, 768.0]
2026-01-23 10:58:10,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 47/100 (estimated time remaining: 15 hours, 33 minutes, 23 seconds)
2026-01-23 11:10:32,661 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 11:10:32,665 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:15:21,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 4226.03027 ± 2049.105
2026-01-23 11:15:21,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5364.675, 5165.686, 21.52804, 239.77573, 5232.0215, 5313.4395, 5202.9487, 5189.033, 5305.526, 5225.667]
2026-01-23 11:15:21,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 51.0, 115.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 11:15:21,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 48/100 (estimated time remaining: 15 hours, 27 minutes, 40 seconds)
2026-01-23 11:27:55,777 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 11:27:55,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:33:01,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 4609.95459 ± 827.397
2026-01-23 11:33:01,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5376.862, 5256.155, 5216.6577, 3813.939, 4311.91, 4698.162, 4427.5356, 5273.334, 2607.2942, 5117.6997]
2026-01-23 11:33:01,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 750.0, 821.0, 882.0, 816.0, 1000.0, 549.0, 1000.0]
2026-01-23 11:33:01,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 49/100 (estimated time remaining: 15 hours, 10 minutes, 18 seconds)
2026-01-23 11:45:20,323 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 11:45:20,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:50:37,774 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 4654.54736 ± 1547.894
2026-01-23 11:50:37,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5161.115, 5180.6313, 5191.9863, 5032.0464, 4891.4917, 5275.2983, 5268.0776, 24.891539, 5187.958, 5331.9756]
2026-01-23 11:50:37,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 44.0, 1000.0, 1000.0]
2026-01-23 11:50:37,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 50/100 (estimated time remaining: 14 hours, 47 minutes, 51 seconds)
2026-01-23 12:02:35,922 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 12:02:35,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 12:07:19,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 4315.23584 ± 1421.862
2026-01-23 12:07:19,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [3134.9917, 4894.651, 5315.851, 5501.3345, 762.72095, 4359.873, 4824.7173, 5415.2495, 5456.2134, 3486.7546]
2026-01-23 12:07:19,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [624.0, 916.0, 1000.0, 1000.0, 236.0, 825.0, 892.0, 1000.0, 1000.0, 665.0]
2026-01-23 12:07:19,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 51/100 (estimated time remaining: 14 hours, 22 minutes, 2 seconds)
2026-01-23 12:19:38,219 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 12:19:38,223 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 12:25:19,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 5106.65186 ± 549.956
2026-01-23 12:25:19,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5211.5146, 5342.691, 5323.929, 3471.5105, 5406.1426, 5146.1743, 5226.582, 5286.2466, 5363.946, 5287.7866]
2026-01-23 12:25:19,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 698.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 12:25:19,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1274 [INFO]: New best (5106.65) for latency DatasetOffice
2026-01-23 12:25:19,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 52/100 (estimated time remaining: 14 hours, 13 minutes, 57 seconds)
2026-01-23 12:38:28,124 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 12:38:28,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 12:42:54,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3816.24170 ± 1931.056
2026-01-23 12:42:54,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5425.605, 822.4425, 5102.9346, 107.32276, 5185.195, 5278.272, 2634.6494, 5310.885, 3037.6013, 5257.512]
2026-01-23 12:42:55,000 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 273.0, 1000.0, 65.0, 1000.0, 1000.0, 550.0, 1000.0, 626.0, 1000.0]
2026-01-23 12:42:55,008 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 53/100 (estimated time remaining: 14 hours, 36 seconds)
2026-01-23 12:54:31,808 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 12:54:31,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 12:58:30,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3509.90552 ± 2204.621
2026-01-23 12:58:30,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5643.3203, 2804.7737, 331.18738, 5382.8438, 5386.4316, 3980.035, 826.37573, 5506.7085, 5209.3726, 28.006466]
2026-01-23 12:58:30,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 556.0, 135.0, 1000.0, 1000.0, 744.0, 253.0, 1000.0, 1000.0, 43.0]
2026-01-23 12:58:30,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 54/100 (estimated time remaining: 13 hours, 23 minutes, 30 seconds)
2026-01-23 13:10:51,235 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 13:10:51,239 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 13:16:43,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 5540.25586 ± 121.444
2026-01-23 13:16:43,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5639.157, 5447.133, 5503.9893, 5672.0005, 5259.792, 5617.264, 5488.122, 5558.9375, 5525.9097, 5690.2485]
2026-01-23 13:16:43,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 13:16:43,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1274 [INFO]: New best (5540.26) for latency DatasetOffice
2026-01-23 13:16:43,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 55/100 (estimated time remaining: 13 hours, 12 minutes, 5 seconds)
2026-01-23 13:29:29,016 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 13:29:29,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 13:34:52,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 4866.67285 ± 1537.788
2026-01-23 13:34:52,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5406.193, 5500.353, 5405.849, 258.0132, 5459.1553, 5416.8774, 5294.508, 5321.932, 5271.12, 5332.7236]
2026-01-23 13:34:52,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 110.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 13:34:52,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 56/100 (estimated time remaining: 13 hours, 8 minutes, 3 seconds)
2026-01-23 13:47:09,045 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 13:47:09,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 13:51:23,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3668.74072 ± 2233.095
2026-01-23 13:51:23,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [153.88585, 5222.2515, 4953.893, 5182.3237, 606.6409, 5190.1875, 5133.697, 37.077106, 5182.092, 5025.356]
2026-01-23 13:51:23,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [78.0, 1000.0, 1000.0, 1000.0, 188.0, 1000.0, 1000.0, 36.0, 1000.0, 1000.0]
2026-01-23 13:51:23,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 57/100 (estimated time remaining: 12 hours, 37 minutes, 24 seconds)
2026-01-23 14:03:26,927 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 14:03:26,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 14:08:46,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 4683.97754 ± 1506.914
2026-01-23 14:08:46,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [169.59041, 5066.24, 5151.0566, 5049.773, 5194.5405, 5145.3584, 5256.7573, 5240.7905, 5258.523, 5307.1455]
2026-01-23 14:08:46,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [89.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 14:08:46,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 58/100 (estimated time remaining: 12 hours, 18 minutes, 19 seconds)
2026-01-23 14:21:19,539 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 14:21:19,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 14:26:28,593 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 4782.29932 ± 1527.582
2026-01-23 14:26:28,593 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5512.2417, 5270.298, 5394.781, 5451.3457, 5079.7437, 5451.2236, 5387.282, 5404.7944, 4608.005, 263.27917]
2026-01-23 14:26:28,593 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 861.0, 125.0]
2026-01-23 14:26:28,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 59/100 (estimated time remaining: 12 hours, 18 minutes, 57 seconds)
2026-01-23 14:38:54,117 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 14:38:54,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 14:44:40,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 5331.62793 ± 68.186
2026-01-23 14:44:40,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5283.9995, 5359.856, 5253.1445, 5256.635, 5349.7896, 5324.732, 5426.3496, 5453.358, 5249.561, 5358.8506]
2026-01-23 14:44:40,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 14:44:40,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 60/100 (estimated time remaining: 12 hours, 1 minute, 14 seconds)
2026-01-23 14:55:45,725 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 14:55:45,735 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 15:01:31,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 5487.78223 ± 76.031
2026-01-23 15:01:31,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5679.1143, 5412.2324, 5413.3433, 5484.4355, 5533.7856, 5429.365, 5478.604, 5489.2305, 5524.4863, 5433.2275]
2026-01-23 15:01:31,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 15:01:31,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 61/100 (estimated time remaining: 11 hours, 33 minutes, 8 seconds)
2026-01-23 15:14:18,548 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 15:14:18,552 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 15:18:53,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3999.47534 ± 1728.427
2026-01-23 15:18:53,703 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [3880.4546, 5253.5767, 5276.473, 1635.3436, 3120.6887, 152.21535, 5283.2964, 5069.965, 5164.4946, 5158.245]
2026-01-23 15:18:53,703 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [775.0, 1000.0, 1000.0, 377.0, 664.0, 83.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 15:18:53,713 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 62/100 (estimated time remaining: 11 hours, 22 minutes, 32 seconds)
2026-01-23 15:31:11,821 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 15:31:11,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 15:36:49,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 5111.81299 ± 520.431
2026-01-23 15:36:49,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5291.3125, 5241.8584, 5396.696, 5306.28, 4911.2256, 3602.7166, 5275.734, 5307.075, 5386.777, 5398.4526]
2026-01-23 15:36:49,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 684.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 15:36:49,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 63/100 (estimated time remaining: 11 hours, 9 minutes, 11 seconds)
2026-01-23 15:49:28,672 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 15:49:28,694 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 15:55:16,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 5182.31104 ± 99.499
2026-01-23 15:55:16,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5045.2188, 5198.0415, 5126.86, 5325.823, 5189.425, 5298.5864, 5000.382, 5195.057, 5278.111, 5165.6074]
2026-01-23 15:55:16,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 15:55:16,227 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 64/100 (estimated time remaining: 10 hours, 57 minutes, 4 seconds)
2026-01-23 16:06:26,237 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 16:06:26,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 16:12:13,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 5314.30518 ± 86.746
2026-01-23 16:12:13,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5358.43, 5233.4355, 5345.3457, 5377.0146, 5308.409, 5133.0386, 5477.2534, 5290.9385, 5339.8535, 5279.3335]
2026-01-23 16:12:13,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 16:12:13,024 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 65/100 (estimated time remaining: 10 hours, 30 minutes, 15 seconds)
2026-01-23 16:24:26,917 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 16:24:26,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 16:30:14,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 5312.19775 ± 101.118
2026-01-23 16:30:14,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5411.527, 5334.4453, 5351.591, 5342.367, 5403.651, 5330.919, 5403.805, 5282.2007, 5078.0034, 5183.465]
2026-01-23 16:30:14,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 16:30:14,467 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 66/100 (estimated time remaining: 10 hours, 21 minutes, 1 second)
2026-01-23 16:43:27,021 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 16:43:27,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 16:48:31,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 4479.99219 ± 1463.532
2026-01-23 16:48:31,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5315.857, 5352.5293, 4770.667, 5334.092, 5221.8315, 1453.9229, 1703.1731, 4955.4077, 5331.64, 5360.802]
2026-01-23 16:48:31,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 891.0, 1000.0, 1000.0, 359.0, 396.0, 1000.0, 1000.0, 1000.0]
2026-01-23 16:48:31,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 67/100 (estimated time remaining: 10 hours, 9 minutes, 28 seconds)
2026-01-23 17:00:21,495 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 17:00:21,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 17:06:11,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 5370.17676 ± 57.549
2026-01-23 17:06:11,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5459.6094, 5367.2246, 5401.642, 5313.907, 5342.309, 5356.7046, 5286.96, 5302.6953, 5432.0225, 5438.6924]
2026-01-23 17:06:11,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 17:06:11,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 68/100 (estimated time remaining: 9 hours, 49 minutes, 50 seconds)
2026-01-23 17:17:47,558 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 17:17:47,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 17:23:24,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 5041.16895 ± 604.357
2026-01-23 17:23:24,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5350.7783, 5431.482, 5356.96, 5254.8984, 5044.558, 5215.3916, 3308.3337, 4815.4165, 5210.346, 5423.524]
2026-01-23 17:23:24,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 658.0, 1000.0, 1000.0, 1000.0]
2026-01-23 17:23:24,660 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 69/100 (estimated time remaining: 9 hours, 24 minutes, 5 seconds)
2026-01-23 17:35:27,943 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 17:35:27,947 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 17:41:12,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 5300.82129 ± 89.121
2026-01-23 17:41:12,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5341.3164, 5337.088, 5127.4355, 5346.8774, 5201.799, 5341.594, 5250.793, 5405.139, 5233.3564, 5422.8145]
2026-01-23 17:41:12,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 17:41:12,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 70/100 (estimated time remaining: 9 hours, 11 minutes, 47 seconds)
2026-01-23 17:53:25,681 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 17:53:25,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 17:59:16,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 5188.50293 ± 88.397
2026-01-23 17:59:16,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5207.709, 5133.2505, 4989.6245, 5139.433, 5176.1475, 5280.2607, 5232.6836, 5148.019, 5303.363, 5274.54]
2026-01-23 17:59:16,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 17:59:16,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 71/100 (estimated time remaining: 8 hours, 54 minutes, 10 seconds)
2026-01-23 18:11:26,273 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 18:11:26,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 18:15:13,502 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3225.42627 ± 2329.968
2026-01-23 18:15:13,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5008.875, 1024.5795, 58.179817, 440.24634, 38.515533, 5198.962, 5094.8667, 5248.5977, 5182.3867, 4959.0527]
2026-01-23 18:15:13,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 273.0, 46.0, 171.0, 43.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 18:15:13,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 72/100 (estimated time remaining: 8 hours, 22 minutes, 52 seconds)
2026-01-23 18:28:10,310 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 18:28:10,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 18:33:25,985 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 4756.23926 ± 1567.594
2026-01-23 18:33:25,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5305.952, 57.222763, 5322.9385, 5305.644, 5226.191, 5348.5815, 5247.126, 5121.7026, 5303.8784, 5323.1577]
2026-01-23 18:33:25,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 48.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 18:33:25,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 73/100 (estimated time remaining: 8 hours, 8 minutes, 33 seconds)
2026-01-23 18:45:21,795 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 18:45:21,805 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 18:51:02,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 5221.78223 ± 234.813
2026-01-23 18:51:02,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5464.1943, 5277.394, 4858.029, 4726.0435, 5390.5757, 5400.424, 5115.5693, 5285.0005, 5313.378, 5387.2144]
2026-01-23 18:51:02,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 911.0, 889.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 18:51:02,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 74/100 (estimated time remaining: 7 hours, 53 minutes, 12 seconds)
2026-01-23 19:03:30,319 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 19:03:30,324 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 19:09:12,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 5307.69531 ± 288.746
2026-01-23 19:09:12,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5358.918, 5372.7, 5282.4473, 5536.2427, 5504.5615, 5357.6885, 5429.808, 4467.2793, 5380.8374, 5386.4697]
2026-01-23 19:09:12,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 853.0, 1000.0, 1000.0]
2026-01-23 19:09:12,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 75/100 (estimated time remaining: 7 hours, 37 minutes, 31 seconds)
2026-01-23 19:21:21,141 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 19:21:21,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 19:26:03,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 4207.87842 ± 2000.162
2026-01-23 19:26:03,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [224.29178, 5251.9937, 197.12611, 5319.256, 5296.481, 5120.8584, 5191.98, 5193.0547, 5032.852, 5250.89]
2026-01-23 19:26:03,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [92.0, 1000.0, 89.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 19:26:03,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 76/100 (estimated time remaining: 7 hours, 13 minutes, 54 seconds)
2026-01-23 19:38:04,382 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 19:38:04,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 19:42:31,014 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3902.80151 ± 1694.466
2026-01-23 19:42:31,014 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5179.745, 3797.3074, 5362.3057, 5073.9717, 5029.2856, 5192.443, 5041.0327, 2319.9854, 1305.7991, 726.137]
2026-01-23 19:42:31,014 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 736.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 483.0, 311.0, 214.0]
2026-01-23 19:42:31,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 77/100 (estimated time remaining: 6 hours, 59 minutes)
2026-01-23 19:55:09,277 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 19:55:09,282 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 20:01:00,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 5252.68604 ± 55.125
2026-01-23 20:01:00,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5314.5635, 5314.0825, 5171.239, 5293.0356, 5147.7666, 5211.931, 5290.9517, 5267.903, 5250.1787, 5265.2075]
2026-01-23 20:01:00,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 20:01:00,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 78/100 (estimated time remaining: 6 hours, 42 minutes, 50 seconds)
2026-01-23 20:12:31,707 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 20:12:31,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 20:18:20,066 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 5337.13184 ± 95.653
2026-01-23 20:18:20,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5185.653, 5268.4043, 5343.8853, 5404.7446, 5432.9077, 5399.8394, 5413.582, 5381.8726, 5391.248, 5149.1753]
2026-01-23 20:18:20,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 20:18:20,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 79/100 (estimated time remaining: 6 hours, 24 minutes, 4 seconds)
2026-01-23 20:30:33,930 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 20:30:33,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 20:34:29,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3505.47144 ± 2305.504
2026-01-23 20:34:29,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5185.67, 5207.8306, 5318.5796, 2226.3394, 193.14273, -4.9559956, 654.38525, 5398.9507, 5303.8022, 5570.9717]
2026-01-23 20:34:29,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 453.0, 96.0, 20.0, 193.0, 1000.0, 1000.0, 1000.0]
2026-01-23 20:34:29,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 80/100 (estimated time remaining: 5 hours, 58 minutes, 11 seconds)
2026-01-23 20:46:40,442 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 20:46:40,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 20:52:27,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 5327.79688 ± 74.105
2026-01-23 20:52:27,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5409.744, 5317.817, 5319.013, 5165.9604, 5284.473, 5296.532, 5343.1675, 5350.3545, 5462.8267, 5328.078]
2026-01-23 20:52:27,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 20:52:27,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 81/100 (estimated time remaining: 5 hours, 45 minutes, 37 seconds)
2026-01-23 21:04:36,673 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 21:04:36,677 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 21:10:19,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 5353.96240 ± 82.633
2026-01-23 21:10:19,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5294.317, 5383.968, 5422.3877, 5375.6943, 5440.426, 5399.3706, 5149.855, 5371.8145, 5291.165, 5410.6245]
2026-01-23 21:10:19,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 21:10:19,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 82/100 (estimated time remaining: 5 hours, 33 minutes, 40 seconds)
2026-01-23 21:22:15,256 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 21:22:15,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 21:27:07,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 4298.12207 ± 1652.468
2026-01-23 21:27:07,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [22.280975, 2298.1267, 4820.8794, 5089.2046, 5227.841, 5180.2114, 5101.9614, 5059.7056, 5079.2217, 5101.7847]
2026-01-23 21:27:07,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [35.0, 482.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 21:27:07,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 83/100 (estimated time remaining: 5 hours, 10 minutes, 2 seconds)
2026-01-23 21:39:21,657 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 21:39:21,660 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 21:45:07,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 5188.39990 ± 128.673
2026-01-23 21:45:07,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5169.813, 4832.265, 5177.736, 5260.5396, 5279.0522, 5259.4307, 5261.3887, 5128.551, 5231.938, 5283.289]
2026-01-23 21:45:07,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 888.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 21:45:07,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 55 minutes, 4 seconds)
2026-01-23 21:57:20,191 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 21:57:20,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 22:02:21,539 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 4534.48730 ± 1426.325
2026-01-23 22:02:21,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [4287.6167, 5129.215, 5292.9653, 5309.3076, 5232.468, 5279.26, 5263.354, 5186.358, 3868.7126, 495.61908]
2026-01-23 22:02:21,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [826.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 759.0, 174.0]
2026-01-23 22:02:21,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 85/100 (estimated time remaining: 4 hours, 41 minutes, 12 seconds)
2026-01-23 22:14:37,840 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 22:14:37,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 22:20:28,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 5325.18359 ± 134.058
2026-01-23 22:20:28,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5346.4995, 5501.279, 5064.362, 5195.913, 5343.675, 5199.196, 5321.73, 5361.7017, 5387.0483, 5530.429]
2026-01-23 22:20:28,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 22:20:28,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 86/100 (estimated time remaining: 4 hours, 24 minutes, 2 seconds)
2026-01-23 22:32:42,615 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 22:32:42,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 22:37:36,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 4372.14355 ± 1740.978
2026-01-23 22:37:36,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [47.32658, 5079.46, 5168.6523, 5119.8257, 5298.4624, 5263.6665, 5333.127, 5306.025, 5152.4375, 1952.4532]
2026-01-23 22:37:36,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [42.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 429.0]
2026-01-23 22:37:36,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 87/100 (estimated time remaining: 4 hours, 4 minutes, 23 seconds)
2026-01-23 22:49:53,736 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 22:49:53,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 22:54:17,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3852.10156 ± 1793.798
2026-01-23 22:54:17,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5135.6973, 5177.7686, 5147.058, 2999.158, 28.638596, 2906.6118, 1528.1472, 5199.1143, 5131.696, 5267.1235]
2026-01-23 22:54:17,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 606.0, 45.0, 629.0, 372.0, 1000.0, 1000.0, 1000.0]
2026-01-23 22:54:17,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 46 minutes, 36 seconds)
2026-01-23 23:06:32,094 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 23:06:32,097 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 23:11:20,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 4290.60449 ± 1953.688
2026-01-23 23:11:20,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5331.044, 5362.5454, 5386.9033, 5280.2046, 5282.443, 5092.0366, 5142.6875, 267.3031, 5251.1025, 509.77808]
2026-01-23 23:11:20,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 122.0, 1000.0, 164.0]
2026-01-23 23:11:20,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 26 minutes, 54 seconds)
2026-01-23 23:23:28,718 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 23:23:28,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 23:28:30,673 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 4541.51416 ± 1621.172
2026-01-23 23:28:30,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5386.7217, 5324.4707, 5203.8105, 5244.5244, 2672.7996, 5331.738, 5322.5728, 5326.186, 289.01538, 5313.302]
2026-01-23 23:28:30,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 528.0, 1000.0, 1000.0, 1000.0, 118.0, 1000.0]
2026-01-23 23:28:30,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 90/100 (estimated time remaining: 3 hours, 9 minutes, 32 seconds)
2026-01-23 23:40:38,821 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 23:40:38,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 23:46:28,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 5296.87744 ± 85.126
2026-01-23 23:46:28,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5152.5146, 5212.248, 5410.3506, 5285.8696, 5414.9746, 5235.3013, 5239.7666, 5342.631, 5387.86, 5287.2583]
2026-01-23 23:46:28,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 23:46:28,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 51 minutes, 59 seconds)
2026-01-23 23:58:39,897 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 23:58:39,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 00:04:25,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 5279.71582 ± 64.669
2026-01-24 00:04:25,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5242.611, 5291.937, 5341.9893, 5163.7725, 5297.524, 5271.4004, 5214.0366, 5237.9985, 5389.9067, 5345.9834]
2026-01-24 00:04:25,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 00:04:25,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 36 minutes, 16 seconds)
2026-01-24 00:16:35,613 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 00:16:35,617 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 00:20:58,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 4010.66992 ± 2136.877
2026-01-24 00:20:58,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5472.0815, 5439.5874, 5413.9683, 1971.6947, 468.0929, 28.788012, 5325.6953, 5257.9316, 5401.2183, 5327.6416]
2026-01-24 00:20:58,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 401.0, 150.0, 41.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 00:20:58,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 18 minutes, 41 seconds)
2026-01-24 00:33:56,950 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 00:33:56,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 00:39:39,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 5309.20312 ± 51.304
2026-01-24 00:39:39,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5321.271, 5274.6514, 5200.679, 5372.694, 5370.558, 5357.2046, 5296.3833, 5304.465, 5333.272, 5260.85]
2026-01-24 00:39:39,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 00:39:39,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 94/100 (estimated time remaining: 2 hours, 3 minutes, 39 seconds)
2026-01-24 00:51:53,380 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 00:51:53,384 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 00:57:27,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 5134.79443 ± 441.728
2026-01-24 00:57:27,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5266.4478, 5341.7827, 5196.6924, 5212.882, 5398.8037, 5222.1787, 5244.497, 3823.5916, 5362.41, 5278.6587]
2026-01-24 00:57:27,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 722.0, 1000.0, 1000.0]
2026-01-24 00:57:27,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 46 minutes, 44 seconds)
2026-01-24 01:09:44,784 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 01:09:44,808 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 01:15:31,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 5236.61328 ± 65.666
2026-01-24 01:15:31,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5098.5605, 5270.307, 5339.032, 5277.458, 5312.785, 5183.6904, 5234.102, 5237.6885, 5193.7207, 5218.7886]
2026-01-24 01:15:31,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 01:15:31,462 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 29 minutes, 3 seconds)
2026-01-24 01:27:45,608 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 01:27:45,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 01:33:30,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 5264.78613 ± 55.605
2026-01-24 01:33:30,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5272.4897, 5274.8584, 5320.4736, 5288.3267, 5341.329, 5293.3164, 5253.6523, 5136.8447, 5202.0405, 5264.5317]
2026-01-24 01:33:30,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 01:33:30,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 11 minutes, 15 seconds)
2026-01-24 01:45:42,922 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 01:45:42,927 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 01:51:18,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 4978.91895 ± 642.172
2026-01-24 01:51:18,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5117.288, 5184.0703, 5095.242, 5221.118, 5188.611, 3057.7363, 5215.172, 5219.8687, 5245.554, 5244.5244]
2026-01-24 01:51:18,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 632.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 01:51:18,927 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 98/100 (estimated time remaining: 54 minutes, 12 seconds)
2026-01-24 02:03:18,490 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 02:03:18,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 02:09:05,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 5287.04883 ± 40.631
2026-01-24 02:09:05,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5270.844, 5288.138, 5267.364, 5308.108, 5316.634, 5196.38, 5321.053, 5319.0923, 5246.1797, 5336.697]
2026-01-24 02:09:05,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 02:09:05,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 99/100 (estimated time remaining: 35 minutes, 46 seconds)
2026-01-24 02:21:12,622 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 02:21:12,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 02:26:41,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 5019.82568 ± 1124.823
2026-01-24 02:26:41,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [5410.4463, 5365.247, 1647.3994, 5382.351, 5408.691, 5440.2236, 5460.0312, 5325.4346, 5412.6294, 5345.807]
2026-01-24 02:26:41,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 351.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 02:26:41,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1247 [INFO]: Iteration 100/100 (estimated time remaining: 17 minutes, 50 seconds)
2026-01-24 02:38:55,094 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 02:38:55,098 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 02:44:23,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1269 [DEBUG]: Total Reward: 4895.92871 ± 1334.330
2026-01-24 02:44:23,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1270 [DEBUG]: All rewards: [894.71906, 5291.6387, 5370.747, 5391.8384, 5312.8135, 5332.1914, 5332.315, 5325.6733, 5288.7705, 5418.582]
2026-01-24 02:44:23,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [343.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 02:44:23,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1299 [DEBUG]: Training session finished
