2026-01-22 23:01:40,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-ant/DatasetOffice-mbpac-highdim-memdelay
2026-01-22 23:01:40,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-ant/DatasetOffice-mbpac-highdim-memdelay
2026-01-22 23:01:40,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x14980859e650>}
2026-01-22 23:01:40,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1159 [DEBUG]: using device: cuda
2026-01-22 23:01:40,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1181 [INFO]: Creating new trainer
2026-01-22 23:01:40,969 baseline-mbpac-noisy-ant:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2026-01-22 23:01:40,970 baseline-mbpac-noisy-ant:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-22 23:01:40,978 baseline-mbpac-noisy-ant:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=27, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=8, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 512, batch_first=True)
)
2026-01-22 23:01:41,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1242 [DEBUG]: Starting training session...
2026-01-22 23:01:41,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 1/100
2026-01-22 23:15:08,476 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-22 23:15:08,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:17:55,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: -122.74919 ± 129.918
2026-01-22 23:17:55,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [-269.0504, -321.5178, 6.0450115, -9.698872, -272.84274, -13.395711, -19.605776, -75.33698, -250.76674, -1.3219148]
2026-01-22 23:17:55,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 29.0, 121.0, 1000.0, 85.0, 77.0, 139.0, 1000.0, 48.0]
2026-01-22 23:17:55,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1274 [INFO]: New best (-122.75) for latency DatasetOffice
2026-01-22 23:17:55,728 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 2/100 (estimated time remaining: 26 hours, 47 minutes, 16 seconds)
2026-01-22 23:30:13,088 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-22 23:30:13,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:35:19,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 264.13400 ± 142.314
2026-01-22 23:35:19,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [290.8726, 65.19756, 255.8213, 550.05054, 212.79515, 284.23877, 33.15076, 287.1619, 245.72472, 416.32693]
2026-01-22 23:35:19,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 195.0, 1000.0, 1000.0, 1000.0, 851.0, 191.0, 1000.0, 1000.0, 843.0]
2026-01-22 23:35:19,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1274 [INFO]: New best (264.13) for latency DatasetOffice
2026-01-22 23:35:19,728 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 3/100 (estimated time remaining: 27 hours, 28 minutes, 7 seconds)
2026-01-22 23:50:22,069 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-22 23:50:22,072 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:53:57,966 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 201.15314 ± 134.023
2026-01-22 23:53:57,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [74.59877, 309.298, 256.0958, 47.731552, 38.000465, 402.46036, 367.2864, 250.67241, 35.778362, 229.60922]
2026-01-22 23:53:57,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [120.0, 1000.0, 497.0, 312.0, 110.0, 1000.0, 1000.0, 677.0, 91.0, 904.0]
2026-01-22 23:53:57,970 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 4/100 (estimated time remaining: 28 hours, 10 minutes, 8 seconds)
2026-01-23 00:08:26,729 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 00:08:26,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:12:37,098 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 328.48907 ± 172.121
2026-01-23 00:12:37,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [143.0844, 416.38797, 434.04776, 93.8834, 515.981, 483.75333, 136.38664, 109.297424, 465.42, 486.64902]
2026-01-23 00:12:37,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [254.0, 1000.0, 1000.0, 113.0, 1000.0, 1000.0, 155.0, 142.0, 1000.0, 1000.0]
2026-01-23 00:12:37,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1274 [INFO]: New best (328.49) for latency DatasetOffice
2026-01-23 00:12:37,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 5/100 (estimated time remaining: 28 hours, 22 minutes, 11 seconds)
2026-01-23 00:26:27,248 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 00:26:27,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:31:20,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 441.15585 ± 158.195
2026-01-23 00:31:20,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [512.46204, 429.65024, 504.65192, 419.48062, 599.73914, 39.371403, 537.495, 520.5483, 561.5423, 286.61774]
2026-01-23 00:31:20,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 55.0, 1000.0, 1000.0, 580.0, 299.0]
2026-01-23 00:31:20,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1274 [INFO]: New best (441.16) for latency DatasetOffice
2026-01-23 00:31:20,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 6/100 (estimated time remaining: 28 hours, 23 minutes, 27 seconds)
2026-01-23 00:44:58,452 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 00:44:58,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:50:47,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 841.46277 ± 180.832
2026-01-23 00:50:47,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [954.67896, 519.854, 876.3027, 857.88965, 1071.6403, 722.81195, 1015.58276, 532.69556, 944.0148, 919.157]
2026-01-23 00:50:47,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 494.0, 1000.0, 1000.0]
2026-01-23 00:50:47,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1274 [INFO]: New best (841.46) for latency DatasetOffice
2026-01-23 00:50:47,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 7/100 (estimated time remaining: 29 hours, 5 minutes, 48 seconds)
2026-01-23 01:03:37,094 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 01:03:37,097 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:08:57,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 550.18646 ± 198.216
2026-01-23 01:08:57,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [931.5217, 391.157, 584.5138, 744.9927, 356.74582, 717.046, 429.41266, 417.69705, 652.7201, 276.05765]
2026-01-23 01:08:57,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 444.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 691.0, 1000.0, 491.0]
2026-01-23 01:08:57,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 8/100 (estimated time remaining: 29 hours, 1 minute, 21 seconds)
2026-01-23 01:23:03,764 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 01:23:03,768 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:29:02,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 1098.80737 ± 184.748
2026-01-23 01:29:02,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [1241.4797, 1253.857, 682.7045, 951.2112, 1154.7325, 908.5914, 1220.3007, 1199.8563, 1292.3682, 1082.9705]
2026-01-23 01:29:02,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 711.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:29:02,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1274 [INFO]: New best (1098.81) for latency DatasetOffice
2026-01-23 01:29:02,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 9/100 (estimated time remaining: 29 hours, 9 minutes, 24 seconds)
2026-01-23 01:43:07,680 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 01:43:07,684 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:48:06,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 696.95593 ± 425.563
2026-01-23 01:48:06,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [619.47266, 1463.3854, 633.14703, 1121.4058, 928.4467, 79.21241, 17.218554, 623.2717, 993.658, 490.34116]
2026-01-23 01:48:06,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 60.0, 19.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:48:06,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 10/100 (estimated time remaining: 28 hours, 58 minutes, 1 second)
2026-01-23 02:00:51,935 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 02:00:51,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:06:37,029 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 1286.04858 ± 458.067
2026-01-23 02:06:37,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [1720.5327, 443.19745, 1668.237, 667.11035, 1613.57, 1474.6868, 1257.9398, 765.60547, 1510.9692, 1738.6366]
2026-01-23 02:06:37,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 281.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:06:37,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1274 [INFO]: New best (1286.05) for latency DatasetOffice
2026-01-23 02:06:37,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 11/100 (estimated time remaining: 28 hours, 34 minutes, 49 seconds)
2026-01-23 02:20:57,997 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 02:20:58,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:26:05,956 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 1026.65356 ± 498.371
2026-01-23 02:26:05,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [1550.5157, 46.144093, 1301.7767, 852.1672, 442.6701, 682.99176, 1474.1216, 1479.1356, 899.0947, 1537.919]
2026-01-23 02:26:05,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 31.0, 1000.0, 1000.0, 242.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:26:05,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 12/100 (estimated time remaining: 28 hours, 16 minutes, 29 seconds)
2026-01-23 02:39:17,181 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 02:39:17,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:45:25,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 1429.52197 ± 384.618
2026-01-23 02:45:25,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [1266.4088, 1699.9579, 1152.1346, 1684.3931, 1799.6929, 1741.6882, 1614.4775, 781.6328, 775.9447, 1778.8893]
2026-01-23 02:45:25,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 962.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:45:25,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1274 [INFO]: New best (1429.52) for latency DatasetOffice
2026-01-23 02:45:25,867 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 13/100 (estimated time remaining: 28 hours, 18 minutes, 3 seconds)
2026-01-23 02:59:25,903 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 02:59:25,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:04:14,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 1189.24146 ± 521.124
2026-01-23 03:04:14,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [557.4588, 1060.0548, 1676.1749, 872.5792, 1772.5269, 1771.7408, 918.5547, 1354.2435, 1686.7587, 222.32092]
2026-01-23 03:04:14,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [373.0, 1000.0, 1000.0, 461.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 114.0]
2026-01-23 03:04:14,704 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 14/100 (estimated time remaining: 27 hours, 36 minutes, 30 seconds)
2026-01-23 03:18:14,142 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 03:18:14,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:23:15,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 1554.16772 ± 621.721
2026-01-23 03:23:15,661 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [1753.2659, 1415.7501, 1927.4237, 1916.3083, 728.0313, 1947.9442, 1950.4364, 2022.5647, 54.63479, 1825.318]
2026-01-23 03:23:15,661 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 770.0, 1000.0, 1000.0, 385.0, 1000.0, 1000.0, 1000.0, 30.0, 1000.0]
2026-01-23 03:23:15,661 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1274 [INFO]: New best (1554.17) for latency DatasetOffice
2026-01-23 03:23:15,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 15/100 (estimated time remaining: 27 hours, 16 minutes, 31 seconds)
2026-01-23 03:36:25,056 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 03:36:25,059 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:40:46,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 1303.01831 ± 720.519
2026-01-23 03:40:46,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [444.00516, 538.20715, 1908.7694, 1990.6089, 1222.9221, 1994.1138, 486.86484, 2123.515, 397.3722, 1923.8055]
2026-01-23 03:40:46,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [219.0, 1000.0, 959.0, 1000.0, 603.0, 1000.0, 252.0, 1000.0, 221.0, 1000.0]
2026-01-23 03:40:46,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 16/100 (estimated time remaining: 26 hours, 40 minutes, 43 seconds)
2026-01-23 03:53:41,979 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 03:53:41,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:58:31,774 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 1209.53613 ± 547.148
2026-01-23 03:58:31,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [1096.5829, 1942.8213, 1109.5236, 1233.5328, 1856.2926, 718.33923, 1094.473, 881.6988, 1980.8699, 181.22623]
2026-01-23 03:58:31,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [580.0, 1000.0, 1000.0, 1000.0, 1000.0, 387.0, 1000.0, 1000.0, 1000.0, 102.0]
2026-01-23 03:58:31,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 17/100 (estimated time remaining: 25 hours, 52 minutes, 49 seconds)
2026-01-23 04:12:57,670 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 04:12:57,673 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:17:25,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 1205.27466 ± 784.556
2026-01-23 04:17:25,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [2116.6006, 1344.3473, 1989.535, 95.09648, 132.04271, 994.0162, 2009.32, 2130.7788, 427.56335, 813.4461]
2026-01-23 04:17:25,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 676.0, 1000.0, 50.0, 88.0, 458.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:17:25,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 18/100 (estimated time remaining: 25 hours, 27 minutes)
2026-01-23 04:31:06,956 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 04:31:06,960 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:36:42,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 2038.94397 ± 547.584
2026-01-23 04:36:42,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [2202.506, 2287.1797, 2203.1182, 2220.6265, 2123.4343, 2297.142, 2094.8396, 2211.989, 2338.6106, 409.99387]
2026-01-23 04:36:42,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 157.0]
2026-01-23 04:36:42,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1274 [INFO]: New best (2038.94) for latency DatasetOffice
2026-01-23 04:36:42,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 19/100 (estimated time remaining: 25 hours, 16 minutes, 30 seconds)
2026-01-23 04:48:58,333 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 04:48:58,336 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:52:43,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 1344.58301 ± 943.850
2026-01-23 04:52:43,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [2280.3708, 1061.738, 1110.6925, 1110.9489, 2515.3013, 197.76443, 225.61394, 92.57813, 2395.575, 2455.2468]
2026-01-23 04:52:43,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 436.0, 485.0, 1000.0, 87.0, 87.0, 43.0, 1000.0, 1000.0]
2026-01-23 04:52:43,131 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 20/100 (estimated time remaining: 24 hours, 9 minutes, 12 seconds)
2026-01-23 05:06:18,042 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 05:06:18,046 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:10:44,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 1535.83215 ± 1024.083
2026-01-23 05:10:44,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [2630.9326, 2478.1423, 2333.1182, 2476.171, 197.60324, 2509.7815, 213.05988, 1581.613, 706.26904, 231.62997]
2026-01-23 05:10:44,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 102.0, 1000.0, 96.0, 1000.0, 1000.0, 98.0]
2026-01-23 05:10:44,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 21/100 (estimated time remaining: 23 hours, 59 minutes, 17 seconds)
2026-01-23 05:24:11,210 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 05:24:11,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:27:04,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 1012.55408 ± 690.565
2026-01-23 05:27:04,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [2355.381, 961.05817, 373.5364, 2134.5342, 451.08182, 958.66296, 570.11145, 214.83157, 1319.8203, 786.52386]
2026-01-23 05:27:04,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [973.0, 395.0, 169.0, 1000.0, 204.0, 361.0, 219.0, 97.0, 1000.0, 330.0]
2026-01-23 05:27:04,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 22/100 (estimated time remaining: 23 hours, 18 minutes, 59 seconds)
2026-01-23 05:41:23,157 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 05:41:23,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:46:57,000 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 2048.91968 ± 811.890
2026-01-23 05:46:57,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [537.47906, 2667.9443, 2522.1118, 2175.9487, 2635.114, 2329.689, 388.94644, 2204.581, 2671.3816, 2356.0022]
2026-01-23 05:46:57,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [226.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 868.0, 1000.0, 1000.0]
2026-01-23 05:46:57,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1274 [INFO]: New best (2048.92) for latency DatasetOffice
2026-01-23 05:46:57,007 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 23/100 (estimated time remaining: 23 hours, 16 minutes, 40 seconds)
2026-01-23 06:00:35,123 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 06:00:35,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:04:18,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 1241.82690 ± 796.653
2026-01-23 06:04:18,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [1317.4608, 543.46375, 2666.8032, 2423.4727, 1717.2618, 344.03485, 876.1843, 805.1301, 1487.3788, 237.0794]
2026-01-23 06:04:18,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [453.0, 1000.0, 1000.0, 862.0, 700.0, 130.0, 1000.0, 298.0, 537.0, 111.0]
2026-01-23 06:04:18,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 24/100 (estimated time remaining: 22 hours, 28 minutes, 56 seconds)
2026-01-23 06:16:56,730 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 06:16:56,735 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:23:03,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 2435.06641 ± 104.723
2026-01-23 06:23:03,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [2715.289, 2421.0647, 2432.083, 2292.2935, 2425.094, 2376.1472, 2489.61, 2411.9958, 2389.8257, 2397.2588]
2026-01-23 06:23:03,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:23:03,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1274 [INFO]: New best (2435.07) for latency DatasetOffice
2026-01-23 06:23:03,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 25/100 (estimated time remaining: 22 hours, 53 minutes, 5 seconds)
2026-01-23 06:36:57,352 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 06:36:57,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:42:40,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 2301.29663 ± 797.775
2026-01-23 06:42:40,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [2091.011, 2762.0654, 2900.8452, 1363.5729, 3070.6353, 1271.1299, 862.7727, 2959.0417, 2670.5369, 3061.3562]
2026-01-23 06:42:40,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 327.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:42:40,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 26/100 (estimated time remaining: 22 hours, 59 minutes, 4 seconds)
2026-01-23 06:55:58,362 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 06:55:58,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:00:54,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 1938.36743 ± 1049.327
2026-01-23 07:00:54,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [3205.9739, 357.07394, 1939.0126, 3088.369, 635.91705, 2631.7664, 2342.2163, 320.11673, 2691.861, 2171.3674]
2026-01-23 07:00:54,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 128.0, 1000.0, 984.0, 1000.0, 893.0, 1000.0, 123.0, 1000.0, 1000.0]
2026-01-23 07:00:54,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 27/100 (estimated time remaining: 23 hours, 8 minutes, 38 seconds)
2026-01-23 07:14:20,603 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 07:14:20,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:19:43,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 2628.58447 ± 892.739
2026-01-23 07:19:43,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [3133.2302, 520.66223, 3087.8555, 2964.8428, 3159.7817, 1303.4585, 3119.8325, 3228.725, 2585.393, 3182.0645]
2026-01-23 07:19:43,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 171.0, 1000.0, 1000.0, 1000.0, 451.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:19:43,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1274 [INFO]: New best (2628.58) for latency DatasetOffice
2026-01-23 07:19:43,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 28/100 (estimated time remaining: 22 hours, 34 minutes, 37 seconds)
2026-01-23 07:33:50,161 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 07:33:50,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:39:29,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 2859.01172 ± 876.234
2026-01-23 07:39:29,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [3168.0513, 2965.9185, 3173.758, 3254.899, 3155.0107, 2956.0886, 3136.568, 3179.664, 3348.555, 251.60313]
2026-01-23 07:39:29,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 926.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 86.0]
2026-01-23 07:39:29,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1274 [INFO]: New best (2859.01) for latency DatasetOffice
2026-01-23 07:39:29,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 29/100 (estimated time remaining: 22 hours, 50 minutes, 41 seconds)
2026-01-23 07:53:41,567 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 07:53:41,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:58:02,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 2268.16650 ± 1215.943
2026-01-23 07:58:02,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [3429.472, 1379.0833, 2831.9143, 3178.1836, 357.2801, 3282.5361, 3258.3528, 3320.064, 296.94208, 1347.8368]
2026-01-23 07:58:02,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 481.0, 866.0, 920.0, 125.0, 1000.0, 1000.0, 1000.0, 133.0, 457.0]
2026-01-23 07:58:02,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 30/100 (estimated time remaining: 22 hours, 28 minutes, 44 seconds)
2026-01-23 08:11:29,707 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 08:11:29,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:16:57,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 2338.39844 ± 1017.896
2026-01-23 08:16:57,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [3269.9133, 526.3254, 3356.8354, 2753.887, 3165.3828, 1615.0791, 1778.4092, 3044.8203, 783.96075, 3089.3718]
2026-01-23 08:16:57,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 217.0, 1000.0, 826.0, 1000.0, 1000.0, 757.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:16:57,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 31/100 (estimated time remaining: 21 hours, 59 minutes, 56 seconds)
2026-01-23 08:30:11,021 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 08:30:11,024 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:35:13,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 2617.37256 ± 895.885
2026-01-23 08:35:13,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [3422.869, 1370.5371, 3332.914, 1409.4182, 3131.851, 1026.2964, 3260.1177, 2962.7874, 3143.2595, 3113.6753]
2026-01-23 08:35:13,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 417.0, 1000.0, 473.0, 1000.0, 330.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:35:13,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 32/100 (estimated time remaining: 21 hours, 41 minutes, 46 seconds)
2026-01-23 08:49:55,595 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 08:49:55,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:56:03,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 3141.65015 ± 256.676
2026-01-23 08:56:03,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [2990.5725, 3155.1, 3304.016, 2870.7585, 3215.4443, 3370.449, 3242.2634, 2536.4927, 3337.9277, 3393.4768]
2026-01-23 08:56:03,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 901.0, 1000.0, 868.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:56:03,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1274 [INFO]: New best (3141.65) for latency DatasetOffice
2026-01-23 08:56:03,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 33/100 (estimated time remaining: 21 hours, 49 minutes, 58 seconds)
2026-01-23 09:09:07,989 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 09:09:07,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:14:28,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 2666.94824 ± 1110.677
2026-01-23 09:14:28,684 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [359.774, 3432.2234, 3243.4116, 3145.46, 3476.9426, 1315.3075, 3357.5322, 3262.7983, 1416.0027, 3660.032]
2026-01-23 09:14:28,684 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [117.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 403.0, 1000.0]
2026-01-23 09:14:28,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 34/100 (estimated time remaining: 21 hours, 12 minutes, 45 seconds)
2026-01-23 09:27:57,498 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 09:27:57,502 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:33:52,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 3416.97729 ± 497.735
2026-01-23 09:33:52,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [1968.3733, 3443.0586, 3658.39, 3486.2986, 3676.2366, 3611.8135, 3693.3022, 3781.3293, 3406.099, 3444.8735]
2026-01-23 09:33:52,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [559.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 998.0, 1000.0, 1000.0, 1000.0]
2026-01-23 09:33:52,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1274 [INFO]: New best (3416.98) for latency DatasetOffice
2026-01-23 09:33:52,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 35/100 (estimated time remaining: 21 hours, 5 minutes, 4 seconds)
2026-01-23 09:48:03,875 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 09:48:03,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:53:20,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 2864.01123 ± 1297.452
2026-01-23 09:53:20,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [3626.0918, 3521.886, 245.95798, 3407.675, 3503.013, 353.4444, 3583.5088, 3597.5867, 3805.3208, 2995.63]
2026-01-23 09:53:20,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 83.0, 1000.0, 1000.0, 265.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 09:53:20,677 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 36/100 (estimated time remaining: 20 hours, 53 minutes, 5 seconds)
2026-01-23 10:07:37,804 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 10:07:37,808 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:13:24,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 3268.65210 ± 689.936
2026-01-23 10:13:24,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [3673.66, 3611.738, 3376.2842, 3779.9785, 3565.2495, 3638.1174, 3287.9033, 1335.2568, 2866.0984, 3552.236]
2026-01-23 10:13:24,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 377.0, 810.0, 1000.0]
2026-01-23 10:13:24,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 37/100 (estimated time remaining: 20 hours, 56 minutes, 45 seconds)
2026-01-23 10:27:20,402 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 10:27:20,413 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:32:07,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 2180.91357 ± 1371.661
2026-01-23 10:32:07,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [898.7332, 3681.694, 3647.7754, 31.305279, 1537.5717, 3193.5413, 2257.6228, 3377.1877, 79.32759, 3104.3745]
2026-01-23 10:32:07,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 23.0, 407.0, 865.0, 621.0, 1000.0, 1000.0, 775.0]
2026-01-23 10:32:07,383 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 38/100 (estimated time remaining: 20 hours, 10 minutes, 27 seconds)
2026-01-23 10:45:23,894 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 10:45:23,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:51:21,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 2911.54980 ± 1066.647
2026-01-23 10:51:21,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [3589.7244, 3734.8474, 2712.0615, 3527.5496, 432.6715, 1877.6962, 3793.4944, 2113.9495, 3828.0334, 3505.4692]
2026-01-23 10:51:21,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 600.0, 1000.0, 1000.0]
2026-01-23 10:51:21,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 39/100 (estimated time remaining: 20 hours, 1 minute, 22 seconds)
2026-01-23 11:05:03,734 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 11:05:03,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:09:52,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 2579.77368 ± 1376.167
2026-01-23 11:09:52,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [3028.2258, 1551.3169, 3801.229, 3728.7952, 153.17596, 1581.9426, 3923.2214, 3557.6174, 3815.037, 657.1758]
2026-01-23 11:09:52,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 47.0, 417.0, 1000.0, 1000.0, 1000.0, 177.0]
2026-01-23 11:09:52,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 40/100 (estimated time remaining: 19 hours, 31 minutes, 11 seconds)
2026-01-23 11:24:04,927 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 11:24:04,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:29:04,795 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 3075.73804 ± 1258.025
2026-01-23 11:29:04,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [825.4255, 3860.0427, 902.9446, 4048.4944, 3800.898, 1873.9069, 3732.8528, 4015.046, 3855.1912, 3842.5806]
2026-01-23 11:29:04,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [213.0, 1000.0, 230.0, 1000.0, 1000.0, 517.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 11:29:04,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 41/100 (estimated time remaining: 19 hours, 8 minutes, 49 seconds)
2026-01-23 11:43:36,498 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 11:43:36,502 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:49:12,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 2854.02832 ± 1054.715
2026-01-23 11:49:12,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [897.794, 1870.0309, 3581.8235, 4029.0383, 2054.8394, 2002.2174, 4047.737, 3855.2695, 3603.425, 2598.1074]
2026-01-23 11:49:12,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [342.0, 558.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 11:49:12,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 42/100 (estimated time remaining: 18 hours, 50 minutes, 25 seconds)
2026-01-23 12:02:17,688 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 12:02:17,692 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 12:06:15,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 2418.09326 ± 1314.750
2026-01-23 12:06:15,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [836.2131, 2797.9666, 1063.179, 408.9354, 1452.3468, 3659.93, 4234.9375, 3949.171, 3207.5776, 2570.6736]
2026-01-23 12:06:15,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [234.0, 723.0, 278.0, 112.0, 373.0, 1000.0, 1000.0, 1000.0, 1000.0, 626.0]
2026-01-23 12:06:15,690 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 43/100 (estimated time remaining: 18 hours, 12 minutes)
2026-01-23 12:20:00,661 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 12:20:00,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 12:25:42,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 3001.93921 ± 1378.524
2026-01-23 12:25:42,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [254.41537, 3833.6338, 2909.7393, 3897.3306, 3827.668, 3614.3735, 3624.386, 358.38464, 4052.4832, 3646.9792]
2026-01-23 12:25:42,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 134.0, 1000.0, 1000.0]
2026-01-23 12:25:42,768 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 44/100 (estimated time remaining: 17 hours, 55 minutes, 35 seconds)
2026-01-23 12:40:23,133 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 12:40:23,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 12:45:23,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 3005.97217 ± 1409.646
2026-01-23 12:45:23,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [3950.7644, 3505.4924, 3931.4758, 2734.237, 3781.351, 3909.385, 3570.3389, 217.21469, 348.2006, 4111.261]
2026-01-23 12:45:23,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 738.0, 1000.0, 1000.0, 1000.0, 92.0, 99.0, 1000.0]
2026-01-23 12:45:23,173 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 45/100 (estimated time remaining: 17 hours, 49 minutes, 45 seconds)
2026-01-23 12:59:13,104 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 12:59:13,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 13:04:55,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 3712.63599 ± 722.378
2026-01-23 13:04:55,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [4009.9343, 1794.7651, 3906.387, 4209.9775, 4154.481, 4009.599, 4164.629, 3966.1667, 3946.268, 2964.154]
2026-01-23 13:04:55,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 484.0, 973.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 725.0]
2026-01-23 13:04:55,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1274 [INFO]: New best (3712.64) for latency DatasetOffice
2026-01-23 13:04:55,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 46/100 (estimated time remaining: 17 hours, 34 minutes, 15 seconds)
2026-01-23 13:19:15,465 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 13:19:15,470 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 13:24:22,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 2913.04395 ± 1336.094
2026-01-23 13:24:22,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [3867.6484, 2889.2834, 3888.3418, 594.4184, 3208.2822, 3507.4858, 3770.5742, 3760.1052, 3602.4473, 41.852642]
2026-01-23 13:24:22,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 193.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 43.0]
2026-01-23 13:24:22,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 47/100 (estimated time remaining: 17 hours, 7 minutes, 42 seconds)
2026-01-23 13:37:42,265 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 13:37:42,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 13:42:17,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 2769.78662 ± 1487.208
2026-01-23 13:42:17,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [4135.191, 4169.51, 1503.6213, 1607.4908, 4376.921, 495.10687, 1044.5206, 2001.997, 4303.851, 4059.6555]
2026-01-23 13:42:17,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 394.0, 1000.0, 144.0, 259.0, 491.0, 1000.0, 1000.0]
2026-01-23 13:42:17,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 48/100 (estimated time remaining: 16 hours, 57 minutes, 57 seconds)
2026-01-23 13:56:03,233 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 13:56:03,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 14:01:08,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 3364.45752 ± 1565.206
2026-01-23 14:01:08,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [4127.7744, 175.84903, 4268.632, 298.02734, 4100.4785, 4197.8467, 4144.1025, 4014.396, 4160.9824, 4156.4844]
2026-01-23 14:01:08,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 63.0, 1000.0, 84.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 14:01:08,072 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 49/100 (estimated time remaining: 16 hours, 32 minutes, 23 seconds)
2026-01-23 14:15:04,693 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 14:15:04,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 14:20:23,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 2671.37720 ± 1563.848
2026-01-23 14:20:23,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [4282.129, 3607.9072, 285.2404, 3759.061, 4014.6775, 4046.5808, 953.5054, 963.02527, 918.20557, 3883.4397]
2026-01-23 14:20:23,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 160.0, 1000.0, 1000.0, 1000.0, 1000.0, 220.0, 1000.0, 1000.0]
2026-01-23 14:20:23,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 50/100 (estimated time remaining: 16 hours, 9 minutes, 5 seconds)
2026-01-23 14:34:00,232 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 14:34:00,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 14:39:05,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 2875.65259 ± 1578.956
2026-01-23 14:39:05,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [4211.6865, 4217.8047, 2612.4392, 1417.2972, 4392.906, 4137.908, 115.99709, 3909.0237, 3379.0325, 362.42966]
2026-01-23 14:39:05,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 361.0, 1000.0, 1000.0, 45.0, 1000.0, 767.0, 1000.0]
2026-01-23 14:39:05,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 51/100 (estimated time remaining: 15 hours, 41 minutes, 40 seconds)
2026-01-23 14:52:43,591 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 14:52:43,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 14:57:13,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 2355.68994 ± 1219.145
2026-01-23 14:57:13,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [1412.2487, 3823.8494, 1425.6122, 4014.4219, 767.5841, 637.88654, 2671.5527, 3925.042, 2164.0547, 2714.6445]
2026-01-23 14:57:13,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 191.0, 178.0, 744.0, 1000.0, 497.0, 661.0]
2026-01-23 14:57:13,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 52/100 (estimated time remaining: 15 hours, 9 minutes, 56 seconds)
2026-01-23 15:11:28,129 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 15:11:28,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 15:15:14,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 2423.97290 ± 1373.736
2026-01-23 15:15:14,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [1702.9817, 502.0593, 3607.8206, 909.3083, 4048.785, 1755.7626, 4328.8584, 4129.9023, 1444.1177, 1810.1337]
2026-01-23 15:15:14,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [401.0, 145.0, 1000.0, 366.0, 1000.0, 437.0, 1000.0, 1000.0, 388.0, 416.0]
2026-01-23 15:15:14,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 53/100 (estimated time remaining: 14 hours, 52 minutes, 17 seconds)
2026-01-23 15:28:11,089 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 15:28:11,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 15:33:02,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 3286.73975 ± 1385.432
2026-01-23 15:33:02,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [4161.909, 4294.355, 1321.753, 4383.363, 642.44586, 3923.0544, 4168.61, 4084.6707, 1658.6638, 4228.573]
2026-01-23 15:33:02,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 308.0, 1000.0, 186.0, 1000.0, 1000.0, 1000.0, 418.0, 1000.0]
2026-01-23 15:33:02,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 54/100 (estimated time remaining: 14 hours, 23 minutes, 53 seconds)
2026-01-23 15:46:25,890 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 15:46:25,895 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 15:51:48,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 2751.48413 ± 1561.770
2026-01-23 15:51:48,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [4115.777, 1958.5355, 4403.0435, 2278.5051, 519.17566, 4235.167, 1097.9933, 4288.148, 537.41046, 4081.0862]
2026-01-23 15:51:48,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 562.0, 1000.0, 1000.0, 269.0, 1000.0, 1000.0, 1000.0]
2026-01-23 15:51:48,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 55/100 (estimated time remaining: 14 hours, 57 seconds)
2026-01-23 16:06:24,897 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 16:06:24,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 16:12:03,611 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 3898.55127 ± 727.687
2026-01-23 16:12:03,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [4115.1416, 2969.2583, 4176.631, 4172.856, 2050.8875, 4286.295, 4312.74, 4334.683, 4353.71, 4213.31]
2026-01-23 16:12:03,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 729.0, 1000.0, 1000.0, 477.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 16:12:03,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1274 [INFO]: New best (3898.55) for latency DatasetOffice
2026-01-23 16:12:03,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 56/100 (estimated time remaining: 13 hours, 56 minutes, 44 seconds)
2026-01-23 16:25:45,383 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 16:25:45,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 16:30:51,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 3747.53198 ± 1447.368
2026-01-23 16:30:51,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [4558.7144, 4451.5146, 269.25995, 4520.265, 4234.919, 4657.368, 1617.2454, 4659.867, 4529.3667, 3976.8003]
2026-01-23 16:30:51,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 84.0, 1000.0, 939.0, 1000.0, 355.0, 1000.0, 1000.0, 1000.0]
2026-01-23 16:30:51,502 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 57/100 (estimated time remaining: 13 hours, 43 minutes, 57 seconds)
2026-01-23 16:43:37,298 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 16:43:37,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 16:47:36,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 2660.45435 ± 1572.861
2026-01-23 16:47:36,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [4113.468, 193.46454, 3832.1475, 4408.9624, 2113.6758, 2303.8772, 1325.0505, 221.7866, 3806.0823, 4286.0317]
2026-01-23 16:47:36,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 65.0, 936.0, 1000.0, 513.0, 544.0, 327.0, 104.0, 1000.0, 1000.0]
2026-01-23 16:47:36,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 58/100 (estimated time remaining: 13 hours, 14 minutes, 19 seconds)
2026-01-23 17:01:25,251 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 17:01:25,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 17:07:01,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 3830.26562 ± 585.905
2026-01-23 17:07:01,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [4134.564, 4270.9517, 4066.8975, 4409.059, 2906.1594, 2806.0874, 3255.2827, 3780.3723, 4392.3887, 4280.89]
2026-01-23 17:07:01,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 657.0, 659.0, 754.0, 1000.0, 1000.0, 1000.0]
2026-01-23 17:07:01,660 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 59/100 (estimated time remaining: 13 hours, 9 minutes, 31 seconds)
2026-01-23 17:21:23,142 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 17:21:23,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 17:26:52,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 3683.02979 ± 974.671
2026-01-23 17:26:52,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [2972.0674, 4395.5337, 4477.315, 2831.0996, 4141.4893, 4259.6226, 1405.0354, 4465.9575, 3370.1387, 4512.035]
2026-01-23 17:26:52,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 694.0, 1000.0, 1000.0, 310.0, 1000.0, 807.0, 1000.0]
2026-01-23 17:26:52,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 60/100 (estimated time remaining: 12 hours, 59 minutes, 31 seconds)
2026-01-23 17:40:51,500 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 17:40:51,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 17:44:35,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 1281.54431 ± 1434.954
2026-01-23 17:44:35,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [585.5703, 1461.7616, 268.46048, 3324.1904, 79.969444, 1381.0271, 293.02533, 862.52313, 4546.3115, 12.604135]
2026-01-23 17:44:35,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 344.0, 1000.0, 1000.0, 1000.0, 350.0, 119.0, 183.0, 1000.0, 13.0]
2026-01-23 17:44:35,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 61/100 (estimated time remaining: 12 hours, 20 minutes, 14 seconds)
2026-01-23 17:57:46,377 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 17:57:46,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 18:03:52,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 4050.13428 ± 973.454
2026-01-23 18:03:52,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [1246.2487, 4303.3667, 4311.4155, 4793.545, 4552.3706, 4374.1094, 4434.443, 3699.3215, 4211.8022, 4574.7207]
2026-01-23 18:03:52,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 845.0, 1000.0, 1000.0]
2026-01-23 18:03:52,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1274 [INFO]: New best (4050.13) for latency DatasetOffice
2026-01-23 18:03:52,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 62/100 (estimated time remaining: 12 hours, 5 minutes, 28 seconds)
2026-01-23 18:17:50,854 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 18:17:50,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 18:22:42,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 3020.76221 ± 1554.889
2026-01-23 18:22:42,794 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [2530.1296, 1256.4728, 4442.807, 4546.5913, 1400.5148, 4656.719, 173.7059, 2720.3599, 4211.9834, 4268.3374]
2026-01-23 18:22:42,794 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [581.0, 299.0, 1000.0, 1000.0, 322.0, 1000.0, 1000.0, 626.0, 1000.0, 1000.0]
2026-01-23 18:22:42,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 63/100 (estimated time remaining: 12 hours, 2 minutes, 48 seconds)
2026-01-23 18:36:53,170 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 18:36:53,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 18:41:24,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 2709.60327 ± 1456.730
2026-01-23 18:41:24,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [1268.2355, 4570.959, 3997.9514, 660.9627, 1304.0598, 3232.2644, 4347.9023, 4365.929, 1667.2213, 1680.5474]
2026-01-23 18:41:24,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [314.0, 1000.0, 1000.0, 198.0, 374.0, 1000.0, 1000.0, 1000.0, 1000.0, 419.0]
2026-01-23 18:41:24,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 64/100 (estimated time remaining: 11 hours, 38 minutes, 22 seconds)
2026-01-23 18:54:14,575 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 18:54:14,580 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 19:00:21,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 4238.07910 ± 272.114
2026-01-23 19:00:21,059 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [3785.0923, 4456.411, 4593.3706, 3680.359, 4309.1753, 4274.2456, 4300.2393, 4309.9414, 4424.73, 4247.225]
2026-01-23 19:00:21,059 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [889.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 995.0]
2026-01-23 19:00:21,059 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1274 [INFO]: New best (4238.08) for latency DatasetOffice
2026-01-23 19:00:21,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 65/100 (estimated time remaining: 11 hours, 13 minutes, 4 seconds)
2026-01-23 19:14:57,378 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 19:14:57,382 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 19:21:12,005 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 4338.03076 ± 148.574
2026-01-23 19:21:12,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [4288.687, 4405.8247, 4461.058, 4203.7295, 4008.8328, 4486.038, 4372.8057, 4551.5684, 4282.2705, 4319.493]
2026-01-23 19:21:12,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 19:21:12,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1274 [INFO]: New best (4338.03) for latency DatasetOffice
2026-01-23 19:21:12,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 66/100 (estimated time remaining: 11 hours, 16 minutes, 15 seconds)
2026-01-23 19:35:20,679 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 19:35:20,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 19:39:35,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 2819.33521 ± 1700.864
2026-01-23 19:39:35,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [2568.4275, 492.8732, 1193.1467, 1668.829, 4241.109, 4609.69, 4242.409, 4621.1045, 228.34544, 4327.417]
2026-01-23 19:39:35,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 126.0, 275.0, 400.0, 1000.0, 1000.0, 1000.0, 992.0, 93.0, 1000.0]
2026-01-23 19:39:35,491 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 67/100 (estimated time remaining: 10 hours, 50 minutes, 55 seconds)
2026-01-23 19:52:55,089 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 19:52:55,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 19:57:57,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 3789.47729 ± 1684.384
2026-01-23 19:57:57,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [4809.301, 4514.482, 4498.9897, 4495.9546, 4685.7686, 4723.9004, 34.672665, 4731.4185, 860.61835, 4539.666]
2026-01-23 19:57:57,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 22.0, 1000.0, 189.0, 1000.0]
2026-01-23 19:57:57,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 68/100 (estimated time remaining: 10 hours, 28 minutes, 38 seconds)
2026-01-23 20:12:09,145 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 20:12:09,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 20:16:28,391 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 2982.75732 ± 1788.718
2026-01-23 20:16:28,392 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [4592.975, 4166.347, 752.99255, 354.66183, 4578.8755, 1893.4368, 439.88663, 3912.0051, 4693.8325, 4442.561]
2026-01-23 20:16:28,392 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 182.0, 122.0, 1000.0, 445.0, 243.0, 1000.0, 1000.0, 1000.0]
2026-01-23 20:16:28,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 69/100 (estimated time remaining: 10 hours, 8 minutes, 27 seconds)
2026-01-23 20:29:09,992 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 20:29:09,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 20:34:51,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 4154.43213 ± 1080.377
2026-01-23 20:34:51,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [4592.1855, 4427.168, 4358.0693, 943.3517, 4399.4746, 4265.347, 4790.814, 4549.0586, 4552.353, 4666.502]
2026-01-23 20:34:51,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 200.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 20:34:51,173 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 70/100 (estimated time remaining: 9 hours, 45 minutes, 54 seconds)
2026-01-23 20:49:10,398 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 20:49:10,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 20:54:11,820 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 2867.65283 ± 1584.146
2026-01-23 20:54:11,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [585.0891, 724.5084, 3201.258, 2545.0444, 4319.2607, 4400.1016, 4662.3745, 939.70734, 4758.698, 2540.487]
2026-01-23 20:54:11,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 175.0, 667.0, 1000.0, 1000.0, 1000.0, 1000.0, 216.0, 1000.0, 1000.0]
2026-01-23 20:54:11,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 71/100 (estimated time remaining: 9 hours, 17 minutes, 58 seconds)
2026-01-23 21:08:05,713 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 21:08:05,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 21:12:07,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 2818.73389 ± 1845.451
2026-01-23 21:12:07,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [4709.5146, 4577.1895, 4412.1035, 1874.9905, 97.69176, 429.19748, 4577.8335, 4712.941, 1649.4005, 1146.4783]
2026-01-23 21:12:07,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 505.0, 59.0, 210.0, 1000.0, 1000.0, 389.0, 284.0]
2026-01-23 21:12:07,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 72/100 (estimated time remaining: 8 hours, 56 minutes, 43 seconds)
2026-01-23 21:26:34,686 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 21:26:34,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 21:32:18,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 3982.52539 ± 836.022
2026-01-23 21:32:18,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [2960.6453, 4593.1636, 3794.3943, 4679.677, 4523.2197, 4686.8984, 4595.4727, 3105.813, 2316.9817, 4568.988]
2026-01-23 21:32:18,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [684.0, 1000.0, 808.0, 1000.0, 1000.0, 1000.0, 1000.0, 705.0, 1000.0, 1000.0]
2026-01-23 21:32:18,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 73/100 (estimated time remaining: 8 hours, 48 minutes, 21 seconds)
2026-01-23 21:45:32,593 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 21:45:32,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 21:50:51,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 3956.47339 ± 1283.850
2026-01-23 21:50:51,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [4483.303, 4726.9688, 4621.8916, 710.70264, 4653.7583, 4797.9077, 4504.153, 4355.9873, 4417.208, 2292.8538]
2026-01-23 21:50:51,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 159.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 485.0]
2026-01-23 21:50:51,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 74/100 (estimated time remaining: 8 hours, 29 minutes, 42 seconds)
2026-01-23 22:04:16,890 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 22:04:16,894 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 22:10:27,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 4117.82129 ± 746.880
2026-01-23 22:10:27,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [4433.1855, 4291.6895, 3982.7795, 1931.9244, 4561.716, 4496.961, 4366.683, 4565.5166, 4260.165, 4287.596]
2026-01-23 22:10:27,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 22:10:27,293 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 75/100 (estimated time remaining: 8 hours, 17 minutes, 7 seconds)
2026-01-23 22:24:41,199 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 22:24:41,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 22:28:56,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 2160.19165 ± 1241.357
2026-01-23 22:28:56,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [3176.5317, 959.9844, 913.7619, 1535.3113, 1836.7664, 3303.2346, 2817.0813, 2033.8452, 4620.427, 404.97034]
2026-01-23 22:28:56,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 210.0, 1000.0, 1000.0, 406.0, 704.0, 570.0, 1000.0, 1000.0, 98.0]
2026-01-23 22:28:56,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 76/100 (estimated time remaining: 7 hours, 53 minutes, 42 seconds)
2026-01-23 22:42:44,334 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 22:42:44,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 22:48:51,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 4037.02344 ± 1117.799
2026-01-23 22:48:51,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [4489.847, 4517.5024, 4698.34, 4752.8696, 4385.5156, 1544.5643, 4710.349, 4354.705, 4790.7573, 2125.7803]
2026-01-23 22:48:51,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 22:48:51,436 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 77/100 (estimated time remaining: 7 hours, 44 minutes, 17 seconds)
2026-01-23 23:02:49,998 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 23:02:50,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 23:07:35,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 3022.60376 ± 1793.559
2026-01-23 23:07:35,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [3996.2964, 4347.757, 831.76776, 410.42972, 462.61087, 4721.081, 1827.8407, 4502.8003, 4704.917, 4420.5366]
2026-01-23 23:07:35,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [868.0, 1000.0, 230.0, 116.0, 1000.0, 1000.0, 434.0, 1000.0, 1000.0, 1000.0]
2026-01-23 23:07:35,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 78/100 (estimated time remaining: 7 hours, 18 minutes, 18 seconds)
2026-01-23 23:21:04,950 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 23:21:04,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 23:26:24,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 3564.06885 ± 1659.368
2026-01-23 23:26:24,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [313.86438, 2913.8452, 4550.373, 4939.5005, 579.75806, 4808.619, 4532.22, 4680.935, 3753.4004, 4568.1743]
2026-01-23 23:26:24,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 622.0, 1000.0, 1000.0, 139.0, 1000.0, 1000.0, 1000.0, 829.0, 1000.0]
2026-01-23 23:26:24,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 79/100 (estimated time remaining: 7 hours, 25 seconds)
2026-01-23 23:40:29,669 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 23:40:29,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 23:45:49,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 3293.23755 ± 1686.469
2026-01-23 23:45:49,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [245.34631, 4483.0244, 3171.0857, 4629.058, 4282.6235, 4522.149, 4572.2397, 4396.8247, 138.65822, 2491.3667]
2026-01-23 23:45:49,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 939.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 86.0, 615.0]
2026-01-23 23:45:49,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 80/100 (estimated time remaining: 6 hours, 40 minutes, 34 seconds)
2026-01-23 23:59:01,727 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 23:59:01,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 00:03:36,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 3394.28394 ± 1251.471
2026-01-24 00:03:36,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [2726.1904, 4563.4453, 2221.5496, 4355.0923, 2943.5461, 4724.9844, 1593.905, 4654.114, 4588.327, 1571.6874]
2026-01-24 00:03:36,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [587.0, 1000.0, 504.0, 1000.0, 654.0, 1000.0, 339.0, 1000.0, 1000.0, 403.0]
2026-01-24 00:03:36,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 81/100 (estimated time remaining: 6 hours, 18 minutes, 42 seconds)
2026-01-24 00:17:56,292 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 00:17:56,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 00:23:27,007 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 4055.16992 ± 1170.743
2026-01-24 00:23:27,008 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [4613.394, 4606.886, 4358.865, 4297.275, 4284.436, 4330.495, 4579.787, 4532.635, 4385.448, 562.47723]
2026-01-24 00:23:27,008 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 149.0]
2026-01-24 00:23:27,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 82/100 (estimated time remaining: 5 hours, 59 minutes, 27 seconds)
2026-01-24 00:36:45,582 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 00:36:45,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 00:41:16,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 2697.75049 ± 1847.713
2026-01-24 00:41:16,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [1532.6506, 287.25082, 4731.258, 4645.4106, 4718.2593, 4769.291, 424.522, 3506.504, 1533.7292, 828.6309]
2026-01-24 00:41:16,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 102.0, 766.0, 375.0, 233.0]
2026-01-24 00:41:16,747 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 83/100 (estimated time remaining: 5 hours, 37 minutes, 15 seconds)
2026-01-24 00:54:49,290 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 00:54:49,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 01:00:19,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 4170.27686 ± 947.067
2026-01-24 01:00:19,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [4707.166, 4859.734, 4736.6367, 4458.96, 4415.796, 1672.8597, 4461.5825, 3175.0923, 4760.3755, 4454.566]
2026-01-24 01:00:19,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 422.0, 1000.0, 685.0, 1000.0, 1000.0]
2026-01-24 01:00:19,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 84/100 (estimated time remaining: 5 hours, 19 minutes, 16 seconds)
2026-01-24 01:14:11,545 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 01:14:11,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 01:20:22,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 4600.45361 ± 176.679
2026-01-24 01:20:22,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [4480.6772, 4706.432, 4222.729, 4485.0522, 4903.9785, 4694.428, 4665.6685, 4714.0176, 4636.738, 4494.813]
2026-01-24 01:20:22,425 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 01:20:22,425 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1274 [INFO]: New best (4600.45) for latency DatasetOffice
2026-01-24 01:20:22,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 85/100 (estimated time remaining: 5 hours, 2 minutes, 32 seconds)
2026-01-24 01:34:00,845 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 01:34:00,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 01:38:50,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 3609.46558 ± 1270.603
2026-01-24 01:38:50,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [4809.679, 4727.0737, 4710.82, 2768.3723, 4576.5713, 2378.3804, 4291.845, 2033.3301, 1298.5223, 4500.0635]
2026-01-24 01:38:50,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 620.0, 1000.0, 517.0, 1000.0, 475.0, 353.0, 1000.0]
2026-01-24 01:38:50,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 86/100 (estimated time remaining: 4 hours, 45 minutes, 39 seconds)
2026-01-24 01:52:14,057 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 01:52:14,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 01:57:14,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 3556.77466 ± 1761.953
2026-01-24 01:57:14,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [4872.0615, 207.55081, 2744.644, 4649.18, 4739.2783, 276.37158, 3989.9312, 4664.4756, 4690.5825, 4733.6733]
2026-01-24 01:57:14,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 66.0, 1000.0, 1000.0, 1000.0, 86.0, 854.0, 1000.0, 1000.0, 1000.0]
2026-01-24 01:57:14,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 87/100 (estimated time remaining: 4 hours, 22 minutes, 37 seconds)
2026-01-24 02:10:59,795 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 02:10:59,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 02:16:45,754 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 3986.63086 ± 1163.074
2026-01-24 02:16:45,754 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [4541.0415, 2392.415, 3381.149, 1473.1665, 4723.567, 5114.7646, 4758.2466, 4747.691, 3769.4343, 4964.831]
2026-01-24 02:16:45,754 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 485.0, 751.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 02:16:45,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 88/100 (estimated time remaining: 4 hours, 8 minutes, 15 seconds)
2026-01-24 02:30:15,496 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 02:30:15,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 02:36:27,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 4758.20361 ± 121.706
2026-01-24 02:36:27,587 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [4759.095, 4870.8965, 4813.46, 4434.6816, 4818.42, 4824.3564, 4647.778, 4819.888, 4777.5723, 4815.8853]
2026-01-24 02:36:27,587 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 02:36:27,587 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1274 [INFO]: New best (4758.20) for latency DatasetOffice
2026-01-24 02:36:27,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 50 minutes, 44 seconds)
2026-01-24 02:50:03,333 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 02:50:03,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 02:54:51,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 3619.23779 ± 1475.700
2026-01-24 02:54:51,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [2579.1384, 4630.6636, 2603.7693, 509.76172, 4654.3867, 4793.903, 4639.514, 4801.9893, 4912.169, 2067.0845]
2026-01-24 02:54:51,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [549.0, 1000.0, 608.0, 120.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 441.0]
2026-01-24 02:54:51,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 90/100 (estimated time remaining: 3 hours, 27 minutes, 52 seconds)
2026-01-24 03:08:11,588 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 03:08:11,593 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 03:13:59,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 4186.07031 ± 1045.027
2026-01-24 03:13:59,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [4520.374, 4821.8467, 4615.9673, 4694.0864, 3668.8672, 1195.6534, 4321.5713, 4701.1885, 4566.0977, 4755.05]
2026-01-24 03:13:59,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 967.0, 437.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 03:13:59,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 91/100 (estimated time remaining: 3 hours, 10 minutes, 19 seconds)
2026-01-24 03:28:37,423 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 03:28:37,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 03:33:53,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 3998.82031 ± 1455.121
2026-01-24 03:33:53,230 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [4891.172, 4617.502, 4696.732, 418.66052, 4643.2314, 4751.046, 1932.3696, 4529.136, 4663.616, 4844.737]
2026-01-24 03:33:53,230 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 117.0, 1000.0, 1000.0, 422.0, 1000.0, 1000.0, 1000.0]
2026-01-24 03:33:53,243 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 53 minutes, 57 seconds)
2026-01-24 03:46:50,324 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 03:46:50,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 03:53:01,382 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 4434.81494 ± 853.239
2026-01-24 03:53:01,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [4698.6274, 4866.217, 1933.2001, 4800.4756, 4402.7695, 4715.243, 4364.4854, 4752.715, 4925.8726, 4888.5415]
2026-01-24 03:53:01,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 972.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 03:53:01,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 34 minutes, 1 second)
2026-01-24 04:06:47,660 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 04:06:47,665 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 04:12:01,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 3914.04688 ± 1166.205
2026-01-24 04:12:01,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [2078.4626, 2540.1345, 4717.133, 4522.092, 4508.1816, 4807.7744, 4701.1665, 4565.8486, 4849.145, 1850.5275]
2026-01-24 04:12:01,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [487.0, 567.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 415.0]
2026-01-24 04:12:01,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 94/100 (estimated time remaining: 2 hours, 13 minutes, 46 seconds)
2026-01-24 04:26:32,607 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 04:26:32,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 04:31:33,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 3343.78979 ± 1758.401
2026-01-24 04:31:33,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [592.73047, 1613.662, 128.18793, 4439.992, 4930.208, 4695.7544, 4782.5493, 3387.1636, 4542.2603, 4325.389]
2026-01-24 04:31:33,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 349.0, 43.0, 1000.0, 1000.0, 1000.0, 1000.0, 685.0, 1000.0, 1000.0]
2026-01-24 04:31:33,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 56 minutes, 2 seconds)
2026-01-24 04:44:31,010 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 04:44:31,015 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 04:49:29,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 3627.98120 ± 1407.005
2026-01-24 04:49:29,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [4789.7812, 4589.201, 4500.3306, 4125.7915, 140.80376, 4608.0625, 3175.1313, 4792.069, 2336.3154, 3222.3225]
2026-01-24 04:49:29,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 101.0, 1000.0, 654.0, 1000.0, 529.0, 721.0]
2026-01-24 04:49:29,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 35 minutes, 29 seconds)
2026-01-24 05:03:16,835 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 05:03:16,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 05:08:17,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 3814.43237 ± 1586.737
2026-01-24 05:08:17,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [4656.016, 4397.0933, 892.15045, 2928.6492, 4897.6123, 820.255, 4703.295, 4886.5283, 5025.813, 4936.91]
2026-01-24 05:08:17,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 202.0, 613.0, 1000.0, 229.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 05:08:17,768 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 15 minutes, 31 seconds)
2026-01-24 05:22:24,528 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 05:22:24,532 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 05:28:08,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 3999.77588 ± 1358.970
2026-01-24 05:28:08,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [466.51508, 4740.657, 3872.4778, 4709.176, 4422.435, 4842.334, 2556.5522, 4627.224, 4892.485, 4867.8965]
2026-01-24 05:28:08,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 790.0, 1000.0, 1000.0, 1000.0, 545.0, 1000.0, 1000.0, 1000.0]
2026-01-24 05:28:08,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 98/100 (estimated time remaining: 57 minutes, 4 seconds)
2026-01-24 05:41:36,538 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 05:41:36,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 05:46:20,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 3573.08130 ± 1804.722
2026-01-24 05:46:20,336 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [4877.2017, 3980.5706, 216.74126, 234.80655, 4667.076, 4894.3345, 2548.9307, 4807.379, 4740.556, 4763.2153]
2026-01-24 05:46:20,336 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 927.0, 103.0, 109.0, 1000.0, 1000.0, 548.0, 1000.0, 1000.0, 1000.0]
2026-01-24 05:46:20,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 99/100 (estimated time remaining: 37 minutes, 43 seconds)
2026-01-24 06:00:43,494 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 06:00:43,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 06:04:54,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 3078.59229 ± 1922.720
2026-01-24 06:04:54,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [4872.711, 4827.5586, 2333.721, 4250.0825, 965.2079, 34.034187, 4908.5337, 3722.2795, 4720.489, 151.30678]
2026-01-24 06:04:54,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 516.0, 887.0, 238.0, 24.0, 1000.0, 1000.0, 1000.0, 48.0]
2026-01-24 06:04:54,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1247 [INFO]: Iteration 100/100 (estimated time remaining: 18 minutes, 40 seconds)
2026-01-24 06:18:47,898 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 06:18:47,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 06:24:51,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1269 [DEBUG]: Total Reward: 4513.28809 ± 478.069
2026-01-24 06:24:51,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1270 [DEBUG]: All rewards: [4606.892, 4716.4233, 4809.481, 4783.341, 4817.066, 4350.799, 3149.66, 4472.286, 4627.3643, 4799.571]
2026-01-24 06:24:51,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 714.0, 1000.0, 1000.0, 1000.0]
2026-01-24 06:24:51,416 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1299 [DEBUG]: Training session finished
