2026-01-22 23:01:38,360 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-halfcheetah/DatasetOffice-mbpac_memdelay
2026-01-22 23:01:38,360 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-halfcheetah/DatasetOffice-mbpac_memdelay
2026-01-22 23:01:38,360 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x154ac0029fd0>}
2026-01-22 23:01:38,360 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1159 [DEBUG]: using device: cuda
2026-01-22 23:01:38,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1181 [INFO]: Creating new trainer
2026-01-22 23:01:38,517 baseline-mbpac-noisy-halfcheetah:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2026-01-22 23:01:38,517 baseline-mbpac-noisy-halfcheetah:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-22 23:01:38,526 baseline-mbpac-noisy-halfcheetah:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=6, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2026-01-22 23:01:39,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1242 [DEBUG]: Starting training session...
2026-01-22 23:01:39,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 1/100
2026-01-22 23:15:17,260 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-22 23:15:17,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:22:50,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: -353.90775 ± 26.930
2026-01-22 23:22:50,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [-330.31354, -369.97797, -324.7987, -368.03818, -383.60663, -325.3272, -411.01114, -341.94617, -343.6538, -340.40402]
2026-01-22 23:22:50,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:22:50,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1274 [INFO]: New best (-353.91) for latency DatasetOffice
2026-01-22 23:22:50,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 2/100 (estimated time remaining: 34 hours, 57 minutes, 31 seconds)
2026-01-22 23:37:04,613 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-22 23:37:04,618 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:44:24,198 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: -283.81552 ± 51.791
2026-01-22 23:44:24,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [-224.95975, -309.54675, -294.91974, -343.62338, -287.07343, -357.107, -340.0369, -243.81425, -205.66528, -231.4089]
2026-01-22 23:44:24,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:44:24,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1274 [INFO]: New best (-283.82) for latency DatasetOffice
2026-01-22 23:44:24,204 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 3/100 (estimated time remaining: 34 hours, 54 minutes, 45 seconds)
2026-01-22 23:57:59,060 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-22 23:57:59,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:03:52,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 862.68420 ± 460.016
2026-01-23 00:03:52,023 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [985.25415, 1199.3566, 1110.8617, -45.65955, 1028.323, 1125.5039, -22.853256, 1195.568, 1193.6902, 856.7965]
2026-01-23 00:03:52,023 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:03:52,023 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1274 [INFO]: New best (862.68) for latency DatasetOffice
2026-01-23 00:03:52,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 4/100 (estimated time remaining: 33 hours, 31 minutes, 35 seconds)
2026-01-23 00:17:11,244 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 00:17:11,248 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:23:04,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1940.11426 ± 692.351
2026-01-23 00:23:04,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [101.62492, 1780.896, 1919.626, 2117.2537, 2492.1262, 2779.9104, 2447.1328, 1795.3496, 2163.2307, 1803.9915]
2026-01-23 00:23:04,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:23:04,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1274 [INFO]: New best (1940.11) for latency DatasetOffice
2026-01-23 00:23:04,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 5/100 (estimated time remaining: 32 hours, 34 minutes, 15 seconds)
2026-01-23 00:36:14,816 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 00:36:14,820 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:42:09,580 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3247.24292 ± 1011.428
2026-01-23 00:42:09,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3689.2698, 3472.014, 3616.161, 3566.9668, 3649.3901, 3667.5437, 3800.7678, 252.32135, 3598.7542, 3159.2424]
2026-01-23 00:42:09,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:42:09,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1274 [INFO]: New best (3247.24) for latency DatasetOffice
2026-01-23 00:42:09,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 6/100 (estimated time remaining: 31 hours, 49 minutes, 37 seconds)
2026-01-23 00:55:29,190 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 00:55:29,194 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:01:24,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3782.85474 ± 165.448
2026-01-23 01:01:24,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3521.613, 3705.7434, 3638.615, 3899.6765, 3897.6855, 4040.0852, 3766.765, 3575.6772, 3988.8337, 3793.852]
2026-01-23 01:01:24,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:01:24,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1274 [INFO]: New best (3782.85) for latency DatasetOffice
2026-01-23 01:01:24,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 7/100 (estimated time remaining: 30 hours, 53 minutes, 4 seconds)
2026-01-23 01:14:54,453 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 01:14:54,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:22:13,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3943.80396 ± 58.613
2026-01-23 01:22:13,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3936.0872, 3982.562, 4004.4612, 4022.7954, 3853.3162, 3911.8916, 4014.5378, 3854.6023, 3917.4255, 3940.3591]
2026-01-23 01:22:13,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:22:13,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1274 [INFO]: New best (3943.80) for latency DatasetOffice
2026-01-23 01:22:13,354 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 8/100 (estimated time remaining: 30 hours, 19 minutes, 26 seconds)
2026-01-23 01:35:27,438 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 01:35:27,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:41:27,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3990.49805 ± 234.703
2026-01-23 01:41:27,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4044.466, 3943.5, 4071.0173, 3306.3777, 4148.1787, 4126.1816, 4049.2583, 4125.5664, 4056.9194, 4033.515]
2026-01-23 01:41:27,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:41:27,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1274 [INFO]: New best (3990.50) for latency DatasetOffice
2026-01-23 01:41:27,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 9/100 (estimated time remaining: 29 hours, 55 minutes, 41 seconds)
2026-01-23 01:54:47,489 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 01:54:47,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:00:42,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4284.64502 ± 103.424
2026-01-23 02:00:42,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4270.1714, 4435.5933, 4355.1216, 4252.3877, 4298.7783, 4229.689, 4272.402, 4351.008, 4352.7007, 4028.6013]
2026-01-23 02:00:42,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:00:42,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1274 [INFO]: New best (4284.65) for latency DatasetOffice
2026-01-23 02:00:42,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 10/100 (estimated time remaining: 29 hours, 36 minutes, 55 seconds)
2026-01-23 02:13:54,104 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 02:13:54,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:19:48,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4235.21680 ± 188.309
2026-01-23 02:19:48,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4304.126, 4146.0425, 4252.7285, 4419.033, 4331.9473, 4532.308, 4350.0947, 4073.2703, 4096.7456, 3845.8713]
2026-01-23 02:19:48,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:19:48,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 11/100 (estimated time remaining: 29 hours, 17 minutes, 38 seconds)
2026-01-23 02:32:52,633 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 02:32:52,637 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:40:02,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4204.06152 ± 85.275
2026-01-23 02:40:02,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4246.4473, 4282.899, 4232.5107, 4092.5515, 4339.315, 4039.5728, 4212.0327, 4147.468, 4255.9907, 4191.8237]
2026-01-23 02:40:02,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:40:02,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 12/100 (estimated time remaining: 29 hours, 15 minutes, 41 seconds)
2026-01-23 02:52:56,248 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 02:52:56,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:58:47,455 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4402.18457 ± 105.096
2026-01-23 02:58:47,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4424.1836, 4605.2095, 4408.139, 4416.6235, 4344.0757, 4316.9927, 4490.7275, 4401.2666, 4180.7656, 4433.8623]
2026-01-23 02:58:47,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:58:47,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1274 [INFO]: New best (4402.18) for latency DatasetOffice
2026-01-23 02:58:47,462 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 13/100 (estimated time remaining: 28 hours, 19 minutes, 36 seconds)
2026-01-23 03:12:02,464 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 03:12:02,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:19:13,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3999.96753 ± 1159.245
2026-01-23 03:19:13,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4371.662, 4493.5356, 4294.4395, 528.7959, 4523.508, 4420.4214, 4368.7725, 4321.4287, 4365.665, 4311.4507]
2026-01-23 03:19:13,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:19:13,274 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 14/100 (estimated time remaining: 28 hours, 21 minutes, 3 seconds)
2026-01-23 03:32:20,841 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 03:32:20,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:38:13,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4521.55078 ± 88.193
2026-01-23 03:38:13,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4592.789, 4592.926, 4538.737, 4575.0635, 4518.7666, 4679.4307, 4366.65, 4446.307, 4440.1167, 4464.717]
2026-01-23 03:38:13,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:38:13,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1274 [INFO]: New best (4521.55) for latency DatasetOffice
2026-01-23 03:38:13,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 15/100 (estimated time remaining: 27 hours, 57 minutes, 5 seconds)
2026-01-23 03:50:58,038 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 03:50:58,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:56:46,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4423.18652 ± 245.244
2026-01-23 03:56:46,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4557.6543, 4346.531, 4501.489, 4380.919, 4357.67, 4775.154, 4594.558, 4409.6626, 4519.5156, 3788.7053]
2026-01-23 03:56:46,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:56:46,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 16/100 (estimated time remaining: 27 hours, 28 minutes, 34 seconds)
2026-01-23 04:10:11,290 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 04:10:11,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:16:02,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4570.37451 ± 132.538
2026-01-23 04:16:02,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4520.6313, 4667.9473, 4447.536, 4484.1562, 4594.0103, 4648.4766, 4313.659, 4657.1274, 4816.883, 4553.3193]
2026-01-23 04:16:02,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:16:02,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1274 [INFO]: New best (4570.37) for latency DatasetOffice
2026-01-23 04:16:02,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 17/100 (estimated time remaining: 26 hours, 52 minutes, 54 seconds)
2026-01-23 04:29:13,180 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 04:29:13,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:34:59,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4166.49170 ± 1051.821
2026-01-23 04:34:59,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2511.2827, 4803.1377, 4767.2427, 1693.0004, 4681.104, 4580.9624, 4801.809, 4578.959, 4550.5327, 4696.886]
2026-01-23 04:34:59,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:34:59,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 18/100 (estimated time remaining: 26 hours, 36 minutes, 48 seconds)
2026-01-23 04:48:03,128 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 04:48:03,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:53:54,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4688.94971 ± 517.091
2026-01-23 04:53:54,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4737.18, 4850.6436, 4874.4536, 3167.2239, 4902.124, 4911.09, 4785.85, 5091.22, 4858.7036, 4711.0054]
2026-01-23 04:53:54,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:53:54,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1274 [INFO]: New best (4688.95) for latency DatasetOffice
2026-01-23 04:53:54,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 19/100 (estimated time remaining: 25 hours, 52 minutes, 51 seconds)
2026-01-23 05:06:54,287 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 05:06:54,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:12:45,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4761.67188 ± 122.402
2026-01-23 05:12:45,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4925.7197, 4765.475, 4681.4087, 4751.4146, 4979.4375, 4566.9062, 4647.7847, 4770.4087, 4858.7803, 4669.3833]
2026-01-23 05:12:45,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:12:45,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1274 [INFO]: New best (4761.67) for latency DatasetOffice
2026-01-23 05:12:45,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 20/100 (estimated time remaining: 25 hours, 31 minutes, 36 seconds)
2026-01-23 05:25:35,520 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 05:25:35,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:31:23,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4256.52686 ± 1465.153
2026-01-23 05:31:23,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4716.396, 4806.717, 4595.245, 5204.3613, 5008.885, 4790.7163, 4752.008, 4888.421, -15.595297, 3818.115]
2026-01-23 05:31:23,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:31:23,304 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 21/100 (estimated time remaining: 25 hours, 13 minutes, 42 seconds)
2026-01-23 05:44:14,960 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 05:44:14,964 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:50:04,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4473.63965 ± 953.442
2026-01-23 05:50:04,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3911.9976, 2808.3025, 5068.881, 5096.994, 5093.2607, 2599.9438, 4809.9487, 5022.9937, 5189.5166, 5134.557]
2026-01-23 05:50:04,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:50:04,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 22/100 (estimated time remaining: 24 hours, 45 minutes, 44 seconds)
2026-01-23 06:02:56,986 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 06:02:56,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:10:07,276 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5008.94531 ± 189.678
2026-01-23 06:10:07,276 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4850.0767, 5210.145, 4908.42, 5160.1626, 5169.2485, 4559.907, 5061.6113, 5102.796, 4931.167, 5135.917]
2026-01-23 06:10:07,276 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:10:07,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1274 [INFO]: New best (5008.95) for latency DatasetOffice
2026-01-23 06:10:07,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 23/100 (estimated time remaining: 24 hours, 44 minutes, 7 seconds)
2026-01-23 06:23:04,310 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 06:23:04,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:28:49,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4513.23145 ± 1652.915
2026-01-23 06:28:49,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5329.1157, 5361.523, 5282.7793, 109.908844, 5371.483, 2761.8132, 4976.9795, 5343.867, 5209.184, 5385.658]
2026-01-23 06:28:49,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:28:49,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 24/100 (estimated time remaining: 24 hours, 21 minutes, 36 seconds)
2026-01-23 06:41:38,126 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 06:41:38,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:47:28,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5274.47852 ± 132.288
2026-01-23 06:47:28,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5408.7593, 5246.2275, 5036.886, 5330.4155, 5214.1963, 5199.673, 5143.3975, 5436.9795, 5246.4385, 5481.813]
2026-01-23 06:47:28,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:47:28,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1274 [INFO]: New best (5274.48) for latency DatasetOffice
2026-01-23 06:47:28,171 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 25/100 (estimated time remaining: 23 hours, 59 minutes, 32 seconds)
2026-01-23 07:00:46,538 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 07:00:46,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:06:37,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4115.67236 ± 1998.314
2026-01-23 07:06:37,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5402.889, 5085.306, 5083.7314, 5310.713, 5240.5117, 5291.9067, 5317.48, 384.405, -1.2656273, 4041.0422]
2026-01-23 07:06:37,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:06:37,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 26/100 (estimated time remaining: 23 hours, 48 minutes, 26 seconds)
2026-01-23 07:20:03,626 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 07:20:03,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:25:55,783 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5335.43066 ± 85.437
2026-01-23 07:25:55,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5292.1265, 5248.3135, 5442.3286, 5373.6406, 5341.715, 5495.6562, 5304.662, 5389.8345, 5228.1455, 5237.8887]
2026-01-23 07:25:55,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:25:55,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1274 [INFO]: New best (5335.43) for latency DatasetOffice
2026-01-23 07:25:55,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 27/100 (estimated time remaining: 23 hours, 38 minutes, 31 seconds)
2026-01-23 07:39:18,780 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 07:39:18,783 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:45:10,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5288.31934 ± 170.436
2026-01-23 07:45:10,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5337.5493, 5469.9507, 5199.554, 5206.7563, 5634.1606, 5088.1484, 5320.244, 5254.1997, 5016.209, 5356.419]
2026-01-23 07:45:10,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:45:10,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 28/100 (estimated time remaining: 23 hours, 7 minutes, 48 seconds)
2026-01-23 07:58:34,860 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 07:58:34,865 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:04:26,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5228.38135 ± 405.507
2026-01-23 08:04:26,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5538.7373, 5111.1665, 5341.824, 4071.8108, 5234.925, 5504.1025, 5416.971, 5467.1206, 5247.127, 5350.0327]
2026-01-23 08:04:26,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:04:26,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 29/100 (estimated time remaining: 22 hours, 57 minutes, 3 seconds)
2026-01-23 08:17:38,940 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 08:17:38,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:23:29,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5509.10547 ± 86.432
2026-01-23 08:23:29,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5528.2734, 5699.3804, 5412.0327, 5464.8003, 5535.071, 5476.17, 5536.4927, 5596.527, 5444.8574, 5397.4453]
2026-01-23 08:23:29,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:23:29,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1274 [INFO]: New best (5509.11) for latency DatasetOffice
2026-01-23 08:23:29,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 30/100 (estimated time remaining: 22 hours, 43 minutes, 35 seconds)
2026-01-23 08:36:32,803 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 08:36:32,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:42:26,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5066.62012 ± 1052.729
2026-01-23 08:42:26,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5775.993, 5294.0635, 2348.0352, 5616.0776, 5412.2896, 5875.7617, 5514.747, 5405.626, 5575.8384, 3847.7688]
2026-01-23 08:42:26,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:42:26,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 31/100 (estimated time remaining: 22 hours, 21 minutes, 27 seconds)
2026-01-23 08:55:44,986 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 08:55:44,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:01:43,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5436.89355 ± 475.459
2026-01-23 09:01:43,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4082.2102, 5579.0356, 5555.385, 5315.5146, 5456.604, 5541.531, 5585.374, 5843.159, 5574.2915, 5835.831]
2026-01-23 09:01:43,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 09:01:43,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 32/100 (estimated time remaining: 22 hours, 1 minute, 57 seconds)
2026-01-23 09:14:55,344 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 09:14:55,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:20:50,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5708.48535 ± 203.452
2026-01-23 09:20:50,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5740.805, 5833.2866, 5949.748, 5788.4756, 5728.0635, 5712.1733, 5770.4507, 5726.8477, 5135.736, 5699.266]
2026-01-23 09:20:50,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 09:20:50,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1274 [INFO]: New best (5708.49) for latency DatasetOffice
2026-01-23 09:20:50,638 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 33/100 (estimated time remaining: 21 hours, 41 minutes, 4 seconds)
2026-01-23 09:33:46,521 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 09:33:46,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:39:38,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5540.75586 ± 450.037
2026-01-23 09:39:38,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5778.109, 5455.0317, 5601.9414, 4261.072, 5726.902, 5509.805, 5856.423, 5565.0083, 5724.1245, 5929.142]
2026-01-23 09:39:38,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 09:39:38,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 34/100 (estimated time remaining: 21 hours, 15 minutes, 42 seconds)
2026-01-23 09:53:03,132 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 09:53:03,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:59:01,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5804.43652 ± 89.421
2026-01-23 09:59:01,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5727.3623, 5677.501, 5697.7686, 5782.7827, 5884.1055, 5813.332, 5744.281, 5920.173, 5948.0986, 5848.9614]
2026-01-23 09:59:01,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 09:59:01,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1274 [INFO]: New best (5804.44) for latency DatasetOffice
2026-01-23 09:59:01,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 35/100 (estimated time remaining: 21 hours, 55 seconds)
2026-01-23 10:12:17,814 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 10:12:17,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:18:12,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5626.87012 ± 713.997
2026-01-23 10:18:12,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6141.881, 5459.6665, 5943.185, 5838.9893, 5966.658, 5944.5674, 5744.565, 6038.768, 5627.4673, 3562.9485]
2026-01-23 10:18:12,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 10:18:12,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 36/100 (estimated time remaining: 20 hours, 44 minutes, 55 seconds)
2026-01-23 10:31:27,373 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 10:31:27,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:38:46,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5766.85645 ± 130.418
2026-01-23 10:38:46,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5856.9478, 5900.1177, 5777.52, 5586.7783, 5830.998, 5654.666, 5590.005, 5680.4663, 6003.039, 5788.0283]
2026-01-23 10:38:46,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 10:38:46,591 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 37/100 (estimated time remaining: 20 hours, 42 minutes, 16 seconds)
2026-01-23 10:52:24,401 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 10:52:24,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:59:40,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5758.58740 ± 560.707
2026-01-23 10:59:40,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5771.3906, 5959.3105, 5750.383, 5947.466, 6047.2046, 5693.742, 6214.902, 6042.649, 4139.0, 6019.829]
2026-01-23 10:59:40,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 10:59:40,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 38/100 (estimated time remaining: 20 hours, 45 minutes, 10 seconds)
2026-01-23 11:12:50,775 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 11:12:50,779 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:20:06,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5374.13867 ± 1690.049
2026-01-23 11:20:06,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6030.3594, 5670.654, 5727.8203, 323.5045, 5922.777, 5818.147, 6117.013, 6117.072, 6066.553, 5947.4897]
2026-01-23 11:20:06,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 11:20:06,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 39/100 (estimated time remaining: 20 hours, 45 minutes, 40 seconds)
2026-01-23 11:33:26,187 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 11:33:26,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:39:21,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6034.87500 ± 155.094
2026-01-23 11:39:21,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6032.5503, 6074.4966, 5981.8857, 5935.4863, 5653.883, 6088.392, 6269.77, 6151.102, 6028.643, 6132.5376]
2026-01-23 11:39:21,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 11:39:21,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1274 [INFO]: New best (6034.88) for latency DatasetOffice
2026-01-23 11:39:21,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 40/100 (estimated time remaining: 20 hours, 24 minutes, 2 seconds)
2026-01-23 11:52:19,181 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 11:52:19,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:59:37,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5219.05859 ± 1741.858
2026-01-23 11:59:37,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6119.3125, 5753.1313, 5728.2017, 6257.349, 5956.4893, 6244.0205, 5891.176, 420.0302, 6059.4526, 3761.4216]
2026-01-23 11:59:37,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 11:59:37,413 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 41/100 (estimated time remaining: 20 hours, 17 minutes, 4 seconds)
2026-01-23 12:12:51,489 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 12:12:51,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 12:18:46,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6222.57666 ± 219.357
2026-01-23 12:18:46,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6002.521, 6296.0083, 6460.0884, 6179.676, 6275.2065, 6089.45, 5741.12, 6528.76, 6299.294, 6353.645]
2026-01-23 12:18:46,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 12:18:46,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1274 [INFO]: New best (6222.58) for latency DatasetOffice
2026-01-23 12:18:46,746 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 42/100 (estimated time remaining: 19 hours, 40 minutes, 1 second)
2026-01-23 12:32:05,032 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 12:32:05,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 12:37:59,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5824.64941 ± 209.979
2026-01-23 12:37:59,589 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5933.8853, 6045.2773, 5914.6035, 5646.537, 5986.3286, 5578.3516, 6102.4365, 5681.0747, 5440.802, 5917.2007]
2026-01-23 12:37:59,589 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 12:37:59,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 43/100 (estimated time remaining: 19 hours, 35 seconds)
2026-01-23 12:51:22,867 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 12:51:22,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 12:57:17,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5590.84717 ± 1739.818
2026-01-23 12:57:17,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6183.032, 6101.982, 6343.03, 386.89822, 6185.3774, 6419.5938, 5924.0664, 6154.8955, 6185.7314, 6023.8687]
2026-01-23 12:57:17,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 12:57:17,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 44/100 (estimated time remaining: 18 hours, 27 minutes, 55 seconds)
2026-01-23 13:10:21,651 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 13:10:21,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 13:17:38,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6538.82178 ± 211.663
2026-01-23 13:17:38,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6686.485, 6822.663, 6327.0723, 6486.5283, 6652.713, 6494.333, 6809.1787, 6178.9897, 6278.467, 6651.7886]
2026-01-23 13:17:38,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 13:17:38,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1274 [INFO]: New best (6538.82) for latency DatasetOffice
2026-01-23 13:17:38,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 45/100 (estimated time remaining: 18 hours, 20 minutes, 53 seconds)
2026-01-23 13:31:06,164 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 13:31:06,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 13:38:23,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5849.24219 ± 779.185
2026-01-23 13:38:23,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6615.3486, 4731.442, 5950.914, 6280.4775, 6406.2383, 6399.9424, 6278.119, 6189.6123, 5529.8438, 4110.484]
2026-01-23 13:38:23,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 13:38:23,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 46/100 (estimated time remaining: 18 hours, 6 minutes, 23 seconds)
2026-01-23 13:51:35,839 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 13:51:35,842 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 13:57:29,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6541.55566 ± 199.064
2026-01-23 13:57:29,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6632.094, 6702.467, 6852.459, 6479.818, 6308.968, 6565.4565, 6236.8574, 6811.736, 6476.3564, 6349.3486]
2026-01-23 13:57:29,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 13:57:29,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1274 [INFO]: New best (6541.56) for latency DatasetOffice
2026-01-23 13:57:29,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 47/100 (estimated time remaining: 17 hours, 46 minutes, 10 seconds)
2026-01-23 14:10:31,523 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 14:10:31,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 14:16:22,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6783.02197 ± 223.833
2026-01-23 14:16:22,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7181.331, 6814.3296, 6582.5215, 6580.602, 7009.6177, 6805.3906, 6965.8623, 6375.085, 6697.576, 6817.9053]
2026-01-23 14:16:22,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 14:16:22,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1274 [INFO]: New best (6783.02) for latency DatasetOffice
2026-01-23 14:16:22,223 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 48/100 (estimated time remaining: 17 hours, 22 minutes, 47 seconds)
2026-01-23 14:29:48,610 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 14:29:48,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 14:35:40,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6555.14307 ± 680.670
2026-01-23 14:35:40,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6599.154, 6875.6616, 6945.6255, 4623.813, 6994.834, 6891.5376, 6212.7173, 6646.472, 6851.424, 6910.1943]
2026-01-23 14:35:40,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 14:35:40,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 49/100 (estimated time remaining: 17 hours, 3 minutes, 10 seconds)
2026-01-23 14:48:37,309 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 14:48:37,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 14:54:34,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7067.95312 ± 232.682
2026-01-23 14:54:34,046 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7027.0903, 6834.732, 6967.9487, 7356.1914, 7154.399, 7112.345, 6885.548, 6750.891, 7565.144, 7025.2363]
2026-01-23 14:54:34,046 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 14:54:34,046 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1274 [INFO]: New best (7067.95) for latency DatasetOffice
2026-01-23 14:54:34,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 50/100 (estimated time remaining: 16 hours, 28 minutes, 36 seconds)
2026-01-23 15:08:06,445 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 15:08:06,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 15:13:57,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6496.63037 ± 842.443
2026-01-23 15:13:57,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6999.829, 6421.638, 6482.7983, 7223.215, 7263.558, 6787.687, 6532.209, 6457.1416, 6669.022, 4129.2036]
2026-01-23 15:13:57,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 15:13:57,719 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 51/100 (estimated time remaining: 15 hours, 55 minutes, 45 seconds)
2026-01-23 15:27:05,778 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 15:27:05,783 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 15:34:21,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6542.38477 ± 948.826
2026-01-23 15:34:21,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6956.225, 7078.45, 7040.989, 6567.5693, 6514.2773, 3830.3323, 6374.247, 6763.104, 6899.2534, 7399.3984]
2026-01-23 15:34:21,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 15:34:21,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 52/100 (estimated time remaining: 15 hours, 49 minutes, 11 seconds)
2026-01-23 15:47:39,705 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 15:47:39,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 15:53:32,917 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7142.55176 ± 151.580
2026-01-23 15:53:32,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7365.57, 7360.547, 7192.1436, 7111.6543, 7232.5093, 7247.6396, 6977.1226, 7022.085, 6953.2114, 6963.0312]
2026-01-23 15:53:32,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 15:53:32,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1274 [INFO]: New best (7142.55) for latency DatasetOffice
2026-01-23 15:53:32,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 53/100 (estimated time remaining: 15 hours, 32 minutes, 54 seconds)
2026-01-23 16:06:44,859 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 16:06:44,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 16:12:37,248 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6832.99463 ± 637.326
2026-01-23 16:12:37,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7122.098, 7257.315, 6792.0933, 5033.943, 7201.6416, 7232.6914, 6875.269, 7311.022, 6854.0674, 6649.8027]
2026-01-23 16:12:37,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 16:12:37,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 54/100 (estimated time remaining: 15 hours, 11 minutes, 16 seconds)
2026-01-23 16:25:37,294 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 16:25:37,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 16:32:52,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7296.49756 ± 87.015
2026-01-23 16:32:52,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7293.372, 7379.3506, 7160.097, 7418.226, 7384.738, 7238.9883, 7219.198, 7378.553, 7303.51, 7188.9453]
2026-01-23 16:32:52,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 16:32:52,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1274 [INFO]: New best (7296.50) for latency DatasetOffice
2026-01-23 16:32:52,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 55/100 (estimated time remaining: 15 hours, 4 minutes, 22 seconds)
2026-01-23 16:46:08,670 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 16:46:08,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 16:53:25,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6769.97119 ± 978.442
2026-01-23 16:53:25,023 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7529.7354, 6349.6094, 6717.801, 7758.119, 6956.74, 7513.213, 6819.5356, 7061.3276, 6897.511, 4096.1187]
2026-01-23 16:53:25,023 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 16:53:25,032 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 56/100 (estimated time remaining: 14 hours, 55 minutes, 5 seconds)
2026-01-23 17:06:46,827 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 17:06:46,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 17:12:36,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7259.27441 ± 299.386
2026-01-23 17:12:36,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7434.963, 7631.173, 7542.223, 7461.753, 7007.8735, 7079.25, 6555.5347, 7276.721, 7226.3423, 7376.9053]
2026-01-23 17:12:36,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 17:12:36,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 57/100 (estimated time remaining: 14 hours, 24 minutes, 41 seconds)
2026-01-23 17:26:07,347 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 17:26:07,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 17:32:03,448 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7191.18848 ± 208.828
2026-01-23 17:32:03,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7452.3076, 7277.594, 7086.4136, 7537.7686, 7357.0454, 7034.9683, 7176.744, 6982.4546, 6825.7437, 7180.8467]
2026-01-23 17:32:03,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 17:32:03,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 58/100 (estimated time remaining: 14 hours, 7 minutes, 10 seconds)
2026-01-23 17:45:26,282 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 17:45:26,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 17:52:41,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7001.34863 ± 687.683
2026-01-23 17:52:41,436 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7271.364, 7476.499, 7451.8247, 4997.1836, 7052.8516, 7100.068, 7122.505, 7421.0474, 7041.6396, 7078.502]
2026-01-23 17:52:41,436 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 17:52:41,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 59/100 (estimated time remaining: 14 hours, 35 seconds)
2026-01-23 18:05:48,760 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 18:05:48,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 18:11:39,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7315.39551 ± 146.788
2026-01-23 18:11:39,539 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7139.405, 7070.928, 7451.8125, 7538.2334, 7383.799, 7236.281, 7376.261, 7371.4717, 7155.6504, 7430.1104]
2026-01-23 18:11:39,539 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 18:11:39,539 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1274 [INFO]: New best (7315.40) for latency DatasetOffice
2026-01-23 18:11:39,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 60/100 (estimated time remaining: 13 hours, 30 minutes, 4 seconds)
2026-01-23 18:24:35,957 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 18:24:35,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 18:30:27,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6457.22607 ± 2322.723
2026-01-23 18:30:27,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7448.5166, 7356.701, 7302.3354, 7284.308, 7854.7915, 7619.8604, 7622.9067, 7469.521, 2.5810328, 4610.737]
2026-01-23 18:30:27,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 18:30:27,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 61/100 (estimated time remaining: 12 hours, 56 minutes, 18 seconds)
2026-01-23 18:43:38,118 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 18:43:38,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 18:50:52,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7244.78125 ± 246.778
2026-01-23 18:50:52,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7461.259, 7168.1724, 7474.284, 6898.6646, 7164.418, 7140.7197, 6787.099, 7389.3213, 7590.6924, 7373.1875]
2026-01-23 18:50:52,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 18:50:52,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 62/100 (estimated time remaining: 12 hours, 46 minutes, 27 seconds)
2026-01-23 19:03:55,292 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 19:03:55,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 19:09:44,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7004.41260 ± 1553.441
2026-01-23 19:09:44,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7810.895, 2464.3438, 7019.826, 7477.968, 7657.2935, 7495.316, 7750.5093, 7709.986, 6732.247, 7925.744]
2026-01-23 19:09:44,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 19:09:44,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 63/100 (estimated time remaining: 12 hours, 22 minutes, 25 seconds)
2026-01-23 19:22:52,137 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 19:22:52,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 19:28:42,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6729.15918 ± 2201.664
2026-01-23 19:28:42,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7598.6157, 7246.579, 7564.6787, 143.83553, 7444.5015, 7472.678, 7445.089, 7655.6377, 7637.16, 7082.813]
2026-01-23 19:28:42,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 19:28:42,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 64/100 (estimated time remaining: 11 hours, 50 minutes, 29 seconds)
2026-01-23 19:41:44,427 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 19:41:44,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 19:47:39,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7933.68262 ± 305.936
2026-01-23 19:47:39,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8097.472, 7888.3267, 7910.2427, 7761.227, 7255.1367, 7632.136, 8354.609, 8070.7817, 8134.973, 8231.919]
2026-01-23 19:47:39,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 19:47:39,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1274 [INFO]: New best (7933.68) for latency DatasetOffice
2026-01-23 19:47:39,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 65/100 (estimated time remaining: 11 hours, 31 minutes, 14 seconds)
2026-01-23 20:00:49,109 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 20:00:49,113 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 20:06:41,180 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7393.33887 ± 1035.853
2026-01-23 20:06:41,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8065.518, 7088.7344, 7627.704, 8047.4224, 7854.739, 7985.8545, 7743.339, 7579.5596, 7540.4644, 4400.0537]
2026-01-23 20:06:41,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 20:06:41,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 66/100 (estimated time remaining: 11 hours, 13 minutes, 37 seconds)
2026-01-23 20:20:06,632 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 20:20:06,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 20:27:19,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7274.98682 ± 193.325
2026-01-23 20:27:19,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7129.5547, 7519.861, 7239.421, 7355.6714, 7017.0264, 6925.6636, 7213.031, 7429.5107, 7454.4395, 7465.6865]
2026-01-23 20:27:19,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 20:27:20,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 67/100 (estimated time remaining: 10 hours, 55 minutes, 52 seconds)
2026-01-23 20:40:53,106 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 20:40:53,110 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 20:46:42,383 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7577.81348 ± 218.342
2026-01-23 20:46:42,384 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7758.971, 7488.2285, 7795.695, 7574.723, 7691.3906, 7469.0796, 7647.8906, 7606.0083, 7002.8433, 7743.3013]
2026-01-23 20:46:42,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 20:46:42,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 68/100 (estimated time remaining: 10 hours, 39 minutes, 56 seconds)
2026-01-23 20:59:50,667 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 20:59:50,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 21:07:04,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7536.74854 ± 724.823
2026-01-23 21:07:04,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7845.3784, 7652.2554, 7935.112, 5385.5083, 7674.8086, 7776.0073, 7942.8735, 7666.564, 7828.633, 7660.3467]
2026-01-23 21:07:04,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 21:07:04,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 69/100 (estimated time remaining: 10 hours, 29 minutes, 35 seconds)
2026-01-23 21:20:24,265 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 21:20:24,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 21:26:19,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7606.44629 ± 223.317
2026-01-23 21:26:19,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7730.6543, 7667.1904, 7922.562, 7633.425, 7342.0923, 7603.9023, 7108.823, 7626.9336, 7577.2065, 7851.6753]
2026-01-23 21:26:19,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 21:26:19,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 70/100 (estimated time remaining: 10 hours, 11 minutes, 40 seconds)
2026-01-23 21:39:08,976 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 21:39:08,981 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 21:44:58,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6663.37988 ± 2443.966
2026-01-23 21:44:58,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7880.8066, 7258.7217, 7512.2, 7930.57, 7755.2056, 8010.4497, 7969.8174, 7861.794, -15.241311, 4469.473]
2026-01-23 21:44:58,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 21:44:58,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 71/100 (estimated time remaining: 9 hours, 49 minutes, 42 seconds)
2026-01-23 21:57:44,907 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 21:57:44,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 22:03:34,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7091.20459 ± 1788.168
2026-01-23 22:03:34,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7754.795, 7779.185, 7855.871, 7239.969, 7253.299, 7873.2363, 1769.5277, 7731.857, 7866.9233, 7787.38]
2026-01-23 22:03:34,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 22:03:34,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 72/100 (estimated time remaining: 9 hours, 18 minutes, 13 seconds)
2026-01-23 22:16:26,664 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 22:16:26,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 22:22:17,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7747.41943 ± 168.299
2026-01-23 22:22:17,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7814.4536, 7682.3853, 7693.942, 7858.4233, 7821.0864, 7562.8047, 7502.203, 7700.952, 7696.8477, 8141.1]
2026-01-23 22:22:17,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 22:22:17,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 73/100 (estimated time remaining: 8 hours, 55 minutes, 14 seconds)
2026-01-23 22:35:18,995 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 22:35:19,005 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 22:41:09,673 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7546.40771 ± 838.253
2026-01-23 22:41:09,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8119.2773, 7585.6655, 7691.3213, 5089.6978, 8019.73, 7703.64, 7539.8696, 7928.105, 7944.59, 7842.1914]
2026-01-23 22:41:09,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 22:41:09,684 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 74/100 (estimated time remaining: 8 hours, 28 minutes, 3 seconds)
2026-01-23 22:54:08,944 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 22:54:08,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 23:00:05,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7848.72510 ± 210.234
2026-01-23 23:00:05,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7928.702, 7583.461, 7418.2944, 7957.4336, 7890.1445, 7926.933, 7874.583, 7705.9224, 8178.5703, 8023.2056]
2026-01-23 23:00:05,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 23:00:05,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 75/100 (estimated time remaining: 8 hours, 7 minutes, 37 seconds)
2026-01-23 23:13:23,540 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 23:13:23,545 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 23:19:16,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6959.72803 ± 943.446
2026-01-23 23:19:16,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7285.0767, 5655.3315, 7075.2715, 7592.0117, 7557.7153, 7859.6523, 7492.9736, 7193.7944, 7188.78, 4696.675]
2026-01-23 23:19:16,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 23:19:16,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 76/100 (estimated time remaining: 7 hours, 51 minutes, 32 seconds)
2026-01-23 23:32:20,418 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 23:32:20,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 23:38:11,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 8004.09619 ± 170.290
2026-01-23 23:38:11,066 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7933.0254, 7944.704, 8002.746, 7918.9736, 7907.1973, 7700.0645, 8185.692, 7925.547, 8243.233, 8279.783]
2026-01-23 23:38:11,066 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 23:38:11,066 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1274 [INFO]: New best (8004.10) for latency DatasetOffice
2026-01-23 23:38:11,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 77/100 (estimated time remaining: 7 hours, 34 minutes, 6 seconds)
2026-01-23 23:51:37,608 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 23:51:37,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 23:58:51,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7964.18066 ± 339.859
2026-01-23 23:58:51,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8157.044, 7896.0615, 7829.926, 8030.1885, 8282.909, 7945.963, 7959.3867, 8100.5186, 7067.62, 8372.189]
2026-01-23 23:58:51,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 23:58:51,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 78/100 (estimated time remaining: 7 hours, 24 minutes, 15 seconds)
2026-01-24 00:12:07,127 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 00:12:07,131 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 00:19:21,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6580.45068 ± 3300.414
2026-01-24 00:19:21,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8140.864, 8498.149, 8199.615, -57.411808, 8193.666, 8231.864, 25.052172, 8171.0044, 8355.448, 8046.253]
2026-01-24 00:19:21,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 00:19:21,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 79/100 (estimated time remaining: 7 hours, 12 minutes, 3 seconds)
2026-01-24 00:32:54,460 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 00:32:54,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 00:38:47,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7988.81787 ± 177.215
2026-01-24 00:38:47,516 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7650.8403, 8119.0586, 8131.205, 8045.9736, 8015.704, 8169.687, 7993.085, 8180.8438, 7863.8325, 7717.9473]
2026-01-24 00:38:47,516 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 00:38:47,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 80/100 (estimated time remaining: 6 hours, 54 minutes, 31 seconds)
2026-01-24 00:52:05,851 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 00:52:05,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 00:59:20,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7354.37402 ± 1137.509
2026-01-24 00:59:20,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7994.085, 5689.405, 7753.0205, 7867.233, 8256.725, 8198.6455, 7991.965, 7300.1377, 7818.472, 4674.058]
2026-01-24 00:59:20,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 00:59:20,219 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 81/100 (estimated time remaining: 6 hours, 40 minutes, 13 seconds)
2026-01-24 01:12:48,743 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 01:12:48,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 01:18:43,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7854.03027 ± 206.222
2026-01-24 01:18:43,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7775.8594, 7903.7153, 7811.987, 7963.267, 7815.9375, 7704.6997, 7404.029, 8221.685, 7883.31, 8055.8125]
2026-01-24 01:18:43,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 01:18:43,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 82/100 (estimated time remaining: 6 hours, 22 minutes, 2 seconds)
2026-01-24 01:31:56,740 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 01:31:56,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 01:39:11,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 8075.54688 ± 228.664
2026-01-24 01:39:11,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7728.848, 8056.0894, 8318.65, 8223.511, 8387.077, 8257.922, 7987.8867, 8174.2407, 7681.794, 7939.4536]
2026-01-24 01:39:11,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 01:39:11,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1274 [INFO]: New best (8075.55) for latency DatasetOffice
2026-01-24 01:39:11,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 83/100 (estimated time remaining: 6 hours, 1 minute, 9 seconds)
2026-01-24 01:52:32,730 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 01:52:32,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 01:58:29,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7963.63916 ± 779.275
2026-01-24 01:58:29,416 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8153.3994, 7971.5747, 8176.486, 5692.82, 8060.0513, 8517.912, 7958.7827, 8235.694, 8441.92, 8427.753]
2026-01-24 01:58:29,416 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 01:58:29,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 84/100 (estimated time remaining: 5 hours, 37 minutes, 3 seconds)
2026-01-24 02:11:34,713 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 02:11:34,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 02:17:27,383 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 8140.12988 ± 133.654
2026-01-24 02:17:27,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8021.195, 8161.549, 8099.8525, 8064.555, 8280.232, 8361.252, 8318.883, 8080.073, 8099.278, 7914.4204]
2026-01-24 02:17:27,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 02:17:27,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1274 [INFO]: New best (8140.13) for latency DatasetOffice
2026-01-24 02:17:27,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 85/100 (estimated time remaining: 5 hours, 15 minutes, 43 seconds)
2026-01-24 02:30:56,954 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 02:30:56,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 02:36:49,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7564.07422 ± 884.096
2026-01-24 02:36:49,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8170.1216, 6245.9414, 7482.177, 8149.691, 8344.236, 8203.417, 8143.5767, 7725.7544, 7606.9062, 5568.9233]
2026-01-24 02:36:49,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 02:36:49,059 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 86/100 (estimated time remaining: 4 hours, 52 minutes, 26 seconds)
2026-01-24 02:49:55,535 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 02:49:55,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 02:55:46,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7359.73193 ± 1819.377
2026-01-24 02:55:46,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7688.929, 8244.337, 8093.5493, 8103.5405, 7859.2485, 7660.128, 1932.4677, 7785.5693, 8147.9893, 8081.5605]
2026-01-24 02:55:46,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 02:55:46,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 87/100 (estimated time remaining: 4 hours, 31 minutes, 45 seconds)
2026-01-24 03:08:41,049 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 03:08:41,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 03:14:29,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 8226.91699 ± 149.392
2026-01-24 03:14:29,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8271.951, 8270.491, 8224.72, 8082.816, 8500.473, 8132.1206, 8274.922, 8245.31, 7915.8735, 8350.498]
2026-01-24 03:14:29,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 03:14:29,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1274 [INFO]: New best (8226.92) for latency DatasetOffice
2026-01-24 03:14:29,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 88/100 (estimated time remaining: 4 hours, 7 minutes, 48 seconds)
2026-01-24 03:27:30,259 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 03:27:30,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 03:33:21,455 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7809.39355 ± 718.579
2026-01-24 03:33:21,462 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8222.967, 8033.659, 8085.796, 5680.0176, 8125.619, 8096.5996, 7784.195, 8110.3687, 8004.2827, 7950.4365]
2026-01-24 03:33:21,462 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 03:33:21,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 47 minutes, 40 seconds)
2026-01-24 03:46:20,123 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 03:46:20,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 03:52:09,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 8228.19922 ± 149.014
2026-01-24 03:52:09,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8016.2764, 8051.534, 8332.492, 8280.368, 8406.406, 8099.4883, 8241.934, 8164.483, 8506.838, 8182.172]
2026-01-24 03:52:09,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 03:52:09,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1274 [INFO]: New best (8228.20) for latency DatasetOffice
2026-01-24 03:52:09,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 90/100 (estimated time remaining: 3 hours, 28 minutes, 20 seconds)
2026-01-24 04:05:25,351 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 04:05:25,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 04:11:13,470 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7016.62744 ± 2313.150
2026-01-24 04:11:13,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8175.9604, 7395.949, 7574.3857, 8092.3755, 8432.626, 8318.113, 8125.6504, 419.4726, 7820.476, 5811.2666]
2026-01-24 04:11:13,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 04:11:13,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 91/100 (estimated time remaining: 3 hours, 8 minutes, 48 seconds)
2026-01-24 04:24:06,481 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 04:24:06,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 04:29:56,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 8123.55176 ± 332.828
2026-01-24 04:29:56,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7799.473, 8638.492, 8622.919, 8246.836, 7844.32, 8035.6494, 7515.094, 8115.5225, 8244.207, 8173.003]
2026-01-24 04:29:56,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 04:29:56,735 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 49 minutes, 30 seconds)
2026-01-24 04:43:10,112 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 04:43:10,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 04:49:01,416 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7838.01562 ± 174.730
2026-01-24 04:49:01,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7895.243, 8053.94, 8062.6113, 7822.94, 7755.715, 7694.9873, 8020.3486, 7532.86, 7915.8306, 7625.6777]
2026-01-24 04:49:01,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 04:49:01,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 31 minutes, 14 seconds)
2026-01-24 05:02:24,952 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 05:02:24,956 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 05:08:16,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 8101.36426 ± 799.398
2026-01-24 05:08:16,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8716.145, 8540.534, 8472.0, 5769.119, 8033.3916, 8280.927, 8162.534, 8467.655, 8292.448, 8278.894]
2026-01-24 05:08:16,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 05:08:16,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 94/100 (estimated time remaining: 2 hours, 12 minutes, 53 seconds)
2026-01-24 05:21:31,380 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 05:21:31,384 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 05:27:24,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 8490.16504 ± 206.466
2026-01-24 05:27:24,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8069.372, 8817.154, 8608.39, 8496.182, 8649.8955, 8240.49, 8592.6455, 8338.753, 8568.075, 8520.69]
2026-01-24 05:27:24,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 05:27:24,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1274 [INFO]: New best (8490.17) for latency DatasetOffice
2026-01-24 05:27:24,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 54 minutes, 18 seconds)
2026-01-24 05:40:19,497 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 05:40:19,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 05:46:12,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7665.37012 ± 926.909
2026-01-24 05:46:12,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8062.2812, 7713.508, 7477.5293, 7998.991, 8479.942, 8128.9346, 8016.737, 8054.117, 7726.211, 4995.453]
2026-01-24 05:46:12,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 05:46:12,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 34 minutes, 58 seconds)
2026-01-24 05:59:37,936 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 05:59:37,945 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 06:05:33,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 8165.44824 ± 316.839
2026-01-24 06:05:33,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8055.7314, 8062.309, 8365.884, 7538.032, 7800.4077, 8368.277, 8006.0635, 8426.436, 8646.332, 8385.012]
2026-01-24 06:05:33,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 06:05:33,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 16 minutes, 29 seconds)
2026-01-24 06:18:51,739 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 06:18:51,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 06:26:06,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 8230.77051 ± 239.405
2026-01-24 06:26:06,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8268.436, 8534.636, 8243.45, 8208.042, 8418.666, 8179.774, 8322.812, 8325.684, 7580.1494, 8226.057]
2026-01-24 06:26:06,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 06:26:06,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 98/100 (estimated time remaining: 58 minutes, 14 seconds)
2026-01-24 06:39:23,307 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 06:39:23,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 06:45:13,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7527.72412 ± 2561.840
2026-01-24 06:45:13,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8536.819, 8316.481, 8401.188, -83.403305, 8621.287, 8298.502, 7856.4985, 8859.181, 8794.499, 7676.188]
2026-01-24 06:45:13,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 06:45:13,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 99/100 (estimated time remaining: 38 minutes, 46 seconds)
2026-01-24 06:58:07,301 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 06:58:07,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 07:03:56,753 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 8571.88086 ± 98.682
2026-01-24 07:03:56,759 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8618.665, 8650.336, 8595.914, 8498.826, 8406.01, 8484.939, 8632.873, 8637.094, 8736.371, 8457.774]
2026-01-24 07:03:56,759 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 07:03:56,759 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1274 [INFO]: New best (8571.88) for latency DatasetOffice
2026-01-24 07:03:56,769 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1247 [INFO]: Iteration 100/100 (estimated time remaining: 19 minutes, 18 seconds)
2026-01-24 07:17:16,637 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 07:17:16,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 07:23:12,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7969.00244 ± 848.035
2026-01-24 07:23:12,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8548.083, 7654.8086, 7856.156, 8502.812, 8245.3125, 8589.743, 8234.63, 8166.4116, 8323.672, 5568.3955]
2026-01-24 07:23:12,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 07:23:12,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1299 [DEBUG]: Training session finished
