2025-09-11 18:33:46,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc15-halfcheetah/ExtremeClogL1U23-mbpac_memdelay
2025-09-11 18:33:46,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc15-halfcheetah/ExtremeClogL1U23-mbpac_memdelay
2025-09-11 18:33:46,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x1491f14fd550>}
2025-09-11 18:33:46,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1111 [DEBUG]: using device: cuda
2025-09-11 18:33:46,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-11 18:33:46,793 baseline-mbpac-noiseperc15-halfcheetah:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
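The NNGaussianPolicy printout above can be approximated in plain PyTorch. This is a hedged sketch: the class and method names below are hypothetical, and NNTanhRefit's exact semantics are internal to this codebase, so the squashing (tanh output mapped through the printed scale/shift tensors) is an assumption read off the printed parameters.

```python
import torch
import torch.nn as nn

class GaussianPolicySketch(nn.Module):
    """Sketch of the logged NNGaussianPolicy: a shared 384->256->256 trunk
    with separate mu / log_std heads for a 6-dim action space."""
    def __init__(self, in_features=384, hidden=256, act_dim=6):
        super().__init__()
        self.common_head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu_head = nn.Linear(hidden, act_dim)
        self.log_std_head = nn.Linear(hidden, act_dim)
        # Printed NNTanhRefit parameters; how they are applied is an assumption.
        self.register_buffer("scale", torch.full((1, act_dim), 2.0))
        self.register_buffer("shift", torch.full((1, act_dim), -1.0))

    def forward(self, obs):
        h = self.common_head(obs)
        mu, log_std = self.mu_head(h), self.log_std_head(h)
        # Assumed refit: tanh's (-1, 1) -> (0, 1), then affine scale/shift
        # back into the (-1, 1) action range.
        action = (torch.tanh(mu) + 1) / 2 * self.scale + self.shift
        return action, log_std

policy = GaussianPolicySketch()
action, log_std = policy(torch.randn(4, 384))
```

Note the 384-dim input: the policy does not consume the raw 17-dim observation directly (compare the model printout further down, where 384 is the GRU hidden size).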
2025-09-11 18:33:46,793 baseline-mbpac-noiseperc15-halfcheetah:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
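The Q network's first Linear has in_features=23, i.e. the raw 17-dim observation plus the 6-dim action, concatenated on the last dim after flattening both inputs. A hedged sketch of that concat-then-MLP structure in standard PyTorch (class name hypothetical):

```python
import torch
import torch.nn as nn

class QConcatSketch(nn.Module):
    """Sketch of the logged NNLayerConcat2 Q-network: flatten state and
    action, concatenate (17 + 6 = 23 features), then a 256x256 ReLU MLP
    down to a squeezed scalar value."""
    def __init__(self, state_dim=17, act_dim=6, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        # init_left / init_right in the printout are Flatten layers.
        x = torch.cat([state.flatten(1), action.flatten(1)], dim=-1)
        return self.net(x).squeeze(-1)  # NNLayerSqueeze(dim: -1)

q = QConcatSketch()
values = q(torch.randn(8, 17), torch.randn(8, 6))
```

So the critic sees the raw state while the policy above takes a 384-dim input, presumably a learned representation rather than the environment observation.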
2025-09-11 18:33:46,801 baseline-mbpac-noiseperc15-halfcheetah:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=6, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
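The NNPredictiveRecurrent printout describes a GRU-based predictive model: the 17-dim state embeds to 384 (the GRU hidden size, matching the 384-dim input the policy and emitter consume), the 6-dim action embeds to 256 (the GRU input size), and the emitter maps the 384-dim recurrent state to a Gaussian (mu, log_std) over the 17-dim next observation. A hedged sketch with standard modules, substituting SiLU for the codebase's NNLayerClipSiLU and assuming the state embedding initialises the GRU hidden state:

```python
import torch
import torch.nn as nn

class PredictiveRecurrentSketch(nn.Module):
    """Sketch of the logged NNPredictiveRecurrent. Dimensions are read off
    the printout: state 17 -> 384 (GRU hidden), action 6 -> 256 (GRU input),
    emitter 384 -> (mu, log_std) over the 17-dim next state."""
    def __init__(self, state_dim=17, act_dim=6, gru_in=256, gru_hidden=384):
        super().__init__()
        act = nn.SiLU  # stand-in for NNLayerClipSiLU(lower=-20.0)
        self.net_embed_state = nn.Sequential(
            nn.Linear(state_dim, 256), act(),
            nn.Linear(256, 256), act(),
            nn.Linear(256, gru_hidden),
        )
        self.net_embed_action = nn.Sequential(
            nn.Linear(act_dim, 256), act(),
            nn.Linear(256, gru_in),
        )
        self.net_rec = nn.GRU(gru_in, gru_hidden, batch_first=True)
        self.emitter_trunk = nn.Sequential(
            nn.Linear(gru_hidden, 256), act(),
            nn.Linear(256, 256), act(),
            nn.Linear(256, 256), act(),
        )
        self.mu_head = nn.Sequential(
            nn.Linear(256, 256), act(), nn.Linear(256, state_dim))
        self.log_std_head = nn.Sequential(
            nn.Linear(256, 256), act(), nn.Linear(256, state_dim))

    def forward(self, state, actions):
        # Assumed rollout: seed the GRU hidden state from the current
        # observation, then advance it with a sequence of embedded actions.
        h0 = self.net_embed_state(state).unsqueeze(0)              # (1, B, 384)
        out, _ = self.net_rec(self.net_embed_action(actions), h0)  # (B, T, 384)
        z = self.emitter_trunk(out)
        return self.mu_head(z), self.log_std_head(z)               # (B, T, 17)

model = PredictiveRecurrentSketch()
mu, log_std = model(torch.randn(2, 17), torch.randn(2, 5, 6))
```

This would let the agent predict forward through an unobserved delay window, consistent with the HiddenMarkovianDelay evaluation latency named at the top of the log; how the trainer actually wires these parts together is not shown here.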
2025-09-11 18:33:48,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-11 18:33:48,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-11 18:45:52,855 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 18:45:52,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 18:51:44,637 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: -348.43829 ± 41.709
2025-09-11 18:51:44,639 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [-425.4748, -330.05096, -345.70047, -367.0074, -357.45255, -255.22098, -346.51685, -321.09198, -383.56754, -352.29926]
2025-09-11 18:51:44,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 18:51:44,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (-348.44) for latency ExtremeClogL1U23
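The "Total Reward: mean ± spread" summary lines can be reproduced from the printed per-episode rewards. The ± term matches the population standard deviation (ddof=0) of the ten rewards, not the sample standard deviation (which would be about 43.97 for iteration 1), so that interpretation is assumed below:

```python
from statistics import fmean, pstdev

# Per-episode rewards from the iteration-1 evaluation above.
rewards = [-425.4748, -330.05096, -345.70047, -367.0074, -357.45255,
           -255.22098, -346.51685, -321.09198, -383.56754, -352.29926]

mean = fmean(rewards)
std = pstdev(rewards)  # population std; stdev() (ddof=1) gives ~43.97
print(f"Total Reward: {mean:.5f} \u00b1 {std:.3f}")
# -> Total Reward: -348.43828 ± 41.709
```

(The log prints -348.43829 rather than -348.43828 because the trainer accumulates in float32; the discrepancy is rounding, not a different formula.)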
2025-09-11 18:51:44,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 29 hours, 36 minutes, 23 seconds)
2025-09-11 19:05:16,835 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:05:16,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:10:04,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: -51.07970 ± 57.479
2025-09-11 19:10:04,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1.819214, -64.44793, -176.43158, -43.28719, 2.5047512, 8.80288, -127.85047, -65.08598, -33.647808, -13.172837]
2025-09-11 19:10:04,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 19:10:04,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (-51.08) for latency ExtremeClogL1U23
2025-09-11 19:10:04,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 29 hours, 37 minutes, 14 seconds)
2025-09-11 19:23:25,754 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:23:25,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:28:12,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 306.18280 ± 255.881
2025-09-11 19:28:12,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [601.2117, 448.06674, -94.68457, 409.07236, 50.750317, -28.63915, 635.5132, 134.98996, 357.3793, 548.168]
2025-09-11 19:28:12,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 19:28:12,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (306.18) for latency ExtremeClogL1U23
2025-09-11 19:28:12,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 29 hours, 19 minutes, 26 seconds)
2025-09-11 19:41:27,223 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:41:27,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:46:13,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1041.31543 ± 467.496
2025-09-11 19:46:13,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1166.2413, 1431.6056, 1393.5685, 1230.9728, 1195.2705, 107.37203, 1293.2405, 1263.2701, 133.43213, 1198.1813]
2025-09-11 19:46:13,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 19:46:13,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (1041.32) for latency ExtremeClogL1U23
2025-09-11 19:46:13,692 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 28 hours, 58 minutes, 15 seconds)
2025-09-11 19:59:24,897 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:59:24,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:04:09,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1544.30920 ± 400.515
2025-09-11 20:04:09,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1815.0696, 1703.7435, 659.95374, 1849.5057, 1218.5288, 1858.298, 2038.0535, 1722.2197, 1387.5907, 1190.1288]
2025-09-11 20:04:09,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 20:04:09,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (1544.31) for latency ExtremeClogL1U23
2025-09-11 20:04:09,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 28 hours, 36 minutes, 49 seconds)
2025-09-11 20:17:18,843 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:17:18,865 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:22:03,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1714.40759 ± 853.951
2025-09-11 20:22:03,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2436.7334, 2145.8552, -27.454926, 449.26544, 2412.9827, 1430.5637, 1307.2944, 2463.7397, 2099.7236, 2425.3713]
2025-09-11 20:22:03,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 20:22:03,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (1714.41) for latency ExtremeClogL1U23
2025-09-11 20:22:03,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 28 hours, 17 minutes, 47 seconds)
2025-09-11 20:35:06,911 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:35:06,916 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:40:55,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2010.62634 ± 1014.664
2025-09-11 20:40:55,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2391.3157, 2809.948, 186.73721, 2639.8247, 245.9115, 2761.3464, 2587.031, 1108.7108, 2531.6458, 2843.793]
2025-09-11 20:40:55,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 20:40:55,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (2010.63) for latency ExtremeClogL1U23
2025-09-11 20:40:55,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 28 hours, 9 minutes, 54 seconds)
2025-09-11 20:54:15,461 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:54:15,466 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:00:06,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1461.93018 ± 878.933
2025-09-11 21:00:06,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2644.3196, 2448.6377, 140.48203, 1265.751, 1729.1008, 859.3755, 2434.1296, 312.59344, 737.23596, 2047.6771]
2025-09-11 21:00:06,382 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 21:00:06,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 28 hours, 10 minutes, 46 seconds)
2025-09-11 21:13:34,575 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:13:34,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:18:20,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 967.46082 ± 963.428
2025-09-11 21:18:20,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [105.75037, 402.06808, 1177.089, 764.13275, 49.305275, 473.6899, 2701.1812, 868.5483, 2853.772, 279.07108]
2025-09-11 21:18:20,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 21:18:20,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 27 hours, 56 minutes, 23 seconds)
2025-09-11 21:31:15,317 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:31:15,323 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:36:00,516 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1955.99805 ± 1330.769
2025-09-11 21:36:00,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3393.76, 3015.9116, 2950.457, 3049.716, 298.91385, 157.2829, 3112.219, 143.86192, 819.11584, 2618.7415]
2025-09-11 21:36:00,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 21:36:00,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 27 hours, 33 minutes, 17 seconds)
2025-09-11 21:49:15,536 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:49:15,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:54:03,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2002.46289 ± 1177.749
2025-09-11 21:54:03,359 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3470.2583, 1493.4021, 1496.6508, 505.10324, 3479.9775, 3459.9924, 1123.9023, 3200.1145, 850.8936, 944.334]
2025-09-11 21:54:03,359 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 21:54:03,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 27 hours, 17 minutes, 40 seconds)
2025-09-11 22:06:58,834 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:06:58,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:12:47,066 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1873.70898 ± 1400.219
2025-09-11 22:12:47,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1677.4336, 51.26292, 3718.7778, 47.16357, 128.71765, 1149.9343, 3322.3643, 3302.846, 2154.5388, 3184.05]
2025-09-11 22:12:47,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 22:12:47,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 26 hours, 56 minutes, 42 seconds)
2025-09-11 22:25:10,480 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:25:10,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:30:49,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2465.46118 ± 1047.050
2025-09-11 22:30:49,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1240.5906, 3191.8193, 1664.1135, 2151.928, 3521.5693, 1635.5885, 3800.3713, 3170.136, 3535.5498, 742.94275]
2025-09-11 22:30:49,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 22:30:49,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (2465.46) for latency ExtremeClogL1U23
2025-09-11 22:30:49,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 26 hours, 18 minutes, 26 seconds)
2025-09-11 22:43:11,957 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:43:11,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:47:49,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2281.83398 ± 1134.900
2025-09-11 22:47:49,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [149.3491, 542.2817, 3007.619, 2417.0369, 2261.5967, 1472.4941, 3131.7659, 2772.466, 3437.809, 3625.9226]
2025-09-11 22:47:49,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 22:47:49,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 25 hours, 39 minutes, 18 seconds)
2025-09-11 23:00:02,994 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:00:02,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:05:42,917 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3264.39990 ± 862.244
2025-09-11 23:05:42,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3714.7256, 3952.264, 3554.717, 3760.2935, 3939.2673, 3074.3704, 3828.251, 3404.2456, 2322.0493, 1093.8152]
2025-09-11 23:05:42,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 23:05:42,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (3264.40) for latency ExtremeClogL1U23
2025-09-11 23:05:42,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 25 hours, 25 minutes, 1 second)
2025-09-11 23:18:03,500 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:18:03,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:22:38,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2284.34351 ± 1245.310
2025-09-11 23:22:38,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2149.7126, 3809.2498, 237.025, 1552.3403, 3142.5466, 2747.1885, 3593.2983, 1958.653, 206.81322, 3446.6096]
2025-09-11 23:22:38,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 23:22:38,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 24 hours, 48 minutes, 14 seconds)
2025-09-11 23:34:46,248 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:34:46,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:39:23,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2160.09766 ± 1341.734
2025-09-11 23:39:23,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [765.0787, 941.08014, 3675.091, 734.88055, 287.35782, 3212.502, 3613.6277, 3755.2444, 1626.7958, 2989.3184]
2025-09-11 23:39:23,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 23:39:23,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 23 hours, 57 minutes, 44 seconds)
2025-09-11 23:51:36,533 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:51:36,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:56:16,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2151.08008 ± 1285.749
2025-09-11 23:56:16,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3819.4766, 2148.8105, 938.14746, 1163.9347, 1523.8188, 1614.7205, 4227.576, 4018.8755, 1412.9385, 642.50476]
2025-09-11 23:56:16,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 23:56:16,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 23 hours, 21 minutes, 26 seconds)
2025-09-12 00:08:35,504 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:08:35,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:13:10,769 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2680.79541 ± 1099.215
2025-09-12 00:13:10,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [599.6349, 3674.234, 2086.6396, 4024.2031, 3765.3848, 3067.314, 1468.7521, 2122.0374, 2194.8335, 3804.9226]
2025-09-12 00:13:10,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:13:10,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 23 hours, 2 minutes, 37 seconds)
2025-09-12 00:25:15,056 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:25:15,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:29:49,024 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2105.89355 ± 1316.943
2025-09-12 00:29:49,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3905.0164, 2646.7441, 2792.852, 60.439533, 4026.0342, 802.2392, 443.4078, 1729.2665, 2946.449, 1706.4872]
2025-09-12 00:29:49,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:29:49,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 22 hours, 25 minutes, 36 seconds)
2025-09-12 00:42:02,093 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:42:02,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:46:40,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2618.31616 ± 1458.596
2025-09-12 00:46:40,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [4235.0264, 4199.962, 1281.3922, 3988.1777, 1115.6921, 613.30725, 3136.553, 4337.3423, 905.8659, 2369.844]
2025-09-12 00:46:40,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:46:41,008 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 22 hours, 7 minutes, 51 seconds)
2025-09-12 00:59:07,600 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:59:07,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:03:43,007 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2216.23364 ± 1505.435
2025-09-12 01:03:43,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3732.2207, 1098.805, 368.84915, 622.3124, 4169.735, 3818.8896, 994.8528, 812.55035, 4009.2002, 2534.9229]
2025-09-12 01:03:43,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:03:43,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 21 hours, 55 minutes, 24 seconds)
2025-09-12 01:16:16,111 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:16:16,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:20:51,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2277.70605 ± 1633.849
2025-09-12 01:20:51,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [772.5428, 419.99176, 4179.8154, 1361.565, 4362.4463, 1528.7393, 4330.787, 496.884, 1260.0867, 4064.2039]
2025-09-12 01:20:51,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:20:51,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 21 hours, 42 minutes, 36 seconds)
2025-09-12 01:32:55,057 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:32:55,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:37:35,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2929.12842 ± 1422.585
2025-09-12 01:37:35,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [4019.4104, 4275.8354, 2846.0732, 1668.3617, 597.69086, 1897.0574, 4207.9307, 4426.2974, 1074.242, 4278.3843]
2025-09-12 01:37:35,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:37:35,620 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 21 hours, 23 minutes, 5 seconds)
2025-09-12 01:50:03,247 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:50:03,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:54:37,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2348.40283 ± 1318.803
2025-09-12 01:54:37,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1357.6194, 1069.8652, 1315.2961, 1605.8564, 3968.0115, 4311.494, 1387.2826, 2923.5195, 1180.9598, 4364.1235]
2025-09-12 01:54:37,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:54:37,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 21 hours, 12 minutes, 10 seconds)
2025-09-12 02:06:45,134 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:06:45,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:11:23,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2926.40552 ± 1544.125
2025-09-12 02:11:23,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [4286.187, 4345.5474, 454.51672, 3298.5393, 4393.9307, 2409.0947, 4057.3628, 1710.0887, 236.0863, 4072.7004]
2025-09-12 02:11:23,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 02:11:23,230 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 20 hours, 53 minutes, 36 seconds)
2025-09-12 02:23:24,863 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:23:24,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:28:00,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2328.05591 ± 1302.567
2025-09-12 02:28:00,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1822.9291, 4555.266, 1726.1827, 3690.9478, 1148.648, 2176.2698, 1739.2463, 4193.169, 1933.6974, 294.20203]
2025-09-12 02:28:00,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 02:28:00,713 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 20 hours, 30 minutes, 42 seconds)
2025-09-12 02:40:16,626 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:40:16,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:44:55,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2976.32007 ± 918.057
2025-09-12 02:44:55,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3271.3904, 3838.1855, 2354.3562, 4155.309, 2681.534, 4538.3037, 1788.887, 3107.302, 1926.0607, 2101.8713]
2025-09-12 02:44:55,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 02:44:55,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 20 hours, 10 minutes, 33 seconds)
2025-09-12 02:57:18,201 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:57:18,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:01:56,580 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2232.69971 ± 1556.406
2025-09-12 03:01:56,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [4334.5073, 410.1397, -28.283329, 2749.516, 2756.7686, 1586.9377, 1420.8146, 785.94403, 3788.8284, 4521.8247]
2025-09-12 03:01:56,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:01:56,599 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 19 hours, 57 minutes, 45 seconds)
2025-09-12 03:14:08,401 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:14:08,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:18:44,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2921.78833 ± 1531.281
2025-09-12 03:18:44,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [4167.117, 4637.9937, 295.33142, 3761.2283, 1547.4294, 952.3657, 4319.8784, 3632.2515, 1703.0781, 4201.208]
2025-09-12 03:18:44,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:18:44,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 19 hours, 37 minutes, 38 seconds)
2025-09-12 03:30:52,641 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:30:52,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:35:32,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3412.35205 ± 1002.076
2025-09-12 03:35:32,589 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2861.784, 2355.1316, 3649.838, 4399.7393, 4475.826, 4356.5186, 2350.1226, 4214.684, 3926.4973, 1533.3772]
2025-09-12 03:35:32,589 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:35:32,589 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (3412.35) for latency ExtremeClogL1U23
2025-09-12 03:35:32,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 19 hours, 21 minutes, 21 seconds)
2025-09-12 03:47:41,247 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:47:41,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:52:20,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2920.41064 ± 1657.191
2025-09-12 03:52:20,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [889.2822, 522.00146, 3168.459, 3703.7188, 4306.163, 46.347622, 4604.5, 3552.11, 4513.8823, 3897.643]
2025-09-12 03:52:20,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:52:20,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 19 hours, 6 minutes, 52 seconds)
2025-09-12 04:04:32,101 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:04:32,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:09:08,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3262.43872 ± 1543.567
2025-09-12 04:09:08,024 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2531.0542, 1094.8807, 722.71075, 4741.6, 3446.4492, 1690.9224, 4188.7197, 4960.8906, 4585.2446, 4661.9136]
2025-09-12 04:09:08,024 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:09:08,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 18 hours, 48 minutes, 23 seconds)
2025-09-12 04:21:24,431 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:21:24,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:26:03,398 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3311.81787 ± 1182.680
2025-09-12 04:26:03,399 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [4820.8716, 3111.2463, 3078.3232, 4385.836, 4453.196, 1211.9375, 2974.8105, 4408.514, 3211.8335, 1461.6123]
2025-09-12 04:26:03,399 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:26:03,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 18 hours, 30 minutes, 17 seconds)
2025-09-12 04:38:12,067 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:38:12,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:42:47,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3126.59595 ± 1341.496
2025-09-12 04:42:47,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3115.5896, 3133.9404, 4318.915, 3086.9985, 1867.7845, 4665.358, 232.74223, 3829.7925, 2191.852, 4822.9863]
2025-09-12 04:42:47,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:42:47,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 18 hours, 12 minutes, 39 seconds)
2025-09-12 04:54:49,284 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:54:49,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:00:29,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3509.89380 ± 1787.991
2025-09-12 05:00:29,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [4689.7993, 4884.3516, 4478.1787, 4791.1733, 1188.5455, 4859.716, 4391.132, 1254.0765, 4518.41, 43.555946]
2025-09-12 05:00:29,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:00:29,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (3509.89) for latency ExtremeClogL1U23
2025-09-12 05:00:29,455 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 18 hours, 7 minutes, 19 seconds)
2025-09-12 05:12:45,165 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:12:45,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:18:25,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2402.38086 ± 1568.588
2025-09-12 05:18:25,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2338.3945, 412.2219, 2553.7207, 375.78113, 288.49557, 3467.3184, 3715.9863, 4598.237, 4417.664, 1855.9874]
2025-09-12 05:18:25,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:18:25,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 18 hours, 4 minutes, 35 seconds)
2025-09-12 05:30:54,069 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:30:54,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:35:34,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2972.49756 ± 1277.400
2025-09-12 05:35:34,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2938.939, 4343.203, 1773.723, 4824.082, 4267.9683, 1440.0452, 3358.9648, 1638.2885, 3852.8037, 1286.9548]
2025-09-12 05:35:34,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:35:34,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 17 hours, 51 minutes, 47 seconds)
2025-09-12 05:47:44,069 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:47:44,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:52:22,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2476.15576 ± 1756.042
2025-09-12 05:52:22,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [747.07715, 4628.622, -5.780023, 256.89722, 4479.961, 2439.094, 4731.2744, 2145.4817, 3861.6536, 1477.2756]
2025-09-12 05:52:22,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:52:22,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 17 hours, 33 minutes, 9 seconds)
2025-09-12 06:04:46,303 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:04:46,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:09:22,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3421.59692 ± 1253.545
2025-09-12 06:09:22,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3734.427, 1865.2703, 4179.4272, 1323.2992, 3158.2087, 3253.371, 4964.6157, 4782.468, 2084.6426, 4870.24]
2025-09-12 06:09:22,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:09:22,517 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 17 hours, 18 minutes, 56 seconds)
2025-09-12 06:21:23,568 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:21:23,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:27:04,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3808.49731 ± 1554.085
2025-09-12 06:27:04,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2369.7578, 3558.9988, 5101.647, 5131.4033, 4516.805, 5253.211, 1505.0869, 958.9839, 4377.531, 5311.546]
2025-09-12 06:27:04,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:27:04,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (3808.50) for latency ExtremeClogL1U23
2025-09-12 06:27:04,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 17 hours, 1 minute, 37 seconds)
2025-09-12 06:39:33,579 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:39:33,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:44:12,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2605.03809 ± 1641.653
2025-09-12 06:44:12,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [4337.523, 452.07437, 4883.3804, 2346.6477, 2375.853, 2738.899, 5079.973, 2522.423, 294.8153, 1018.7922]
2025-09-12 06:44:12,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:44:12,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 16 hours, 35 minutes, 11 seconds)
2025-09-12 06:56:24,012 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:56:24,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:02:05,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2621.31299 ± 1621.294
2025-09-12 07:02:05,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [361.3729, 2579.425, 4579.0684, 3797.9895, 4528.9067, 1006.63074, 75.800156, 2502.3787, 4399.601, 2381.9578]
2025-09-12 07:02:05,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:02:05,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 16 hours, 26 minutes, 22 seconds)
2025-09-12 07:14:23,092 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:14:23,097 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:19:01,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 4372.53809 ± 1083.708
2025-09-12 07:19:01,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [4722.4375, 4223.669, 4742.74, 5016.9775, 4254.8467, 4780.6875, 4749.89, 4894.1147, 5114.2646, 1225.7533]
2025-09-12 07:19:01,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:19:01,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (4372.54) for latency ExtremeClogL1U23
2025-09-12 07:19:01,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 16 hours, 10 minutes, 22 seconds)
2025-09-12 07:30:56,723 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:30:56,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:35:29,282 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 4284.11328 ± 1262.999
2025-09-12 07:35:29,283 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3768.733, 4613.0786, 5023.1714, 5298.223, 3279.9614, 4457.591, 4925.512, 5245.0938, 5223.877, 1005.8898]
2025-09-12 07:35:29,283 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:35:29,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 15 hours, 47 minutes, 14 seconds)
2025-09-12 07:47:14,958 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:47:14,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:52:44,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2453.72168 ± 2235.580
2025-09-12 07:52:44,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [90.602394, 86.90229, 301.97592, 3597.3022, 424.19464, 4793.5796, 4878.2026, 365.04053, 4833.1035, 5166.3125]
2025-09-12 07:52:44,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:52:44,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 15 hours, 25 minutes, 15 seconds)
2025-09-12 08:05:05,925 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:05:05,929 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:10:35,759 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2591.26562 ± 1743.249
2025-09-12 08:10:35,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [5429.075, 3908.7893, 108.82846, 4116.935, 3864.8955, 397.92972, 2988.3088, 3099.899, 1307.3157, 690.67914]
2025-09-12 08:10:35,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:10:35,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 15 hours, 15 minutes, 41 seconds)
2025-09-12 08:22:50,193 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:22:50,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:27:17,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3162.56787 ± 1728.320
2025-09-12 08:27:17,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1069.313, 2854.1438, 5177.639, 4898.249, 5447.3735, 1935.1885, 4443.044, 582.4384, 1450.4968, 3767.791]
2025-09-12 08:27:17,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:27:17,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 14 hours, 46 minutes, 7 seconds)
2025-09-12 08:39:20,834 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:39:20,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:43:52,268 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2773.64966 ± 1554.713
2025-09-12 08:43:52,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [4748.1343, 3701.372, 1711.0886, 756.38873, 5297.6567, 3769.2542, 301.73944, 1947.6947, 2580.817, 2922.3518]
2025-09-12 08:43:52,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:43:52,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 14 hours, 25 minutes, 27 seconds)
2025-09-12 08:55:53,153 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:55:53,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:01:22,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2541.64648 ± 1687.592
2025-09-12 09:01:22,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [450.51324, 2726.1453, 4381.619, 1629.9359, 3633.2197, 632.9694, 2113.6829, 4745.047, 317.48254, 4785.85]
2025-09-12 09:01:22,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:01:22,638 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 14 hours, 18 minutes, 53 seconds)
2025-09-12 09:13:18,775 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:13:18,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:17:46,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3450.58203 ± 1763.008
2025-09-12 09:17:46,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1460.8904, 1463.384, 3626.163, 5094.2554, 5092.7896, 2178.9458, 5320.3745, 4602.554, 5129.1367, 537.32745]
2025-09-12 09:17:46,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:17:46,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 13 hours, 53 minutes, 23 seconds)
2025-09-12 09:29:48,653 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:29:48,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:34:19,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 4262.50537 ± 1151.241
2025-09-12 09:34:19,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [4586.098, 4785.541, 1988.0946, 4823.789, 4871.598, 5242.0635, 4029.0288, 5630.0405, 2236.8223, 4431.976]
2025-09-12 09:34:19,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:34:19,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 13 hours, 23 minutes, 51 seconds)
2025-09-12 09:46:10,064 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:46:10,068 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:51:40,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3529.32739 ± 1433.639
2025-09-12 09:51:40,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [4652.406, 4211.0415, 3107.124, 3924.4094, 3115.0674, 4943.0864, 4544.361, 4809.7, 1620.4679, 365.6104]
2025-09-12 09:51:40,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:51:40,517 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 13 hours, 13 minutes, 8 seconds)
2025-09-12 10:03:55,610 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:03:55,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:08:28,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3747.82153 ± 1642.930
2025-09-12 10:08:28,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3150.0544, 5274.7617, 1314.1782, 4838.9614, 4937.862, 5325.9834, 4838.583, 4649.752, 617.63763, 2530.4407]
2025-09-12 10:08:28,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:08:28,783 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 12 hours, 58 minutes, 23 seconds)
2025-09-12 10:20:30,057 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:20:30,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:24:57,713 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3543.87842 ± 1642.377
2025-09-12 10:24:57,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1248.7452, 5310.1357, 5468.3604, 2514.5466, 5105.901, 1010.14636, 5037.325, 2817.613, 4510.1304, 2415.8835]
2025-09-12 10:24:57,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:24:57,719 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 12 hours, 32 minutes, 15 seconds)
2025-09-12 10:36:52,074 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:36:52,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:41:20,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 4200.18994 ± 1139.959
2025-09-12 10:41:20,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [4318.8403, 5213.014, 5352.643, 4307.319, 4039.4604, 5333.0796, 4795.038, 4363.7085, 2604.1406, 1674.657]
2025-09-12 10:41:20,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:41:20,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 12 hours, 15 minutes, 20 seconds)
2025-09-12 10:53:16,448 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:53:16,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:57:47,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3799.91943 ± 1243.590
2025-09-12 10:57:47,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [4603.4585, 1066.9072, 4815.527, 4783.142, 2591.6416, 3476.26, 2630.6028, 4473.2383, 4623.3696, 4935.0483]
2025-09-12 10:57:47,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:57:47,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 11 hours, 57 minutes, 49 seconds)
2025-09-12 11:09:45,666 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:09:45,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:14:13,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2130.60767 ± 2054.128
2025-09-12 11:14:13,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [5187.279, 300.33463, 5594.1562, 311.40042, 4346.6875, 511.73944, 2886.8557, 512.9509, 1024.9749, 629.69977]
2025-09-12 11:14:13,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:14:13,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 11 hours, 33 minutes, 26 seconds)
2025-09-12 11:26:24,499 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:26:24,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:30:53,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3115.88940 ± 1969.916
2025-09-12 11:30:53,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [399.40524, 5167.3364, 4990.917, 786.84906, 776.2907, 4326.0063, 5072.0137, 2699.6416, 5352.89, 1587.5437]
2025-09-12 11:30:53,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:30:53,324 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 11 hours, 15 minutes, 45 seconds)
2025-09-12 11:42:50,615 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:42:50,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:47:19,783 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 4214.89990 ± 1091.590
2025-09-12 11:47:19,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3302.5134, 2844.9717, 3434.7886, 2324.4277, 5417.593, 4261.838, 5167.085, 5294.3486, 5281.3286, 4820.104]
2025-09-12 11:47:19,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:47:19,794 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 10 hours, 58 minutes, 56 seconds)
2025-09-12 11:59:27,079 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:59:27,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:03:59,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3034.79346 ± 2086.323
2025-09-12 12:03:59,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [4511.266, 278.73483, 5695.024, 4680.09, 3873.8262, 5300.867, 598.85583, 1386.2845, 90.15867, 3932.8289]
2025-09-12 12:03:59,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:03:59,283 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 10 hours, 44 minutes, 38 seconds)
2025-09-12 12:15:59,909 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:15:59,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:20:32,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3180.82251 ± 1643.267
2025-09-12 12:20:32,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [4563.0444, 3876.5918, 4030.2024, 378.58276, 3124.951, 4239.393, 477.2579, 1629.8136, 5136.4854, 4351.905]
2025-09-12 12:20:32,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:20:32,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 10 hours, 28 minutes, 49 seconds)
2025-09-12 12:32:38,751 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:32:38,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:37:07,620 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3174.95312 ± 1590.784
2025-09-12 12:37:07,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2977.9893, 5210.2935, 2965.9639, 3226.3274, 3537.1777, 243.43233, 830.76196, 4899.578, 2757.1562, 5100.852]
2025-09-12 12:37:07,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:37:07,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 10 hours, 13 minutes, 27 seconds)
2025-09-12 12:48:52,907 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:48:52,928 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:54:25,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2603.20264 ± 2095.012
2025-09-12 12:54:25,647 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [4153.9463, 5289.347, 1161.3414, 5780.491, 4967.3213, 827.2237, 2255.5972, 158.51532, 998.79913, 439.44556]
2025-09-12 12:54:25,647 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:54:25,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 10 hours, 1 minute, 28 seconds)
2025-09-12 13:06:30,189 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:06:30,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:12:01,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 4087.33984 ± 1460.466
2025-09-12 13:12:01,024 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [5426.4937, 5382.5884, 4482.7705, 711.2493, 2281.4673, 4831.5996, 5070.6387, 3330.2815, 4323.8906, 5032.424]
2025-09-12 13:12:01,024 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:12:01,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 9 hours, 52 minutes, 48 seconds)
2025-09-12 13:23:55,904 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:23:55,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:29:27,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3300.79639 ± 2026.895
2025-09-12 13:29:27,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [4264.7227, 4782.1323, 5120.832, 5101.694, 640.6002, 4526.0605, 2886.4385, 5233.199, 145.60722, 306.67703]
2025-09-12 13:29:27,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:29:27,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 9 hours, 41 minutes, 9 seconds)
2025-09-12 13:41:19,172 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:41:19,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:46:49,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3582.15967 ± 2015.050
2025-09-12 13:46:49,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [5105.6934, 798.59106, 4429.4272, 307.56226, 563.5131, 4775.942, 4125.4365, 5376.3105, 5131.723, 5207.3994]
2025-09-12 13:46:49,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:46:49,842 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 9 hours, 29 minutes, 31 seconds)
2025-09-12 13:58:48,544 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:58:48,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:03:21,329 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2733.97681 ± 1733.573
2025-09-12 14:03:21,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2198.6794, 64.31476, 1949.7844, 1581.6416, 2717.0332, 5457.9214, 3691.4307, 4000.8513, 522.55493, 5155.557]
2025-09-12 14:03:21,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:03:21,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 9 hours, 11 minutes, 52 seconds)
2025-09-12 14:15:19,793 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:15:19,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:19:52,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3961.61572 ± 1921.457
2025-09-12 14:19:52,704 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [718.085, 43.80443, 5171.3057, 4135.938, 5506.335, 3176.8508, 5459.084, 4921.782, 5480.236, 5002.739]
2025-09-12 14:19:52,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:19:52,713 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 8 hours, 49 minutes, 47 seconds)
2025-09-12 14:31:55,150 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:31:55,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:36:23,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 4062.31250 ± 1477.136
2025-09-12 14:36:23,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [4937.9697, 5233.363, 5189.4355, 5545.969, 3274.1624, 5383.8354, 3297.9849, 4112.0767, 3096.2966, 552.033]
2025-09-12 14:36:23,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:36:23,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 8 hours, 26 minutes, 13 seconds)
2025-09-12 14:48:22,709 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:48:22,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:52:55,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3746.19580 ± 1704.476
2025-09-12 14:52:55,820 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2768.8425, 5242.509, 5746.7036, 5297.921, 2358.4197, 1713.0756, 5720.9453, 1242.3745, 2418.1626, 4953.004]
2025-09-12 14:52:55,820 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:52:55,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 8 hours, 4 minutes, 10 seconds)
2025-09-12 15:05:03,002 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:05:03,008 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:10:34,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3683.06494 ± 2151.794
2025-09-12 15:10:34,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [295.0833, 5237.1523, 5717.599, 5202.27, 1044.2719, 4361.4556, 6005.2207, 5221.7134, 3326.3618, 419.51886]
2025-09-12 15:10:34,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:10:34,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 7 hours, 49 minutes)
2025-09-12 15:22:47,716 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:22:47,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:27:19,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 4176.68652 ± 1739.363
2025-09-12 15:27:19,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [5289.966, 5176.9346, 4519.2, 5342.159, 5103.633, 3967.98, 5606.9, 68.23696, 4978.4644, 1713.3937]
2025-09-12 15:27:19,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:27:19,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 7 hours, 33 minutes, 24 seconds)
2025-09-12 15:39:33,514 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:39:33,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:44:13,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 4116.96777 ± 1090.791
2025-09-12 15:44:13,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2092.2185, 5032.3633, 3107.9844, 3064.1812, 5211.186, 4969.928, 5287.5107, 3128.1929, 4734.245, 4541.872]
2025-09-12 15:44:13,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:44:13,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 7 hours, 18 minutes, 38 seconds)
2025-09-12 15:56:41,898 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:56:41,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:01:10,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3368.46021 ± 1577.912
2025-09-12 16:01:10,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [5165.7437, 2752.0603, 4805.043, 4245.2144, 5356.7705, 1391.3134, 3115.917, 4444.2686, 1243.9865, 1164.2822]
2025-09-12 16:01:10,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:01:10,792 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 7 hours, 3 minutes, 57 seconds)
2025-09-12 16:13:14,029 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:13:14,035 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:18:45,005 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3733.77930 ± 1323.335
2025-09-12 16:18:45,007 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2196.3716, 4859.9937, 5041.2065, 4680.5166, 3869.45, 3884.6328, 3874.9705, 1231.2249, 2280.6775, 5418.7476]
2025-09-12 16:18:45,007 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:18:45,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 6 hours, 51 minutes, 56 seconds)
2025-09-12 16:30:39,749 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:30:39,754 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:36:09,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2330.81323 ± 1497.864
2025-09-12 16:36:09,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2325.7104, 1166.0579, 4181.7075, 58.31632, 724.2652, 4252.9062, 1337.3353, 2351.8984, 2293.309, 4616.628]
2025-09-12 16:36:09,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:36:09,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 6 hours, 33 minutes, 40 seconds)
2025-09-12 16:48:23,423 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:48:23,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:52:55,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3199.91919 ± 1719.029
2025-09-12 16:52:55,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2441.9697, 4860.5864, 918.42224, 3513.2383, 4585.5083, 1007.92365, 4495.0073, 779.2323, 3659.227, 5738.077]
2025-09-12 16:52:55,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:52:55,599 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 6 hours, 16 minutes, 39 seconds)
2025-09-12 17:04:47,431 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:04:47,436 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:10:17,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3487.05786 ± 1719.029
2025-09-12 17:10:17,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [5402.1885, 2601.8792, 5466.9517, 4573.9526, 853.33813, 1607.598, 1342.2517, 4703.3687, 3031.4424, 5287.605]
2025-09-12 17:10:17,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:10:17,245 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 6 hours, 1 minute, 25 seconds)
2025-09-12 17:22:28,589 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:22:28,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:26:59,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3657.36597 ± 1815.219
2025-09-12 17:26:59,690 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2099.3208, 4955.9233, 5472.89, 94.05692, 3610.5828, 4841.6943, 5370.1177, 5263.089, 1222.9783, 3643.0095]
2025-09-12 17:26:59,690 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:26:59,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 5 hours, 43 minutes, 15 seconds)
2025-09-12 17:38:52,151 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:38:52,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:43:20,014 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2691.42920 ± 2011.635
2025-09-12 17:43:20,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [4817.796, 2183.9197, 676.73914, 2414.4812, 4863.693, 5360.547, 464.77972, 5015.142, 914.69147, 202.50201]
2025-09-12 17:43:20,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:43:20,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 5 hours, 21 minutes, 25 seconds)
2025-09-12 17:55:22,348 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:55:22,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:59:52,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3067.57275 ± 2264.169
2025-09-12 17:59:52,004 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [5255.5625, 5649.278, 655.1861, 361.33893, 1967.9618, 5435.221, 4719.323, 5303.5586, 128.83893, 1199.4583]
2025-09-12 17:59:52,004 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:59:52,015 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 5 hours, 1 minute, 19 seconds)
2025-09-12 18:12:01,770 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:12:01,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:16:29,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3457.29932 ± 2037.699
2025-09-12 18:16:29,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [5712.22, 5141.2993, 1964.8927, 5297.123, 329.49658, 2634.3098, 1108.6815, 1499.174, 5825.971, 5059.824]
2025-09-12 18:16:29,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:16:29,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 44 minutes, 7 seconds)
2025-09-12 18:28:42,972 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:28:42,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:33:10,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2833.74121 ± 1862.868
2025-09-12 18:33:10,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1408.8726, 4659.3545, 818.93427, 1608.5353, 5181.98, 691.0057, 551.8596, 4356.3184, 4087.6343, 4972.9175]
2025-09-12 18:33:10,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:33:10,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 4 hours, 25 minutes, 15 seconds)
2025-09-12 18:45:17,132 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:45:17,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:49:50,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 4336.92725 ± 2026.734
2025-09-12 18:49:50,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [5761.617, 5529.9365, 5400.0693, 5079.9272, 5165.1743, 5359.994, 483.87698, 5242.4775, 5223.1816, 123.015144]
2025-09-12 18:49:50,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:49:50,279 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 4 hours, 8 minutes, 31 seconds)
2025-09-12 19:01:40,399 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:01:40,413 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:06:07,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3429.14648 ± 2109.232
2025-09-12 19:06:07,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [335.00757, 3246.8552, 2107.3723, 1529.9678, 5228.736, 201.1414, 5360.974, 5743.001, 5345.588, 5192.8223]
2025-09-12 19:06:07,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:06:07,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 51 minutes, 48 seconds)
2025-09-12 19:18:18,579 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:18:18,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:22:50,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 4166.58984 ± 1645.985
2025-09-12 19:22:50,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [4935.1387, 5052.917, 5902.807, 5160.948, 1397.1481, 5156.309, 5029.7397, 4978.181, 3144.1526, 908.55615]
2025-09-12 19:22:50,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:22:50,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 35 minutes, 43 seconds)
2025-09-12 19:34:53,861 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:34:53,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:39:21,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2396.01685 ± 1595.762
2025-09-12 19:39:21,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3501.8352, 359.24252, 762.44135, 3277.4949, 406.92645, 4714.724, 1724.2329, 3963.1426, 4091.814, 1158.3138]
2025-09-12 19:39:21,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:39:21,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 18 minutes, 53 seconds)
2025-09-12 19:51:24,277 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:51:24,283 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:55:56,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2958.34131 ± 2017.428
2025-09-12 19:55:56,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1177.0706, 2150.8726, 612.62, 3625.5637, 5653.2275, 4636.6743, 5093.3535, 94.75456, 1282.6577, 5256.6187]
2025-09-12 19:55:56,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:55:56,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 3 hours, 2 minutes, 4 seconds)
2025-09-12 20:07:43,511 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:07:43,517 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 20:12:11,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3276.16504 ± 1815.768
2025-09-12 20:12:11,095 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [4706.532, 4795.52, 1266.7384, 1532.6069, 4496.6943, 5660.2144, 4710.873, 1425.2357, 3820.844, 346.3909]
2025-09-12 20:12:11,095 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:12:11,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 44 minutes, 41 seconds)
2025-09-12 20:23:59,109 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:23:59,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 20:28:32,774 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3958.75732 ± 1816.031
2025-09-12 20:28:32,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3735.3054, 5502.7554, 1018.31946, 1554.2686, 5462.9263, 5565.258, 5394.1284, 5248.5522, 4772.069, 1333.9921]
2025-09-12 20:28:32,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:28:32,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 28 minutes, 21 seconds)
2025-09-12 20:40:37,886 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:40:37,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 20:46:09,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 4039.64209 ± 1662.590
2025-09-12 20:46:09,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [5165.6323, 2843.1372, 4337.442, 4804.713, 5216.5894, 5121.6367, 4968.8643, 1879.5254, 370.29895, 5688.5806]
2025-09-12 20:46:09,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:46:09,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 13 minutes, 19 seconds)
2025-09-12 20:58:15,930 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:58:15,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 21:03:47,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 4144.77588 ± 1636.705
2025-09-12 21:03:47,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1973.8284, 5261.352, 4522.894, 3710.7488, 5221.2725, 6175.549, 1907.6205, 5640.66, 1621.7784, 5412.053]
2025-09-12 21:03:47,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:03:47,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 58 minutes, 11 seconds)
2025-09-12 21:15:39,785 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:15:39,795 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 21:21:12,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 4656.71875 ± 693.205
2025-09-12 21:21:12,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2811.2417, 5034.4175, 4785.761, 4210.404, 5452.5654, 4727.1616, 4517.7656, 5087.3877, 4944.4785, 4996.006]
2025-09-12 21:21:12,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:21:12,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (4656.72) for latency ExtremeClogL1U23
2025-09-12 21:21:12,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 42 minutes, 19 seconds)
2025-09-12 21:33:35,014 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:33:35,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 21:39:10,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3669.09570 ± 2146.538
2025-09-12 21:39:10,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3729.035, 375.19586, 5072.6123, 848.40857, 5720.4316, 5291.0127, 5829.7197, 4665.06, 296.70087, 4862.7812]
2025-09-12 21:39:10,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:39:10,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 26 minutes, 59 seconds)
2025-09-12 21:51:25,901 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:51:25,905 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 21:56:01,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 4447.23730 ± 1505.240
2025-09-12 21:56:01,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [92.0094, 4626.8633, 4457.4697, 4868.9214, 4983.5283, 5204.196, 5844.484, 4918.795, 5108.786, 4367.322]
2025-09-12 21:56:01,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:56:01,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 9 minutes, 59 seconds)
2025-09-12 22:08:22,613 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:08:22,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 22:12:59,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3429.68896 ± 2373.542
2025-09-12 22:12:59,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [6030.982, 5227.5137, 148.69754, 4758.0376, 5581.09, 4942.5093, 1780.9583, 218.14163, 5361.571, 247.3897]
2025-09-12 22:12:59,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:12:59,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 52 minutes, 5 seconds)
2025-09-12 22:25:02,113 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:25:02,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 22:30:37,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 4095.53198 ± 1466.948
2025-09-12 22:30:37,732 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [5127.4565, 5731.5264, 3500.034, 5524.722, 4192.241, 5507.755, 1518.6339, 5050.498, 2197.0457, 2605.4075]
2025-09-12 22:30:37,732 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:30:37,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 34 minutes, 44 seconds)
2025-09-12 22:42:57,554 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:42:57,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 22:47:36,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 4442.69385 ± 1389.817
2025-09-12 22:47:36,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [5166.433, 5519.337, 3741.9543, 5372.286, 5809.9995, 1352.8315, 5200.0645, 5344.654, 4337.545, 2581.83]
2025-09-12 22:47:36,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:47:36,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 17 minutes, 16 seconds)
2025-09-12 22:59:45,983 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:59:45,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 23:04:20,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 4074.76831 ± 1845.718
2025-09-12 23:04:20,516 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [5453.963, 254.87582, 616.95184, 4868.914, 4419.1587, 5092.76, 5239.589, 4832.577, 5334.3164, 4634.5776]
2025-09-12 23:04:20,516 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:04:20,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1251 [DEBUG]: Training session finished
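Note on reading the log: each "Total Reward: mean ± std" line can be reproduced from the "All rewards" list that follows it, where the spread appears to be the population standard deviation (ddof=0, the NumPy default), not the sample one. A minimal check, using the Iteration 94 evaluation (the new-best checkpoint for latency ExtremeClogL1U23):

```python
import numpy as np

# Episode returns copied from the Iteration 94 evaluation log line
rewards = [2811.2417, 5034.4175, 4785.761, 4210.404, 5452.5654,
           4727.1616, 4517.7656, 5087.3877, 4944.4785, 4996.006]

mean = np.mean(rewards)   # matches the logged 4656.71875 (up to float32 rounding)
std = np.std(rewards)     # population std, ddof=0; matches the logged 693.205

print(f"Total Reward: {mean:.5f} \u00b1 {std:.3f}")
```

If the numbers did not match with ddof=0 but did with `np.std(rewards, ddof=1)`, the logger would be using the sample standard deviation instead; here ddof=0 is what reproduces the logged value.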
