2025-09-11 18:44:02,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc25-halfcheetah/ExtremeClogL1U23-mbpac_memdelay
2025-09-11 18:44:02,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc25-halfcheetah/ExtremeClogL1U23-mbpac_memdelay
2025-09-11 18:44:02,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x14b98859c210>}
2025-09-11 18:44:02,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1111 [DEBUG]: using device: cuda
2025-09-11 18:44:02,502 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-11 18:44:02,518 baseline-mbpac-noiseperc25-halfcheetah:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-11 18:44:02,518 baseline-mbpac-noiseperc25-halfcheetah:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-11 18:44:02,525 baseline-mbpac-noiseperc25-halfcheetah:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=6, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-09-11 18:44:03,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-11 18:44:03,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-11 18:54:30,574 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 18:54:30,579 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 18:58:59,341 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: -321.39731 ± 47.965
2025-09-11 18:58:59,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [-350.81482, -305.83762, -307.5591, -235.8271, -299.55875, -303.29413, -324.56943, -423.82596, -293.38934, -369.29663]
2025-09-11 18:58:59,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 18:58:59,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (-321.40) for latency ExtremeClogL1U23
2025-09-11 18:58:59,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 24 hours, 38 minutes, 18 seconds)
2025-09-11 19:10:18,705 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:10:18,709 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:14:40,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: -166.67642 ± 43.144
2025-09-11 19:14:40,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [-101.57604, -187.22682, -215.22635, -106.874985, -224.65492, -166.53447, -132.05537, -161.10446, -149.00635, -222.50436]
2025-09-11 19:14:40,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 19:14:40,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (-166.68) for latency ExtremeClogL1U23
2025-09-11 19:14:40,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 24 hours, 59 minutes, 59 seconds)
2025-09-11 19:25:56,635 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:25:56,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:30:19,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: -99.30288 ± 47.284
2025-09-11 19:30:19,219 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [-123.938156, -90.28557, -170.74304, -119.73388, -74.67919, -89.98093, -94.17155, -170.00818, -10.338918, -49.149376]
2025-09-11 19:30:19,219 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 19:30:19,219 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (-99.30) for latency ExtremeClogL1U23
2025-09-11 19:30:19,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 24 hours, 55 minutes, 51 seconds)
2025-09-11 19:41:40,075 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:41:40,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:46:04,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: -29.32343 ± 68.499
2025-09-11 19:46:04,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [-113.98691, 42.973713, -71.23713, -106.46631, -58.703945, 87.184555, -13.092887, 56.70196, -19.331568, -97.27578]
2025-09-11 19:46:04,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 19:46:04,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (-29.32) for latency ExtremeClogL1U23
2025-09-11 19:46:04,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 24 hours, 48 minutes, 23 seconds)
2025-09-11 19:57:24,450 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:57:24,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:01:53,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 250.70383 ± 115.422
2025-09-11 20:01:53,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [253.77673, 284.06375, 71.21266, 422.76483, 271.3126, 290.161, 21.168625, 220.68797, 345.378, 326.51218]
2025-09-11 20:01:53,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 20:01:53,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (250.70) for latency ExtremeClogL1U23
2025-09-11 20:01:53,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 24 hours, 38 minutes, 55 seconds)
2025-09-11 20:13:17,597 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:13:17,618 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:17:46,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 358.44058 ± 339.450
2025-09-11 20:17:46,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [663.67816, -70.60814, 12.524292, 716.58887, 767.74744, -76.68207, 105.66704, 375.31888, 283.96375, 806.2074]
2025-09-11 20:17:46,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 20:17:46,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (358.44) for latency ExtremeClogL1U23
2025-09-11 20:17:46,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 24 hours, 41 minutes, 15 seconds)
2025-09-11 20:29:09,749 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:29:09,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:33:38,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 614.72760 ± 305.056
2025-09-11 20:33:38,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [779.47485, 1009.8388, 365.59747, 1053.9183, 516.6946, 159.3115, 848.39844, 215.71167, 424.42596, 773.9044]
2025-09-11 20:33:38,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 20:33:38,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (614.73) for latency ExtremeClogL1U23
2025-09-11 20:33:38,097 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 24 hours, 28 minutes, 45 seconds)
2025-09-11 20:45:00,165 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:45:00,171 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:49:29,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 872.22101 ± 310.116
2025-09-11 20:49:29,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1198.9097, 330.6517, 1059.1982, 993.8254, 1017.22815, 1004.3463, 302.64514, 1158.713, 650.69476, 1005.9975]
2025-09-11 20:49:29,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 20:49:29,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (872.22) for latency ExtremeClogL1U23
2025-09-11 20:49:29,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 24 hours, 16 minutes, 42 seconds)
2025-09-11 21:00:50,944 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:00:50,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:05:21,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 767.15057 ± 392.030
2025-09-11 21:05:21,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [297.82703, 1102.181, 410.6986, 1243.6272, 1182.6462, 572.1304, 278.69717, 375.70074, 1206.1128, 1001.8842]
2025-09-11 21:05:21,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 21:05:21,097 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 24 hours, 2 minutes, 51 seconds)
2025-09-11 21:16:43,839 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:16:43,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:21:07,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1258.57788 ± 127.967
2025-09-11 21:21:07,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1336.7847, 1347.6772, 1064.567, 1290.6781, 1295.4084, 1190.4277, 1307.338, 1420.9406, 994.0195, 1337.9366]
2025-09-11 21:21:07,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 21:21:07,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1258.58) for latency ExtremeClogL1U23
2025-09-11 21:21:07,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 23 hours, 46 minutes, 4 seconds)
2025-09-11 21:32:30,309 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:32:30,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:36:57,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1056.07935 ± 501.158
2025-09-11 21:36:57,097 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1227.5725, 1482.0302, 1347.9573, 1461.5469, 235.9454, 1321.6724, 1428.6189, 178.52351, 507.5127, 1369.4141]
2025-09-11 21:36:57,097 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 21:36:57,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 23 hours, 29 minutes, 16 seconds)
2025-09-11 21:48:19,555 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:48:19,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:52:47,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1329.86206 ± 377.882
2025-09-11 21:52:47,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1227.6871, 1733.9962, 1399.6005, 1345.1273, 1349.0615, 1378.1554, 1494.3079, 271.47302, 1585.5474, 1513.6649]
2025-09-11 21:52:47,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 21:52:47,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1329.86) for latency ExtremeClogL1U23
2025-09-11 21:52:47,638 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 23 hours, 13 minutes, 11 seconds)
2025-09-11 22:04:09,559 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:04:09,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:08:38,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1464.91040 ± 210.267
2025-09-11 22:08:38,230 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1190.7622, 1455.296, 1593.3329, 1477.9896, 1660.7303, 1511.3347, 1638.5352, 966.2387, 1630.1741, 1524.7098]
2025-09-11 22:08:38,230 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 22:08:38,230 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1464.91) for latency ExtremeClogL1U23
2025-09-11 22:08:38,239 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 22 hours, 57 minutes, 10 seconds)
2025-09-11 22:20:01,718 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:20:01,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:24:30,211 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1572.48804 ± 108.179
2025-09-11 22:24:30,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1674.1405, 1520.3417, 1690.3188, 1681.5275, 1581.3561, 1350.1625, 1645.3048, 1436.5166, 1622.1188, 1523.0933]
2025-09-11 22:24:30,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 22:24:30,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1572.49) for latency ExtremeClogL1U23
2025-09-11 22:24:30,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 22 hours, 41 minutes, 25 seconds)
2025-09-11 22:35:52,348 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:35:52,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:40:21,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1652.64941 ± 95.249
2025-09-11 22:40:21,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1425.4666, 1747.3739, 1750.6044, 1624.638, 1684.7308, 1685.7582, 1722.3884, 1666.0685, 1542.9971, 1676.4678]
2025-09-11 22:40:21,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 22:40:21,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1652.65) for latency ExtremeClogL1U23
2025-09-11 22:40:21,873 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 22 hours, 27 minutes, 7 seconds)
2025-09-11 22:51:45,993 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:51:46,008 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:56:15,579 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1616.65833 ± 152.904
2025-09-11 22:56:15,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1686.5721, 1800.3601, 1582.962, 1653.6709, 1335.2963, 1477.538, 1779.5258, 1825.228, 1477.3369, 1548.0935]
2025-09-11 22:56:15,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 22:56:15,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 22 hours, 12 minutes, 22 seconds)
2025-09-11 23:07:42,938 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:07:42,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:12:11,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1612.34399 ± 190.036
2025-09-11 23:12:11,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1775.6586, 1344.197, 1923.2986, 1416.7811, 1465.7731, 1400.1954, 1685.7439, 1809.8982, 1733.8323, 1568.0638]
2025-09-11 23:12:11,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 23:12:11,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 21 hours, 58 minutes, 4 seconds)
2025-09-11 23:23:34,722 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:23:34,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:28:01,341 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1736.30627 ± 149.292
2025-09-11 23:28:01,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1731.5266, 1883.1119, 1634.4269, 1894.2766, 1518.4808, 1659.2886, 2018.2073, 1794.0927, 1603.1879, 1626.4637]
2025-09-11 23:28:01,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 23:28:01,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1736.31) for latency ExtremeClogL1U23
2025-09-11 23:28:01,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 21 hours, 41 minutes, 55 seconds)
2025-09-11 23:39:24,642 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:39:24,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:43:55,661 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1665.00525 ± 320.280
2025-09-11 23:43:55,663 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1581.2861, 1830.8859, 1542.7814, 1826.4065, 1734.0507, 1769.8973, 766.51483, 1824.0253, 1927.6207, 1846.5847]
2025-09-11 23:43:55,663 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 23:43:55,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 21 hours, 26 minutes, 40 seconds)
2025-09-11 23:55:22,232 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:55:22,243 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:59:49,768 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1402.04517 ± 462.516
2025-09-11 23:59:49,769 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1907.8728, 772.5224, 793.7918, 1694.0142, 1684.5605, 1928.4172, 1182.5646, 1644.6646, 1701.5339, 710.50977]
2025-09-11 23:59:49,769 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 23:59:49,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 21 hours, 11 minutes, 26 seconds)
2025-09-12 00:11:04,726 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:11:04,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:15:30,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1773.92969 ± 201.465
2025-09-12 00:15:30,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1958.2517, 2063.5676, 1670.7737, 2025.9865, 1868.2429, 1686.957, 1651.4535, 1855.1627, 1513.7158, 1445.1853]
2025-09-12 00:15:30,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:15:30,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1773.93) for latency ExtremeClogL1U23
2025-09-12 00:15:30,218 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 20 hours, 52 minutes, 2 seconds)
2025-09-12 00:26:42,832 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:26:42,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:31:05,136 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1689.38928 ± 401.792
2025-09-12 00:31:05,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1755.8137, 542.9401, 1523.7198, 1929.644, 1922.1873, 1945.7872, 1702.8688, 1872.1174, 1902.8859, 1795.9285]
2025-09-12 00:31:05,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:31:05,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 20 hours, 30 minutes, 40 seconds)
2025-09-12 00:42:07,591 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:42:07,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:46:25,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1861.33521 ± 167.800
2025-09-12 00:46:25,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1516.1503, 2027.9679, 2045.9312, 1693.2887, 2067.506, 1933.6907, 1938.7106, 1734.8158, 1840.4678, 1814.8242]
2025-09-12 00:46:25,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:46:25,997 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1861.34) for latency ExtremeClogL1U23
2025-09-12 00:46:26,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 20 hours, 7 minutes, 31 seconds)
2025-09-12 00:57:28,385 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:57:28,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:01:47,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1912.61401 ± 142.052
2025-09-12 01:01:47,978 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1986.7753, 1806.5695, 2202.1096, 1776.1407, 1766.1494, 1779.2816, 2055.2136, 2018.4243, 1805.201, 1930.2748]
2025-09-12 01:01:47,978 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:01:47,978 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1912.61) for latency ExtremeClogL1U23
2025-09-12 01:01:47,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 19 hours, 43 minutes, 39 seconds)
2025-09-12 01:12:52,115 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:12:52,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:17:08,516 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1763.89380 ± 182.937
2025-09-12 01:17:08,517 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1930.5546, 1817.4974, 1828.0503, 1454.1132, 1955.3375, 1735.03, 1570.0205, 1499.8418, 1862.1512, 1986.3406]
2025-09-12 01:17:08,517 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:17:08,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 19 hours, 19 minutes, 41 seconds)
2025-09-12 01:28:13,558 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:28:13,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:32:32,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1899.26404 ± 151.489
2025-09-12 01:32:32,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2085.5269, 1956.8182, 2073.8726, 1739.1854, 2009.3392, 1647.4741, 2056.2458, 1765.1959, 1768.2461, 1890.7365]
2025-09-12 01:32:32,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:32:32,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 19 hours, 7 seconds)
2025-09-12 01:43:37,407 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:43:37,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:47:54,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1880.79077 ± 188.567
2025-09-12 01:47:54,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2040.6869, 2092.4258, 1621.5176, 1606.4756, 1903.1125, 1970.7775, 1924.0491, 2164.043, 1825.32, 1659.4994]
2025-09-12 01:47:54,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:47:54,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 18 hours, 41 minutes, 42 seconds)
2025-09-12 01:59:00,424 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:59:00,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:03:19,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1865.79004 ± 202.149
2025-09-12 02:03:19,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1785.4635, 1748.1975, 1955.7125, 1781.2146, 1701.6613, 2256.6826, 1717.221, 2011.0085, 2124.0046, 1576.7339]
2025-09-12 02:03:19,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 02:03:19,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 18 hours, 27 minutes, 13 seconds)
2025-09-12 02:14:26,283 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:14:26,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:18:45,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1874.70276 ± 133.717
2025-09-12 02:18:45,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1802.7028, 2044.0625, 1701.5044, 1916.1294, 1726.8557, 1749.0377, 2012.8431, 1873.0281, 1815.8872, 2104.976]
2025-09-12 02:18:45,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 02:18:45,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 18 hours, 12 minutes, 53 seconds)
2025-09-12 02:29:51,155 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:29:51,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:34:14,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1908.13184 ± 113.616
2025-09-12 02:34:14,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1806.7393, 1940.9897, 1885.9796, 1889.2701, 1835.2472, 2031.4064, 2143.3367, 1762.9288, 1991.4266, 1793.9933]
2025-09-12 02:34:14,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 02:34:14,690 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 17 hours, 59 minutes, 26 seconds)
2025-09-12 02:45:20,762 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:45:20,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:49:36,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1833.72974 ± 639.252
2025-09-12 02:49:36,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2087.2468, 2092.536, 2056.235, 2275.4658, 1791.56, -43.462456, 1950.4037, 2091.678, 2153.7993, 1881.8344]
2025-09-12 02:49:36,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 02:49:36,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 17 hours, 43 minutes, 33 seconds)
2025-09-12 03:00:41,976 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:00:41,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:05:00,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1978.71655 ± 188.599
2025-09-12 03:05:00,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1577.7512, 1781.738, 1951.8375, 1836.6033, 2066.5142, 2054.247, 2213.1162, 2115.9707, 2197.4226, 1991.9668]
2025-09-12 03:05:00,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:05:00,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1978.72) for latency ExtremeClogL1U23
2025-09-12 03:05:00,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 17 hours, 28 minutes, 27 seconds)
2025-09-12 03:16:07,291 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:16:07,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:20:24,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1861.35034 ± 424.926
2025-09-12 03:20:24,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1840.0757, 1987.0511, 625.4824, 2064.3723, 1852.1627, 2027.9456, 1872.5322, 2119.2273, 2125.367, 2099.2864]
2025-09-12 03:20:24,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:20:24,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 17 hours, 12 minutes, 49 seconds)
2025-09-12 03:31:29,281 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:31:29,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:35:48,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1955.92444 ± 207.785
2025-09-12 03:35:48,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2250.4365, 1843.474, 2019.2883, 1888.5764, 2197.961, 1981.8552, 2093.9536, 1835.1046, 1972.8986, 1475.6945]
2025-09-12 03:35:48,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:35:48,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 16 hours, 57 minutes, 3 seconds)
2025-09-12 03:46:56,082 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:46:56,098 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:51:13,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1866.44470 ± 248.141
2025-09-12 03:51:13,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1723.0016, 2071.5554, 1950.9446, 1945.3143, 1797.9384, 1972.955, 1755.3048, 1984.1803, 2211.7976, 1251.4553]
2025-09-12 03:51:13,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:51:13,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 16 hours, 40 minutes, 42 seconds)
2025-09-12 04:02:19,920 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:02:19,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:06:42,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1914.43286 ± 510.398
2025-09-12 04:06:42,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2077.1843, 2430.9167, 2015.7943, 2152.3984, 459.85156, 2184.33, 1936.5016, 2142.5142, 1861.8308, 1883.0067]
2025-09-12 04:06:42,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:06:42,268 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 16 hours, 26 minutes, 49 seconds)
2025-09-12 04:17:49,215 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:17:49,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:22:06,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2005.99878 ± 135.565
2025-09-12 04:22:06,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1870.5729, 2025.5049, 2097.9966, 2059.4614, 1983.5634, 2056.7722, 2159.839, 2058.3105, 2085.404, 1662.5647]
2025-09-12 04:22:06,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:22:06,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (2006.00) for latency ExtremeClogL1U23
2025-09-12 04:22:06,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 16 hours, 11 minutes, 23 seconds)
2025-09-12 04:33:14,958 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:33:14,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:37:39,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1904.57935 ± 428.647
2025-09-12 04:37:39,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2063.6475, 2187.431, 1961.4473, 673.9392, 1854.3961, 2173.5596, 2211.2695, 2113.2854, 1872.8054, 1934.0128]
2025-09-12 04:37:39,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:37:39,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 15 hours, 57 minutes, 58 seconds)
2025-09-12 04:48:48,100 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:48:48,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:53:08,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2026.88892 ± 141.316
2025-09-12 04:53:08,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1877.3062, 1864.4786, 1906.2272, 2185.2551, 1884.5144, 2028.1936, 2138.6636, 2153.4966, 2268.4395, 1962.3142]
2025-09-12 04:53:08,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:53:08,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (2026.89) for latency ExtremeClogL1U23
2025-09-12 04:53:08,183 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 15 hours, 43 minutes, 20 seconds)
2025-09-12 05:04:13,151 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:04:13,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:08:30,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2024.80981 ± 160.035
2025-09-12 05:08:30,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1687.0685, 1871.0676, 1934.6653, 2080.741, 2068.694, 1975.7124, 2116.0225, 2085.9714, 2116.3274, 2311.8271]
2025-09-12 05:08:30,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:08:30,480 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 15 hours, 27 minutes, 25 seconds)
2025-09-12 05:19:36,694 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:19:36,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:23:59,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2096.58716 ± 148.172
2025-09-12 05:23:59,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1977.6354, 2155.81, 2347.0732, 2212.0137, 2017.7856, 2174.1667, 2109.1038, 1760.9257, 2123.105, 2088.2522]
2025-09-12 05:23:59,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:23:59,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (2096.59) for latency ExtremeClogL1U23
2025-09-12 05:23:59,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 15 hours, 11 minutes, 54 seconds)
2025-09-12 05:35:03,906 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:35:03,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:39:25,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2004.19019 ± 296.715
2025-09-12 05:39:25,227 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2246.2637, 1202.3931, 2231.2966, 2130.9453, 2265.448, 1983.575, 2073.1475, 2069.8958, 1821.5798, 2017.3567]
2025-09-12 05:39:25,227 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:39:25,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 14 hours, 56 minutes, 53 seconds)
2025-09-12 05:50:28,216 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:50:28,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:54:46,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1833.02026 ± 537.118
2025-09-12 05:54:46,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2004.8503, 283.0022, 2070.8027, 2043.1747, 2052.751, 2051.3433, 2010.3301, 1855.8993, 2280.3806, 1677.6675]
2025-09-12 05:54:46,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:54:46,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 14 hours, 39 minutes, 12 seconds)
2025-09-12 06:05:50,324 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:05:50,329 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:10:06,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2173.92407 ± 135.517
2025-09-12 06:10:06,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2126.2036, 2287.0442, 2118.82, 2381.9656, 1920.8679, 2300.8367, 2028.7205, 2150.918, 2305.7668, 2118.0952]
2025-09-12 06:10:06,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:10:06,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (2173.92) for latency ExtremeClogL1U23
2025-09-12 06:10:06,416 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 14 hours, 22 minutes, 4 seconds)
2025-09-12 06:21:09,524 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:21:09,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:25:32,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2077.48047 ± 183.925
2025-09-12 06:25:32,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2056.1646, 2189.8096, 2153.8738, 2252.865, 1618.836, 2040.4333, 2045.6454, 2097.4592, 2335.7903, 1983.9269]
2025-09-12 06:25:32,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:25:32,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 14 hours, 7 minutes, 24 seconds)
2025-09-12 06:36:37,589 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:36:37,593 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:41:01,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2139.28564 ± 130.223
2025-09-12 06:41:01,490 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1880.386, 2149.687, 1996.54, 2229.598, 2081.4773, 2034.3304, 2208.5295, 2232.5464, 2308.6118, 2271.1501]
2025-09-12 06:41:01,490 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:41:01,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 13 hours, 52 minutes, 1 second)
2025-09-12 06:52:06,907 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:52:06,916 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:56:26,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2061.06055 ± 542.537
2025-09-12 06:56:26,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2400.7244, 2543.8552, 2340.6418, 2162.4324, 2306.956, 2095.939, 1984.7283, 503.01074, 2124.97, 2147.3472]
2025-09-12 06:56:26,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:56:26,678 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 13 hours, 36 minutes, 27 seconds)
2025-09-12 07:07:31,815 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:07:31,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:11:51,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2077.44653 ± 373.611
2025-09-12 07:11:51,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1833.6384, 2328.4243, 2092.485, 2283.988, 2556.1414, 1114.1179, 2301.3384, 2205.855, 2061.7974, 1996.6792]
2025-09-12 07:11:51,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:11:51,239 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 13 hours, 21 minutes, 33 seconds)
2025-09-12 07:22:57,164 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:22:57,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:27:13,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2165.29346 ± 198.984
2025-09-12 07:27:13,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1981.9177, 2044.6484, 2032.4619, 2170.8025, 2676.6013, 2017.0417, 2020.0743, 2318.1106, 2181.6626, 2209.6123]
2025-09-12 07:27:13,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:27:13,608 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 13 hours, 6 minutes, 37 seconds)
2025-09-12 07:38:19,372 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:38:19,383 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:42:42,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2087.50562 ± 286.932
2025-09-12 07:42:42,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1822.3452, 2025.7382, 1998.2712, 2683.8328, 2369.9797, 2249.6792, 1573.5533, 2093.7944, 1963.8723, 2093.9878]
2025-09-12 07:42:42,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:42:42,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 12 hours, 51 minutes, 37 seconds)
2025-09-12 07:53:48,026 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:53:48,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:58:10,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2262.02002 ± 166.730
2025-09-12 07:58:10,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2149.9565, 2028.2433, 2070.8538, 2551.9446, 2297.538, 2392.2341, 2511.8179, 2205.3281, 2198.887, 2213.3984]
2025-09-12 07:58:10,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:58:10,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (2262.02) for latency ExtremeClogL1U23
2025-09-12 07:58:10,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 12 hours, 36 minutes, 4 seconds)
2025-09-12 08:09:16,715 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:09:16,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:13:38,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2245.35718 ± 179.284
2025-09-12 08:13:38,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2304.6074, 2297.2085, 2254.0442, 2136.6892, 2249.5344, 2472.865, 2012.2963, 2588.5886, 1976.281, 2161.4583]
2025-09-12 08:13:38,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:13:38,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 12 hours, 21 minutes, 8 seconds)
2025-09-12 08:24:45,297 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:24:45,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:29:07,873 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2162.88477 ± 139.866
2025-09-12 08:29:07,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2162.418, 2085.9739, 1922.9822, 2417.797, 2123.0771, 2163.561, 2058.1938, 2205.0898, 2385.2417, 2104.5107]
2025-09-12 08:29:07,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:29:07,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 12 hours, 6 minutes, 24 seconds)
2025-09-12 08:40:14,419 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:40:14,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:44:37,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2305.31299 ± 143.182
2025-09-12 08:44:37,360 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2405.6394, 2281.045, 2113.4453, 2545.7502, 2283.3215, 2473.84, 2320.665, 2247.3313, 2334.4604, 2047.6295]
2025-09-12 08:44:37,360 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:44:37,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (2305.31) for latency ExtremeClogL1U23
2025-09-12 08:44:37,369 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 11 hours, 52 minutes, 2 seconds)
2025-09-12 08:55:43,690 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:55:43,699 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:00:00,173 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2129.13013 ± 201.822
2025-09-12 09:00:00,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1874.4388, 2349.5815, 2430.0854, 1999.6759, 2132.3406, 2197.7908, 2246.5388, 2218.1985, 2113.309, 1729.3428]
2025-09-12 09:00:00,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:00:00,211 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 11 hours, 35 minutes, 39 seconds)
2025-09-12 09:11:05,915 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:11:05,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:15:26,384 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2244.13623 ± 162.998
2025-09-12 09:15:26,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2171.5518, 2404.1953, 2447.2212, 2306.416, 1952.9338, 2502.4329, 2243.8665, 2189.653, 2115.2102, 2107.8796]
2025-09-12 09:15:26,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:15:26,399 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 11 hours, 19 minutes, 56 seconds)
2025-09-12 09:26:33,613 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:26:33,618 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:30:56,002 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1680.81116 ± 797.765
2025-09-12 09:30:56,004 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2227.633, 2092.513, 2121.5369, 2196.752, 87.33976, 2094.414, 1084.4762, 2358.3555, 2160.544, 384.54453]
2025-09-12 09:30:56,004 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:30:56,014 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 11 hours, 4 minutes, 39 seconds)
2025-09-12 09:41:58,847 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:41:58,851 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:46:20,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2302.67383 ± 159.387
2025-09-12 09:46:20,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2493.0708, 2349.3535, 2439.999, 2000.233, 2431.8018, 2342.2803, 2409.852, 2328.5312, 2065.57, 2166.0476]
2025-09-12 09:46:20,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:46:20,448 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 10 hours, 48 minutes, 33 seconds)
2025-09-12 09:57:23,843 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:57:23,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:01:45,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2330.95435 ± 214.757
2025-09-12 10:01:45,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2484.9475, 2448.2764, 2621.3782, 2286.483, 1995.5259, 2088.3203, 2514.0886, 2266.3494, 2553.7249, 2050.4487]
2025-09-12 10:01:45,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:01:45,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (2330.95) for latency ExtremeClogL1U23
2025-09-12 10:01:45,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 10 hours, 32 minutes, 33 seconds)
2025-09-12 10:12:50,663 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:12:50,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:17:13,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2390.33057 ± 188.580
2025-09-12 10:17:13,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2748.9263, 2170.4492, 2236.1208, 2211.1013, 2484.35, 2221.413, 2639.6165, 2441.573, 2475.286, 2274.468]
2025-09-12 10:17:13,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:17:13,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (2390.33) for latency ExtremeClogL1U23
2025-09-12 10:17:13,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 10 hours, 17 minutes, 50 seconds)
2025-09-12 10:28:20,896 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:28:20,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:32:43,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2158.05591 ± 496.358
2025-09-12 10:32:43,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2416.6943, 752.01074, 2561.2188, 2252.2478, 2402.1157, 2430.9934, 1999.1227, 2054.3062, 2350.469, 2361.38]
2025-09-12 10:32:43,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:32:43,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 10 hours, 2 minutes, 51 seconds)
2025-09-12 10:43:50,836 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:43:50,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:48:15,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1934.09412 ± 710.542
2025-09-12 10:48:15,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2076.9202, 2433.838, 1974.8695, 2462.7786, 2339.6567, 1098.9572, 148.05704, 2534.5947, 2178.5117, 2092.7588]
2025-09-12 10:48:15,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:48:15,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 9 hours, 47 minutes, 39 seconds)
2025-09-12 10:59:21,598 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:59:21,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:03:45,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2307.39404 ± 203.557
2025-09-12 11:03:45,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2129.4595, 2525.0374, 1903.764, 2366.2712, 2349.9285, 2125.137, 2434.4321, 2600.4163, 2194.03, 2445.4668]
2025-09-12 11:03:45,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:03:45,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 9 hours, 32 minutes, 52 seconds)
2025-09-12 11:14:53,541 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:14:53,545 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:19:22,113 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2285.59814 ± 203.800
2025-09-12 11:19:22,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2519.4949, 2338.6277, 2106.1333, 2498.3726, 2642.3352, 2081.421, 2046.8607, 2062.7925, 2231.5303, 2328.4097]
2025-09-12 11:19:22,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:19:22,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 9 hours, 18 minutes, 45 seconds)
2025-09-12 11:30:59,246 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:30:59,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:35:29,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2409.03662 ± 157.158
2025-09-12 11:35:29,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2515.8972, 2552.3909, 2241.0437, 2069.051, 2591.1147, 2486.8284, 2554.8342, 2349.4968, 2324.2632, 2405.446]
2025-09-12 11:35:29,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:35:29,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (2409.04) for latency ExtremeClogL1U23
2025-09-12 11:35:29,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 9 hours, 7 minutes, 47 seconds)
2025-09-12 11:47:11,842 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:47:11,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:51:41,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2047.48474 ± 602.320
2025-09-12 11:51:41,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2069.187, 2441.833, 1578.266, 2550.5105, 2033.2294, 455.79904, 2199.2532, 2617.5417, 2157.9326, 2371.2954]
2025-09-12 11:51:41,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:51:41,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 8 hours, 56 minutes, 54 seconds)
2025-09-12 12:03:16,584 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:03:16,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:07:42,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2268.07935 ± 186.909
2025-09-12 12:07:42,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2553.8943, 2272.5981, 2191.087, 2022.4365, 2423.89, 2130.0352, 1943.0664, 2472.0903, 2364.128, 2307.5674]
2025-09-12 12:07:42,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:07:42,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 8 hours, 44 minutes, 20 seconds)
2025-09-12 12:19:20,926 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:19:20,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:23:47,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2296.83936 ± 197.147
2025-09-12 12:23:47,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2039.2413, 2332.1445, 2194.6946, 1876.714, 2267.4038, 2361.7083, 2495.5132, 2491.8264, 2484.5754, 2424.5708]
2025-09-12 12:23:47,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:23:47,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 8 hours, 32 minutes, 13 seconds)
2025-09-12 12:35:27,141 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:35:27,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:39:57,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2521.62451 ± 155.460
2025-09-12 12:39:57,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2632.9016, 2289.3435, 2483.7969, 2459.251, 2792.2559, 2536.65, 2411.9927, 2371.784, 2765.8748, 2472.3962]
2025-09-12 12:39:57,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:39:57,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (2521.62) for latency ExtremeClogL1U23
2025-09-12 12:39:57,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 8 hours, 19 minutes, 36 seconds)
2025-09-12 12:51:38,693 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:51:38,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:56:08,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2332.14307 ± 141.224
2025-09-12 12:56:08,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2394.673, 2338.4883, 2418.7903, 2421.3284, 2428.1636, 2533.695, 2241.0798, 2054.7546, 2367.3804, 2123.0754]
2025-09-12 12:56:08,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:56:08,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 8 hours, 3 minutes, 54 seconds)
2025-09-12 13:07:49,631 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:07:49,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:12:16,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2474.55371 ± 166.949
2025-09-12 13:12:16,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2367.404, 2519.083, 2647.8777, 2455.615, 2409.0608, 2673.436, 2706.8936, 2203.0625, 2533.251, 2229.8542]
2025-09-12 13:12:16,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:12:16,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 7 hours, 47 minutes, 25 seconds)
2025-09-12 13:23:56,600 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:23:56,605 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:28:32,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2243.74951 ± 476.779
2025-09-12 13:28:32,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2235.7532, 2471.883, 2595.458, 987.09546, 2573.4624, 2171.8242, 2770.289, 2420.918, 1935.9943, 2274.8179]
2025-09-12 13:28:32,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:28:32,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 7 hours, 32 minutes, 41 seconds)
2025-09-12 13:40:03,320 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:40:03,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:44:25,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2417.94238 ± 285.619
2025-09-12 13:44:25,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2397.6187, 2635.948, 1663.922, 2546.5554, 2582.742, 2421.0579, 2582.7402, 2694.0686, 2193.6284, 2461.1428]
2025-09-12 13:44:25,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:44:25,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 7 hours, 15 minutes, 27 seconds)
2025-09-12 13:55:30,024 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:55:30,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:59:56,587 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2430.85156 ± 140.784
2025-09-12 13:59:56,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2214.179, 2492.475, 2250.5483, 2643.415, 2450.9407, 2262.4656, 2432.745, 2462.5623, 2626.4, 2472.784]
2025-09-12 13:59:56,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:59:56,598 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 6 hours, 55 minutes, 57 seconds)
2025-09-12 14:11:21,710 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:11:21,715 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:15:38,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2250.88647 ± 550.541
2025-09-12 14:15:38,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2622.1497, 649.24133, 2369.2793, 2243.5098, 2709.2534, 2402.4656, 2303.6616, 2433.2205, 2447.404, 2328.6792]
2025-09-12 14:15:38,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:15:38,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 6 hours, 37 minutes, 29 seconds)
2025-09-12 14:26:37,641 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:26:37,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:30:57,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2407.89990 ± 326.551
2025-09-12 14:30:57,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2539.1184, 2512.2253, 2743.3296, 2434.2837, 1735.5027, 2872.5298, 2177.9773, 2016.415, 2414.1538, 2633.4624]
2025-09-12 14:30:57,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:30:57,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 6 hours, 17 minutes, 41 seconds)
2025-09-12 14:41:57,819 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:41:57,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:46:13,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2238.56934 ± 534.905
2025-09-12 14:46:13,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2311.146, 2621.8499, 2385.9631, 2441.9949, 1310.7366, 2315.3613, 2639.9023, 1152.5974, 2308.9114, 2897.23]
2025-09-12 14:46:13,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:46:13,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 5 hours, 57 minutes, 19 seconds)
2025-09-12 14:57:08,057 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:57:08,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:01:25,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2194.93237 ± 641.506
2025-09-12 15:01:25,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2252.9456, 2507.8972, 2595.725, 378.5816, 2082.6492, 2352.5083, 2476.5806, 2065.3594, 2442.774, 2794.3052]
2025-09-12 15:01:25,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:01:25,608 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 5 hours, 38 minutes, 46 seconds)
2025-09-12 15:12:26,006 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:12:26,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:16:46,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2341.36963 ± 183.049
2025-09-12 15:16:46,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2574.7483, 2292.8914, 2289.8813, 2101.3904, 2303.2864, 2269.743, 2049.671, 2501.729, 2656.9678, 2373.388]
2025-09-12 15:16:46,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:16:46,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 5 hours, 22 minutes, 40 seconds)
2025-09-12 15:27:46,430 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:27:46,436 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:32:07,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2481.14941 ± 545.217
2025-09-12 15:32:07,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [861.01733, 2586.5356, 2782.4944, 2735.1523, 2706.7454, 2694.6729, 2592.152, 2683.0764, 2656.5222, 2513.1262]
2025-09-12 15:32:07,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:32:07,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 5 hours, 5 minutes, 58 seconds)
2025-09-12 15:43:10,319 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:43:10,329 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:47:31,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2518.25928 ± 198.951
2025-09-12 15:47:31,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2546.339, 3026.0696, 2504.9434, 2299.0273, 2601.674, 2543.8682, 2244.6829, 2471.5815, 2467.9304, 2476.4785]
2025-09-12 15:47:31,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:47:31,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 4 hours, 50 minutes, 55 seconds)
2025-09-12 15:58:35,709 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:58:35,713 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:02:53,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2486.68311 ± 246.845
2025-09-12 16:02:53,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2655.3552, 2737.4272, 2499.2454, 2193.1619, 2188.8574, 2717.0405, 2122.034, 2818.8577, 2618.8743, 2315.9795]
2025-09-12 16:02:53,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:02:53,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 36 minutes, 2 seconds)
2025-09-12 16:13:57,846 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:13:57,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:18:19,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2496.87524 ± 452.198
2025-09-12 16:18:19,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [3101.052, 2720.8975, 2239.2458, 2696.8943, 2518.8352, 2462.5308, 2608.5725, 1298.9497, 2761.553, 2560.2212]
2025-09-12 16:18:19,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:18:19,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 21 minutes, 27 seconds)
2025-09-12 16:29:25,021 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:29:25,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:33:44,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2521.53564 ± 246.609
2025-09-12 16:33:44,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2848.9475, 2555.0942, 2460.3274, 2053.212, 2615.4087, 2779.2473, 2206.7815, 2658.5017, 2314.3599, 2723.4783]
2025-09-12 16:33:44,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:33:44,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 4 hours, 6 minutes, 17 seconds)
2025-09-12 16:44:48,691 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:44:48,719 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:49:07,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2688.26172 ± 181.208
2025-09-12 16:49:07,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2784.671, 2681.7908, 2955.3442, 2710.2605, 2755.539, 2649.7817, 2593.944, 2803.172, 2726.3816, 2221.7322]
2025-09-12 16:49:07,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:49:07,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (2688.26) for latency ExtremeClogL1U23
2025-09-12 16:49:07,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 50 minutes, 57 seconds)
2025-09-12 17:00:12,738 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:00:12,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:04:34,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2467.38354 ± 214.574
2025-09-12 17:04:34,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2007.5906, 2276.443, 2527.6597, 2309.696, 2745.8616, 2356.114, 2696.1355, 2592.9858, 2593.9187, 2567.434]
2025-09-12 17:04:34,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:04:34,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 35 minutes, 45 seconds)
2025-09-12 17:15:41,057 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:15:41,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:20:01,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2652.89014 ± 112.663
2025-09-12 17:20:01,703 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2585.666, 2639.2705, 2849.0527, 2503.8887, 2817.104, 2641.674, 2621.599, 2718.2705, 2482.8281, 2669.5457]
2025-09-12 17:20:01,703 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:20:01,713 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 20 minutes, 32 seconds)
2025-09-12 17:31:03,216 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:31:03,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:35:21,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2554.73828 ± 90.779
2025-09-12 17:35:21,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2547.9382, 2597.647, 2427.6956, 2575.2078, 2580.825, 2469.8484, 2631.314, 2527.1455, 2746.913, 2442.8484]
2025-09-12 17:35:21,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:35:21,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 4 minutes, 52 seconds)
2025-09-12 17:46:24,208 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:46:24,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:50:39,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2485.39551 ± 512.761
2025-09-12 17:50:39,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2655.201, 1930.5272, 2052.6335, 2872.8938, 3062.644, 1391.7472, 3002.0662, 2374.3418, 2763.9797, 2747.92]
2025-09-12 17:50:39,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:50:39,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 49 minutes, 12 seconds)
2025-09-12 18:01:42,110 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:01:42,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:06:02,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2563.18945 ± 159.954
2025-09-12 18:06:02,608 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2722.726, 2698.0757, 2626.7207, 2286.2407, 2574.2222, 2851.7788, 2386.1191, 2451.6375, 2521.2224, 2513.1497]
2025-09-12 18:06:02,608 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:06:02,617 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 33 minutes, 50 seconds)
2025-09-12 18:17:06,669 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:17:06,678 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:21:22,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2440.85303 ± 433.580
2025-09-12 18:21:22,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2671.8945, 1659.2418, 2642.9512, 2347.8843, 2599.4185, 2938.3435, 1926.7349, 1964.8691, 2602.85, 3054.3445]
2025-09-12 18:21:22,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:21:22,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 18 minutes, 13 seconds)
2025-09-12 18:32:25,234 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:32:25,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:36:40,945 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2528.63770 ± 124.103
2025-09-12 18:36:40,947 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2460.039, 2501.7373, 2420.4438, 2489.87, 2565.6663, 2453.1719, 2483.3772, 2741.1575, 2392.2727, 2778.6414]
2025-09-12 18:36:40,947 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:36:40,971 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 2 minutes, 38 seconds)
2025-09-12 18:47:44,702 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:47:44,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:52:00,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2680.03564 ± 262.528
2025-09-12 18:52:00,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2468.8245, 2309.241, 2282.971, 2769.9187, 2980.9614, 3059.094, 2820.1794, 2787.2607, 2466.961, 2854.9434]
2025-09-12 18:52:00,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:52:00,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 47 minutes, 19 seconds)
2025-09-12 19:03:03,044 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:03:03,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:07:26,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2530.92114 ± 438.392
2025-09-12 19:07:26,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1383.11, 2760.508, 2672.744, 2336.5808, 2763.0212, 2606.713, 2427.9666, 2524.023, 2667.2493, 3167.2932]
2025-09-12 19:07:26,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:07:26,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 32 minutes, 9 seconds)
2025-09-12 19:18:29,043 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:18:29,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:22:48,336 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2216.28760 ± 918.108
2025-09-12 19:22:48,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2897.385, 2417.6685, 2613.4866, 2514.9062, 2966.1714, 846.30646, 2836.2737, 2427.1458, 54.981083, 2588.5513]
2025-09-12 19:22:48,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:22:48,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 16 minutes, 45 seconds)
2025-09-12 19:33:52,152 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:33:52,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:38:13,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2430.28320 ± 571.874
2025-09-12 19:38:13,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2876.5176, 2530.766, 869.30304, 2743.7168, 2743.9546, 2421.9512, 3047.669, 2385.3933, 2462.1262, 2221.4307]
2025-09-12 19:38:13,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:38:13,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 1 minute, 28 seconds)
2025-09-12 19:49:17,049 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:49:17,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:53:33,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2327.54712 ± 758.642
2025-09-12 19:53:33,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2917.1685, 2535.3835, 2371.4976, 182.14973, 2974.6755, 2801.657, 2230.8213, 2312.6516, 2652.784, 2296.6838]
2025-09-12 19:53:33,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:53:33,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 46 minutes, 7 seconds)
2025-09-12 20:04:38,224 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:04:38,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 20:08:59,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2783.35107 ± 194.573
2025-09-12 20:08:59,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2611.0872, 2954.387, 3034.647, 2639.8096, 2675.6418, 2880.7817, 3126.3237, 2490.565, 2729.3208, 2690.9485]
2025-09-12 20:08:59,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:08:59,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (2783.35) for latency ExtremeClogL1U23
2025-09-12 20:08:59,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 30 minutes, 47 seconds)
2025-09-12 20:20:03,557 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:20:03,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 20:24:25,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2425.58813 ± 847.482
2025-09-12 20:24:25,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2829.471, 3027.9539, 1360.1257, 2855.278, 2833.943, 2821.3027, 2903.7056, 2639.0432, 2716.8135, 268.2451]
2025-09-12 20:24:25,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:24:25,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 15 minutes, 23 seconds)
2025-09-12 20:35:27,108 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:35:27,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 20:39:43,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2433.21680 ± 612.371
2025-09-12 20:39:43,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [3006.1123, 2583.8164, 2526.0425, 2432.1963, 734.9083, 2102.1423, 2663.5107, 2869.6262, 2747.6382, 2666.1746]
2025-09-12 20:39:43,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:39:43,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1251 [DEBUG]: Training session finished
