2025-09-11 18:20:27,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc5-halfcheetah/ExtremeClogL1U23-mbpac_memdelay
2025-09-11 18:20:27,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc5-halfcheetah/ExtremeClogL1U23-mbpac_memdelay
2025-09-11 18:20:27,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x1485017a0e10>}
2025-09-11 18:20:27,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1111 [DEBUG]: using device: cuda
2025-09-11 18:20:27,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-11 18:20:27,355 baseline-mbpac-noiseperc5-halfcheetah:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-11 18:20:27,355 baseline-mbpac-noiseperc5-halfcheetah:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-11 18:20:27,364 baseline-mbpac-noiseperc5-halfcheetah:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=6, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-09-11 18:20:28,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-11 18:20:28,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-11 18:31:11,908 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 18:31:11,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 18:35:56,873 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: -348.97308 ± 36.292
2025-09-11 18:35:56,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [-417.6483, -316.6681, -354.43222, -371.4967, -291.47028, -363.97577, -348.0654, -331.67685, -386.10825, -308.18915]
2025-09-11 18:35:56,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 18:35:56,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (-348.97) for latency ExtremeClogL1U23
2025-09-11 18:35:56,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 25 hours, 31 minutes, 58 seconds)
2025-09-11 18:47:47,191 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 18:47:47,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 18:52:29,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: -34.13637 ± 48.603
2025-09-11 18:52:29,035 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [8.02574, -72.42473, -83.07316, -30.70058, -6.698351, -28.844334, 11.357449, -131.08156, 40.21131, -48.135487]
2025-09-11 18:52:29,035 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 18:52:29,035 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (-34.14) for latency ExtremeClogL1U23
2025-09-11 18:52:29,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 26 hours, 8 minutes, 30 seconds)
2025-09-11 19:04:20,444 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:04:20,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:09:03,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 983.43927 ± 547.871
2025-09-11 19:09:03,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [553.72363, 1571.1925, 1446.3055, 97.650856, 1448.2161, 1431.6312, 1469.4786, 1082.3628, 524.29645, 209.53546]
2025-09-11 19:09:03,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 19:09:03,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (983.44) for latency ExtremeClogL1U23
2025-09-11 19:09:03,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 26 hours, 10 minutes, 47 seconds)
2025-09-11 19:20:59,180 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:20:59,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:25:41,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1505.98218 ± 719.113
2025-09-11 19:25:41,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1529.1963, 1932.5468, 2314.1758, 2013.4819, 252.52487, 2137.8347, 1684.1893, 199.09064, 1927.2123, 1069.569]
2025-09-11 19:25:41,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 19:25:41,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (1505.98) for latency ExtremeClogL1U23
2025-09-11 19:25:41,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 26 hours, 5 minutes, 16 seconds)
2025-09-11 19:37:33,285 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:37:33,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:42:22,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2860.58398 ± 168.242
2025-09-11 19:42:22,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2924.3694, 2947.1685, 2512.5688, 3063.119, 2703.1318, 2801.7092, 2950.4856, 2899.1125, 3086.5928, 2717.5823]
2025-09-11 19:42:22,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 19:42:22,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (2860.58) for latency ExtremeClogL1U23
2025-09-11 19:42:22,279 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 25 hours, 56 minutes, 3 seconds)
2025-09-11 19:54:13,966 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:54:13,971 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:58:59,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2147.16943 ± 1357.415
2025-09-11 19:58:59,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3624.6233, 3409.1177, 3363.6091, 627.5293, 131.73483, 256.39133, 2139.2957, 3132.1943, 1361.001, 3426.197]
2025-09-11 19:58:59,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 19:58:59,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 26 hours, 1 minute, 19 seconds)
2025-09-11 20:10:55,566 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:10:55,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:15:36,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3354.11963 ± 630.678
2025-09-11 20:15:36,792 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1480.4988, 3468.474, 3613.9038, 3639.0486, 3630.9102, 3434.2231, 3601.8804, 3402.7385, 3611.8228, 3657.6953]
2025-09-11 20:15:36,792 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 20:15:36,792 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (3354.12) for latency ExtremeClogL1U23
2025-09-11 20:15:36,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 25 hours, 46 minutes, 12 seconds)
2025-09-11 20:27:28,493 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:27:28,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:32:12,660 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3062.65967 ± 1258.147
2025-09-11 20:32:12,661 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4081.5925, 3806.8208, 3963.59, 1733.4612, 3900.6519, 3744.4255, 3537.1975, 2166.5896, 42.824413, 3649.4448]
2025-09-11 20:32:12,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 20:32:12,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 25 hours, 30 minutes, 4 seconds)
2025-09-11 20:44:01,216 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:44:01,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:48:39,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2931.93701 ± 1527.192
2025-09-11 20:48:39,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4027.7898, 4061.1833, 3636.3428, 3984.9456, -15.789267, 4047.0266, 3767.0342, 3875.3972, 1021.1711, 914.2678]
2025-09-11 20:48:39,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 20:48:39,383 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 25 hours, 9 minutes, 55 seconds)
2025-09-11 21:00:11,584 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:00:11,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:04:47,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2047.08374 ± 1752.338
2025-09-11 21:04:47,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [969.7902, 4370.5596, 4020.8755, 327.92188, 485.60922, 240.44275, 544.166, 4184.48, 1248.0261, 4078.966]
2025-09-11 21:04:47,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 21:04:47,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 24 hours, 43 minutes, 39 seconds)
2025-09-11 21:16:23,021 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:16:23,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:20:57,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3178.98633 ± 1190.826
2025-09-11 21:20:57,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4192.2407, 2006.2476, 1892.9169, 564.3534, 3325.7776, 3857.7922, 4155.0835, 4219.299, 3908.632, 3667.5188]
2025-09-11 21:20:57,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 21:20:57,491 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 24 hours, 18 minutes, 54 seconds)
2025-09-11 21:32:32,896 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:32:32,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:37:07,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3454.10669 ± 1381.592
2025-09-11 21:37:07,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1858.0015, 4342.419, 4587.409, 3911.015, 143.15782, 4134.424, 4254.816, 4472.5464, 4177.618, 2659.6597]
2025-09-11 21:37:07,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 21:37:07,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (3454.11) for latency ExtremeClogL1U23
2025-09-11 21:37:07,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 23 hours, 54 minutes, 43 seconds)
2025-09-11 21:48:44,595 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:48:44,599 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:53:21,068 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3222.05811 ± 1427.481
2025-09-11 21:53:21,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1678.8475, 1849.4917, 4373.8887, 4484.9976, 2689.9458, 4164.9937, 4500.644, 285.12253, 3758.2969, 4434.35]
2025-09-11 21:53:21,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 21:53:21,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 23 hours, 31 minutes, 50 seconds)
2025-09-11 22:04:59,084 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:04:59,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:09:37,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4007.97412 ± 1044.501
2025-09-11 22:09:37,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4624.3193, 4692.475, 4553.8657, 4486.5195, 4201.326, 3876.8748, 3994.328, 965.6081, 4352.4233, 4332.004]
2025-09-11 22:09:37,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 22:09:37,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (4007.97) for latency ExtremeClogL1U23
2025-09-11 22:09:37,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 23 hours, 12 minutes, 36 seconds)
2025-09-11 22:21:16,086 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:21:16,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:25:54,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4044.60156 ± 1064.507
2025-09-11 22:25:54,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4301.7896, 4467.5176, 4127.642, 4610.6387, 4442.0547, 4543.346, 4456.184, 4439.379, 4176.7275, 880.7376]
2025-09-11 22:25:54,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 22:25:54,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (4044.60) for latency ExtremeClogL1U23
2025-09-11 22:25:54,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 22 hours, 59 minutes)
2025-09-11 22:37:32,820 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:37:32,824 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:42:09,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2271.44189 ± 1635.618
2025-09-11 22:42:09,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4562.2207, 4783.1133, 437.71347, 1903.311, 609.319, 1202.3346, 1958.4939, 2763.5388, 4177.3813, 316.9916]
2025-09-11 22:42:09,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 22:42:09,663 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 22 hours, 44 minutes, 12 seconds)
2025-09-11 22:53:48,171 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:53:48,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:58:24,110 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3704.21436 ± 993.036
2025-09-11 22:58:24,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3026.0684, 4128.5757, 3208.0, 3724.986, 4181.7666, 3995.1423, 4442.603, 4316.483, 4849.893, 1168.6261]
2025-09-11 22:58:24,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 22:58:24,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 22 hours, 29 minutes, 5 seconds)
2025-09-11 23:10:02,999 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:10:03,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:14:42,209 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3335.64526 ± 1712.631
2025-09-11 23:14:42,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4661.9805, 4656.889, 1219.1777, 4774.1006, 2335.455, 941.3043, 4824.1396, 4767.1562, 4480.47, 695.7781]
2025-09-11 23:14:42,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 23:14:42,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 22 hours, 14 minutes, 10 seconds)
2025-09-11 23:26:21,989 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:26:21,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:31:01,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3462.60083 ± 1177.643
2025-09-11 23:31:01,339 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4036.9338, 4571.072, 2194.7458, 2777.4854, 1760.7924, 4692.6826, 1725.3512, 3586.9243, 4545.939, 4734.0815]
2025-09-11 23:31:01,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 23:31:01,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 21 hours, 58 minutes, 41 seconds)
2025-09-11 23:42:40,474 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:42:40,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:47:18,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3118.19653 ± 1468.205
2025-09-11 23:47:18,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4707.6377, 1553.168, 3175.6106, 4598.2754, 5167.8384, 2468.1667, 1084.4465, 2737.9377, 4477.31, 1211.5743]
2025-09-11 23:47:18,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 23:47:18,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 21 hours, 42 minutes, 12 seconds)
2025-09-11 23:58:58,738 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:58:58,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:03:37,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3673.60474 ± 1513.948
2025-09-12 00:03:37,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [5008.0503, 5003.2534, 1141.4701, 4843.966, 4673.3145, 1889.6838, 3552.6353, 1329.0797, 4667.979, 4626.6167]
2025-09-12 00:03:37,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:03:37,900 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 21 hours, 27 minutes, 14 seconds)
2025-09-12 00:15:14,178 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:15:14,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:19:50,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3201.60815 ± 1780.933
2025-09-12 00:19:50,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4877.046, 3123.1548, 660.5345, 852.3222, 4862.217, 912.6564, 4688.1, 5013.97, 2239.748, 4786.3335]
2025-09-12 00:19:50,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:19:50,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 21 hours, 10 minutes, 30 seconds)
2025-09-12 00:31:24,512 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:31:24,516 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:36:04,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3450.88428 ± 1456.931
2025-09-12 00:36:04,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1994.8477, 4641.9, 5046.733, 2697.9626, 5378.1475, 4775.4526, 4297.4863, 2739.8528, 1364.9572, 1571.5032]
2025-09-12 00:36:04,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:36:04,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 20 hours, 53 minutes, 8 seconds)
2025-09-12 00:47:37,334 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:47:37,339 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:52:09,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3866.86450 ± 1731.083
2025-09-12 00:52:09,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4959.262, 1531.0908, 5154.275, 1953.4531, 4309.445, 4827.0654, 5139.097, 5210.252, 431.36975, 5153.332]
2025-09-12 00:52:09,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:52:09,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 20 hours, 33 minutes, 11 seconds)
2025-09-12 01:03:38,132 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:03:38,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:08:08,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3522.63623 ± 1724.328
2025-09-12 01:08:08,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [5333.6494, 1210.2073, 1488.9067, 4817.548, 4852.2617, 4941.24, 1038.4445, 2203.0234, 3975.9797, 5365.1025]
2025-09-12 01:08:08,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:08:08,383 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 20 hours, 12 minutes, 33 seconds)
2025-09-12 01:19:36,319 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:19:36,324 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:24:07,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4543.76660 ± 1173.170
2025-09-12 01:24:07,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [5115.2197, 5341.3887, 4972.937, 4649.7285, 5223.645, 1108.0205, 4696.0303, 4769.273, 4504.8525, 5056.571]
2025-09-12 01:24:07,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:24:07,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (4543.77) for latency ExtremeClogL1U23
2025-09-12 01:24:07,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 19 hours, 51 minutes, 17 seconds)
2025-09-12 01:35:37,497 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:35:37,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:40:10,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3311.27734 ± 1881.855
2025-09-12 01:40:10,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [5005.7944, 105.5302, 1084.6029, 3552.894, 4978.212, 4887.4033, 5129.367, 4962.961, 2327.3733, 1078.6337]
2025-09-12 01:40:10,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:40:10,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 19 hours, 32 minutes, 55 seconds)
2025-09-12 01:51:42,220 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:51:42,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:56:13,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3459.81689 ± 1630.407
2025-09-12 01:56:13,774 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2982.9194, 4863.5522, 2578.0334, 2304.9558, 644.7462, 5397.1206, 5008.8853, 5183.0493, 4300.8506, 1334.0543]
2025-09-12 01:56:13,774 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:56:13,779 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 19 hours, 14 minutes, 12 seconds)
2025-09-12 02:07:43,933 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:07:43,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:12:22,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3881.69385 ± 1496.361
2025-09-12 02:12:22,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4954.03, 5135.6, 552.674, 4889.2075, 3246.571, 3624.7842, 5300.9556, 4168.11, 1944.3108, 5000.6914]
2025-09-12 02:12:22,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 02:12:22,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 18 hours, 59 minutes, 5 seconds)
2025-09-12 02:23:55,544 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:23:55,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:28:28,820 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3868.01440 ± 1801.759
2025-09-12 02:28:28,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [5059.4326, 5277.5596, 1844.8514, 4663.6, 1640.0652, 5633.2363, 320.62604, 3801.5408, 5152.163, 5287.0713]
2025-09-12 02:28:28,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 02:28:28,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 18 hours, 44 minutes, 46 seconds)
2025-09-12 02:40:05,751 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:40:05,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:44:35,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3899.01514 ± 1583.460
2025-09-12 02:44:35,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3400.686, 5477.4434, 5437.372, 1283.2133, 3985.0378, 5188.3984, 5205.84, 4770.4946, 3274.889, 966.7759]
2025-09-12 02:44:35,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 02:44:35,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 18 hours, 30 minutes, 26 seconds)
2025-09-12 02:56:02,577 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:56:02,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:00:34,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4177.57568 ± 830.622
2025-09-12 03:00:34,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3650.874, 1948.2733, 4678.445, 4568.5703, 4217.183, 4826.3677, 4560.139, 4604.888, 4824.1865, 3896.8284]
2025-09-12 03:00:34,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:00:34,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 18 hours, 13 minutes, 17 seconds)
2025-09-12 03:12:03,444 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:12:03,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:16:36,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4658.90918 ± 786.249
2025-09-12 03:16:36,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4196.162, 4673.8545, 2473.307, 5210.5176, 5080.904, 4639.245, 5034.828, 5102.536, 5095.077, 5082.6616]
2025-09-12 03:16:36,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:16:36,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (4658.91) for latency ExtremeClogL1U23
2025-09-12 03:16:36,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 17 hours, 57 minutes, 4 seconds)
2025-09-12 03:28:07,557 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:28:07,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:32:44,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4218.33447 ± 1216.176
2025-09-12 03:32:44,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3761.1306, 4441.246, 4619.0576, 4531.2227, 5147.7085, 4767.819, 4950.435, 4721.219, 4521.88, 721.6274]
2025-09-12 03:32:44,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:32:44,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 17 hours, 40 minutes, 58 seconds)
2025-09-12 03:44:13,953 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:44:13,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:48:51,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4646.34375 ± 699.980
2025-09-12 03:48:51,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3914.7214, 5334.3325, 4805.791, 4846.4824, 4917.9854, 5030.6167, 4618.812, 5047.438, 5100.0063, 2847.2498]
2025-09-12 03:48:51,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:48:51,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 17 hours, 24 minutes, 59 seconds)
2025-09-12 04:00:14,002 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:00:14,007 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:04:39,647 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4737.21680 ± 1335.858
2025-09-12 04:04:39,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [5047.74, 5567.6514, 813.7938, 5289.333, 5015.164, 5460.2637, 5098.983, 5455.836, 5038.1997, 4585.203]
2025-09-12 04:04:39,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:04:39,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (4737.22) for latency ExtremeClogL1U23
2025-09-12 04:04:39,661 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 17 hours, 4 minutes, 53 seconds)
2025-09-12 04:15:58,611 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:15:58,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:20:27,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4297.80176 ± 1401.063
2025-09-12 04:20:27,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2316.6047, 5270.836, 4412.4395, 4811.0576, 944.2323, 4905.071, 5594.678, 4895.897, 5098.048, 4729.1533]
2025-09-12 04:20:27,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:20:27,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 16 hours, 46 minutes, 33 seconds)
2025-09-12 04:31:48,348 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:31:48,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:36:16,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4729.03857 ± 1015.165
2025-09-12 04:36:16,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3462.7842, 5431.195, 2197.739, 5474.3843, 5023.7734, 5302.2773, 5266.617, 5275.3916, 4641.8193, 5214.407]
2025-09-12 04:36:16,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:36:16,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 16 hours, 27 minutes, 50 seconds)
2025-09-12 04:47:35,589 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:47:35,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:52:02,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4015.69531 ± 1665.935
2025-09-12 04:52:02,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4473.118, 5066.611, -23.486038, 5029.7617, 4732.823, 1689.1902, 5244.693, 4966.4956, 4096.913, 4880.8315]
2025-09-12 04:52:02,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:52:02,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 16 hours, 7 minutes, 22 seconds)
2025-09-12 05:03:22,306 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:03:22,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:07:54,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4849.46045 ± 1077.780
2025-09-12 05:07:54,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3774.678, 5448.6216, 5358.0156, 4930.138, 5435.2017, 3887.1433, 5736.5947, 5592.758, 2391.864, 5939.5933]
2025-09-12 05:07:54,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:07:54,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (4849.46) for latency ExtremeClogL1U23
2025-09-12 05:07:54,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 15 hours, 48 minutes, 35 seconds)
2025-09-12 05:19:14,023 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:19:14,028 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:23:43,398 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3614.91333 ± 1933.396
2025-09-12 05:23:43,399 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2032.8204, 1072.479, 5709.0933, 5579.3047, 413.86603, 2097.7908, 3952.8271, 4404.072, 5130.047, 5756.833]
2025-09-12 05:23:43,399 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:23:43,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 15 hours, 32 minutes, 56 seconds)
2025-09-12 05:35:03,609 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:35:03,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:39:30,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3878.82861 ± 2001.576
2025-09-12 05:39:30,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4550.198, 226.34326, 5066.2974, 5236.012, 5699.1025, 5109.9756, 5378.605, 2512.4487, 235.56651, 4773.736]
2025-09-12 05:39:30,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:39:30,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 15 hours, 17 minutes, 4 seconds)
2025-09-12 05:50:52,384 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:50:52,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:55:26,670 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3519.27979 ± 1672.334
2025-09-12 05:55:26,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [454.2711, 4388.3677, 5361.7246, 4440.906, 5399.847, 2597.2283, 1101.8654, 2932.2195, 5234.458, 3281.9102]
2025-09-12 05:55:26,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:55:26,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 15 hours, 2 minutes, 33 seconds)
2025-09-12 06:06:50,310 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:06:50,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:11:21,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3012.11353 ± 2245.188
2025-09-12 06:11:21,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [5542.8716, 5901.987, 1880.5337, 5809.7686, 4906.6855, 2090.752, 1018.7319, 0.8144593, 1.5587785, 2967.4314]
2025-09-12 06:11:21,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:11:21,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 14 hours, 48 minutes, 24 seconds)
2025-09-12 06:22:40,733 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:22:40,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:27:11,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4112.06885 ± 2083.149
2025-09-12 06:27:11,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [5588.5664, 4095.8691, 5630.8516, 5817.858, 760.19794, 5213.5776, 1239.4132, 5989.58, 5709.3804, 1075.3945]
2025-09-12 06:27:11,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:27:11,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 14 hours, 32 minutes, 6 seconds)
2025-09-12 06:38:31,870 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:38:31,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:43:00,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2078.19141 ± 2076.260
2025-09-12 06:43:00,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [155.19518, 348.4601, 517.573, 5608.6084, 485.32242, 1526.6877, 5675.605, 203.027, 3178.959, 3082.4766]
2025-09-12 06:43:00,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:43:00,690 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 14 hours, 16 minutes, 18 seconds)
2025-09-12 06:54:25,862 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:54:25,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:58:56,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3255.40259 ± 2149.929
2025-09-12 06:58:56,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [5833.855, 5735.735, 5485.92, 678.5053, 5914.0063, 1813.9994, 1438.8071, 3389.825, 1532.7151, 730.6575]
2025-09-12 06:58:56,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:58:56,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 14 hours, 1 minute, 52 seconds)
2025-09-12 07:10:18,604 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:10:18,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:14:45,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3781.44385 ± 2031.655
2025-09-12 07:14:45,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [5353.01, 5820.6475, 768.07214, 5424.9136, 5982.956, 2829.516, 757.40894, 1527.0227, 5477.9907, 3872.9001]
2025-09-12 07:14:45,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:14:45,692 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 13 hours, 44 minutes, 53 seconds)
2025-09-12 07:26:05,215 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:26:05,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:30:29,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3034.64941 ± 2121.600
2025-09-12 07:30:29,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1839.8347, 5469.3115, 1434.9177, 339.5832, 6271.8394, 4497.14, 1717.4486, 2217.1333, 760.4635, 5798.8223]
2025-09-12 07:30:29,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:30:29,964 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 13 hours, 27 minutes, 11 seconds)
2025-09-12 07:41:37,424 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:41:37,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:46:00,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2088.45679 ± 2232.373
2025-09-12 07:46:00,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1446.3326, 1564.9261, 1406.4974, 6441.9346, 6100.6567, 739.8035, 2754.1086, 45.658104, 252.6766, 131.97528]
2025-09-12 07:46:00,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:46:00,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 13 hours, 8 minutes, 4 seconds)
2025-09-12 07:57:07,649 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:57:07,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:01:37,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3958.06689 ± 2239.857
2025-09-12 08:01:37,617 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1360.5258, 2105.1992, 5270.004, 6197.5776, 5486.362, 5277.1943, 6337.617, 1115.7953, 5892.177, 538.2205]
2025-09-12 08:01:37,617 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:01:37,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 12 hours, 50 minutes, 25 seconds)
2025-09-12 08:13:56,044 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:13:56,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:18:44,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3306.87695 ± 2179.843
2025-09-12 08:18:44,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2501.0469, 534.8188, 2416.1646, 2777.3904, 665.0872, 6168.6724, 978.55334, 6173.0005, 5823.924, 5030.1123]
2025-09-12 08:18:44,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:18:44,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 12 hours, 46 minutes, 6 seconds)
2025-09-12 08:30:59,834 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:30:59,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:35:45,095 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3863.67261 ± 1472.611
2025-09-12 08:35:45,097 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [5210.003, 2659.8562, 1494.0684, 4922.797, 3584.4163, 5650.58, 5275.0356, 5018.6704, 1751.9633, 3069.3328]
2025-09-12 08:35:45,097 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:35:45,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 12 hours, 41 minutes, 18 seconds)
2025-09-12 08:47:57,595 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:47:57,599 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:52:49,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3987.40112 ± 2259.460
2025-09-12 08:52:49,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [5617.2163, 6051.27, 1396.6184, 5219.6606, 1997.5562, 6648.081, 1288.0913, 466.08478, 5512.4233, 5677.011]
2025-09-12 08:52:49,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:52:49,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 12 hours, 37 minutes, 19 seconds)
2025-09-12 09:05:05,545 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:05:05,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:09:57,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3200.82910 ± 2453.058
2025-09-12 09:09:57,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [442.18005, 6072.91, 6493.6187, 2935.7966, 1236.8237, 760.771, 5725.4785, 5813.974, 62.921642, 2463.8145]
2025-09-12 09:09:57,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:09:57,360 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 12 hours, 35 minutes, 34 seconds)
2025-09-12 09:22:14,075 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:22:14,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:27:03,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4932.80615 ± 1459.984
2025-09-12 09:27:03,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [5508.4673, 6375.9395, 6112.439, 2132.361, 5754.0386, 6246.143, 3621.982, 6279.218, 3179.988, 4117.4893]
2025-09-12 09:27:03,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:27:03,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (4932.81) for latency ExtremeClogL1U23
2025-09-12 09:27:03,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 12 hours, 31 minutes, 50 seconds)
2025-09-12 09:39:19,176 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:39:19,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:44:08,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4613.25146 ± 1963.880
2025-09-12 09:44:08,820 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [5480.885, 5654.433, 1854.8499, 5973.967, 154.18964, 6255.714, 4943.192, 3842.2954, 5658.392, 6314.601]
2025-09-12 09:44:08,820 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:44:08,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 12 hours, 14 minutes, 30 seconds)
2025-09-12 09:56:27,864 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:56:27,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:01:14,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4505.53613 ± 2171.630
2025-09-12 10:01:14,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [6775.547, 314.89597, 5985.5547, 5594.7285, 4842.3916, 506.78143, 6001.601, 3889.28, 5760.856, 5383.7275]
2025-09-12 10:01:14,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:01:14,895 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 11 hours, 58 minutes, 10 seconds)
2025-09-12 10:13:30,401 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:13:30,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:18:22,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 5512.70703 ± 1393.136
2025-09-12 10:18:22,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [6339.8154, 6354.845, 6257.4033, 5871.292, 5825.4375, 4105.0977, 6255.1455, 5949.771, 6367.6367, 1800.6306]
2025-09-12 10:18:22,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:18:22,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (5512.71) for latency ExtremeClogL1U23
2025-09-12 10:18:22,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 11 hours, 41 minutes, 38 seconds)
2025-09-12 10:30:40,536 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:30:40,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:35:34,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4851.78662 ± 1874.315
2025-09-12 10:35:34,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3472.5374, 6260.763, 5654.126, 3875.7798, 6248.8906, 337.17404, 3868.4968, 5943.448, 6731.2686, 6125.379]
2025-09-12 10:35:34,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:35:34,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 11 hours, 24 minutes, 55 seconds)
2025-09-12 10:47:53,815 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:47:53,820 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:52:43,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4960.20410 ± 2393.187
2025-09-12 10:52:43,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [6490.353, 5345.4375, 6789.8174, 6379.092, 6706.8325, 6498.535, 5714.8574, 618.0833, 3.0777538, 5055.952]
2025-09-12 10:52:43,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:52:43,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 11 hours, 8 minutes, 11 seconds)
2025-09-12 11:05:03,031 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:05:03,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:09:54,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4168.54590 ± 1869.370
2025-09-12 11:09:54,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3050.4097, 6113.951, 2393.49, 5834.386, 4093.2976, 2548.6523, 501.23907, 5965.34, 6150.476, 5034.2153]
2025-09-12 11:09:54,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:09:54,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 10 hours, 51 minutes, 49 seconds)
2025-09-12 11:22:13,567 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:22:13,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:27:03,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3652.34229 ± 2521.656
2025-09-12 11:27:03,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [68.18137, 6456.2744, 1176.4954, 6631.895, 1923.5538, 1041.9084, 6579.713, 2633.5771, 3457.2148, 6554.6094]
2025-09-12 11:27:03,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:27:03,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 10 hours, 34 minutes, 58 seconds)
2025-09-12 11:39:23,856 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:39:23,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:44:19,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3504.83716 ± 2300.801
2025-09-12 11:44:19,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [6053.4272, 6305.2466, 1182.4044, 6668.3413, 137.61736, 3230.831, 5377.638, 2621.6863, 949.2216, 2521.9587]
2025-09-12 11:44:19,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:44:19,323 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 10 hours, 18 minutes, 45 seconds)
2025-09-12 11:56:38,876 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:56:38,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:01:28,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3550.75317 ± 2018.059
2025-09-12 12:01:28,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4764.4854, 1948.5531, 2806.941, 756.48474, 3034.5488, 5458.5425, 5717.146, 4004.3862, 384.48404, 6631.959]
2025-09-12 12:01:28,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:01:28,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 10 hours, 1 minute, 19 seconds)
2025-09-12 12:13:48,936 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:13:48,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:18:38,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3823.78369 ± 2576.716
2025-09-12 12:18:38,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [495.54123, 6337.6377, 1991.0675, 6775.663, 860.93256, 4958.6055, 3445.813, 7030.865, 5961.766, 379.94562]
2025-09-12 12:18:38,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:18:38,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 9 hours, 44 minutes, 10 seconds)
2025-09-12 12:30:58,968 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:30:58,978 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:35:48,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4034.45166 ± 2486.792
2025-09-12 12:35:48,239 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [6306.471, 751.3731, 207.58296, 2836.1812, 6401.6445, 2972.2344, 5755.8335, 6854.7124, 6594.297, 1664.182]
2025-09-12 12:35:48,239 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:35:48,248 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 9 hours, 26 minutes, 53 seconds)
2025-09-12 12:48:08,492 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:48:08,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:52:56,774 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4090.45654 ± 1822.794
2025-09-12 12:52:56,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2734.5369, 27.750154, 5143.5615, 3853.773, 2567.1895, 6282.157, 4190.1216, 4720.263, 6417.992, 4967.2227]
2025-09-12 12:52:56,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:52:56,783 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 9 hours, 9 minutes, 42 seconds)
2025-09-12 13:05:17,755 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:05:17,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:10:09,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 5815.49902 ± 2032.928
2025-09-12 13:10:09,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [5941.4854, -23.766722, 6722.28, 5610.407, 7392.9736, 6288.09, 6765.7964, 6515.6953, 7303.5503, 5638.475]
2025-09-12 13:10:09,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:10:09,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (5815.50) for latency ExtremeClogL1U23
2025-09-12 13:10:09,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 8 hours, 52 minutes, 8 seconds)
2025-09-12 13:22:29,413 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:22:29,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:27:20,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4514.24316 ± 2247.977
2025-09-12 13:27:20,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [5761.247, 2609.1511, 5021.765, 6779.31, 6490.897, 6645.3687, 3811.0024, 780.28534, 839.40063, 6404.005]
2025-09-12 13:27:20,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:27:20,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 8 hours, 35 minutes, 14 seconds)
2025-09-12 13:39:43,184 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:39:43,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:44:33,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 5002.87500 ± 2275.491
2025-09-12 13:44:33,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1319.0308, 6587.741, 7078.336, 6701.0273, 6353.4087, 1863.6953, 7329.382, 3630.4724, 6724.9556, 2440.6982]
2025-09-12 13:44:33,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:44:33,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 8 hours, 18 minutes, 22 seconds)
2025-09-12 13:56:55,984 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:56:55,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:01:45,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 5078.92969 ± 2356.372
2025-09-12 14:01:45,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [249.16115, 5810.2827, 6899.072, 6343.834, 6624.3877, 5808.4575, 7468.6177, 6537.2446, 3736.236, 1312.0092]
2025-09-12 14:01:45,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:01:45,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 8 hours, 1 minute, 21 seconds)
2025-09-12 14:14:06,181 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:14:06,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:18:59,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4483.43506 ± 1513.089
2025-09-12 14:18:59,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3469.9888, 3313.9226, 4368.8037, 3261.1697, 6562.0635, 6586.242, 3495.9324, 6767.428, 4539.0967, 2469.7048]
2025-09-12 14:18:59,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:18:59,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 7 hours, 44 minutes, 38 seconds)
2025-09-12 14:31:18,342 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:31:18,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:36:11,227 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 5271.75732 ± 1733.844
2025-09-12 14:36:11,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1979.2794, 6525.3037, 6077.1753, 3985.0195, 4059.2136, 6376.366, 7040.094, 6844.7554, 6759.0825, 3071.2856]
2025-09-12 14:36:11,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:36:11,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 7 hours, 27 minutes, 23 seconds)
2025-09-12 14:48:34,111 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:48:34,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:53:30,599 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3827.74292 ± 2480.273
2025-09-12 14:53:30,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [6794.486, 3282.6555, 499.23764, 4256.3223, 6930.728, 350.8405, 5710.7417, 2462.153, 6638.2656, 1351.9989]
2025-09-12 14:53:30,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:53:30,617 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 7 hours, 10 minutes, 49 seconds)
2025-09-12 15:05:52,221 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:05:52,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:10:45,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 5275.88135 ± 1273.911
2025-09-12 15:10:45,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2422.6584, 6041.279, 7051.5884, 4074.8752, 5489.7725, 6225.858, 5652.8477, 4593.183, 6356.812, 4849.9395]
2025-09-12 15:10:45,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:10:45,480 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 6 hours, 53 minutes, 43 seconds)
2025-09-12 15:23:08,321 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:23:08,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:27:56,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 5060.06396 ± 2001.189
2025-09-12 15:27:56,842 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [5738.8022, 6311.774, 5376.9536, 6287.4272, 6255.3984, 5599.704, 1730.4017, 5949.6416, 594.6907, 6755.8467]
2025-09-12 15:27:56,842 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:27:56,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 6 hours, 36 minutes, 27 seconds)
2025-09-12 15:40:18,803 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:40:18,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:45:07,611 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4111.96387 ± 2136.647
2025-09-12 15:45:07,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2872.8384, 6229.2334, 6284.379, 6580.682, 122.664215, 6192.1597, 3816.1968, 1245.4581, 4182.98, 3593.0522]
2025-09-12 15:45:07,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:45:07,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 6 hours, 19 minutes)
2025-09-12 15:57:30,071 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:57:30,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:02:19,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 5076.57617 ± 2076.469
2025-09-12 16:02:19,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [7176.1587, 6310.91, 6683.775, 6063.9053, 1015.1119, 2324.9644, 2728.7932, 6016.948, 6741.0273, 5704.167]
2025-09-12 16:02:19,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:02:19,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 6 hours, 1 minute, 46 seconds)
2025-09-12 16:14:40,848 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:14:40,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:19:37,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4879.89111 ± 2348.385
2025-09-12 16:19:37,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [5368.2437, 4909.3013, 6759.755, 7302.0894, 6711.893, 3296.3835, 6837.475, 230.5592, 1307.0027, 6076.2075]
2025-09-12 16:19:37,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:19:37,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 5 hours, 44 minutes, 26 seconds)
2025-09-12 16:32:00,977 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:32:00,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:36:51,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3431.83472 ± 2325.682
2025-09-12 16:36:51,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4138.3296, 1096.4678, 5026.321, 3218.5535, 1068.9557, 6713.9004, 5863.1655, 6027.1685, 976.4996, 188.9874]
2025-09-12 16:36:51,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:36:51,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 5 hours, 27 minutes, 10 seconds)
2025-09-12 16:49:12,266 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:49:12,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:54:03,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4877.83203 ± 2338.489
2025-09-12 16:54:03,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [7337.88, 7438.4546, 780.95404, 3446.5254, 6004.4673, 6390.151, 5814.7734, 1240.5244, 6808.1245, 3516.469]
2025-09-12 16:54:03,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:54:03,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 5 hours, 10 minutes, 1 second)
2025-09-12 17:06:26,335 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:06:26,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:11:19,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 5244.82715 ± 2516.473
2025-09-12 17:11:19,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [7293.712, 6848.3115, 5648.722, 6817.369, 510.30844, 6545.1357, 4986.0713, 300.97934, 7374.1943, 6123.47]
2025-09-12 17:11:19,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:11:19,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 53 minutes, 4 seconds)
2025-09-12 17:23:42,931 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:23:42,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:28:33,367 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3693.32739 ± 2748.503
2025-09-12 17:28:33,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [7142.244, 360.97382, 6396.0684, 1267.257, 6162.626, 780.8439, 406.5567, 5369.416, 2312.144, 6735.143]
2025-09-12 17:28:33,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:28:33,391 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 4 hours, 35 minutes, 56 seconds)
2025-09-12 17:40:56,869 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:40:56,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:45:54,000 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 5378.97656 ± 2795.898
2025-09-12 17:45:54,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [7792.6343, 7137.3794, 6500.5557, 7014.4185, 3754.5234, 7524.6123, 413.5333, 6392.9478, 7254.012, 5.149863]
2025-09-12 17:45:54,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:45:54,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 4 hours, 18 minutes, 50 seconds)
2025-09-12 17:58:18,391 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:58:18,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:03:08,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4299.94531 ± 2719.879
2025-09-12 18:03:08,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [272.36057, 1630.5592, 6481.652, 2224.9426, 6887.5835, 288.69232, 6482.1836, 7174.222, 4947.302, 6609.9575]
2025-09-12 18:03:08,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:03:08,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 4 hours, 1 minute, 35 seconds)
2025-09-12 18:15:33,770 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:15:33,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:20:25,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 5542.79980 ± 1932.081
2025-09-12 18:20:25,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3776.476, 5751.082, 7565.4624, 5997.4536, 7084.933, 7333.0557, 4293.529, 6579.2314, 6120.8926, 925.8775]
2025-09-12 18:20:25,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:20:25,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 44 minutes, 33 seconds)
2025-09-12 18:32:49,317 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:32:49,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:37:39,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3650.46533 ± 2709.298
2025-09-12 18:37:39,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [6362.5054, -24.020802, 1361.8138, 6726.8594, 345.25156, 6813.053, 6844.6836, 2221.0237, 4272.76, 1580.7208]
2025-09-12 18:37:39,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:37:39,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 27 minutes, 11 seconds)
2025-09-12 18:50:00,878 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:50:00,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:54:51,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 5710.55322 ± 1643.525
2025-09-12 18:54:51,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [5661.734, 4171.618, 5577.553, 7387.518, 7534.3135, 3916.9739, 6851.6133, 2409.5156, 6202.7974, 7391.8994]
2025-09-12 18:54:51,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:54:51,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 3 hours, 9 minutes, 51 seconds)
2025-09-12 19:07:15,492 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:07:15,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:12:08,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4032.75391 ± 2389.371
2025-09-12 19:12:08,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [6534.249, 2174.1926, 1520.4519, 4063.489, 846.70123, 7494.15, 982.2805, 6052.1064, 6375.3604, 4284.5605]
2025-09-12 19:12:08,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:12:08,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 52 minutes, 28 seconds)
2025-09-12 19:24:32,248 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:24:32,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:29:29,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 5440.62744 ± 1743.278
2025-09-12 19:29:29,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [5717.61, 3323.08, 1410.7655, 6350.1973, 6509.6597, 6966.7954, 7149.9463, 4845.6978, 5281.4688, 6851.055]
2025-09-12 19:29:29,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:29:29,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 35 minutes, 26 seconds)
2025-09-12 19:41:51,373 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:41:51,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:46:39,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3899.02930 ± 2472.578
2025-09-12 19:46:39,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [342.034, 967.713, 5444.5693, 1407.0035, 6362.6416, 6700.1743, 2468.8784, 2590.9329, 5465.806, 7240.5356]
2025-09-12 19:46:39,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:46:39,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 17 minutes, 57 seconds)
2025-09-12 19:59:01,456 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:59:01,467 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 20:03:50,462 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 6218.40088 ± 1838.433
2025-09-12 20:03:50,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1833.6371, 6687.7427, 3872.1267, 6763.808, 7347.52, 8229.731, 7093.062, 7340.609, 5787.778, 7227.9946]
2025-09-12 20:03:50,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:03:50,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (6218.40) for latency ExtremeClogL1U23
2025-09-12 20:03:50,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 2 hours, 39 seconds)
2025-09-12 20:16:13,230 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:16:13,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 20:21:06,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4385.51465 ± 2343.496
2025-09-12 20:21:06,367 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [6881.7427, 2365.9998, 607.82697, 4516.04, 6741.791, 6548.205, 3534.8647, 3002.9773, 1906.6707, 7749.027]
2025-09-12 20:21:06,367 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:21:06,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 43 minutes, 29 seconds)
2025-09-12 20:33:29,930 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:33:29,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 20:38:20,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 6510.72754 ± 1251.875
2025-09-12 20:38:20,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [7332.4, 7309.7236, 7199.575, 5899.131, 7263.424, 6806.0063, 7733.561, 5412.1445, 3338.5613, 6812.754]
2025-09-12 20:38:20,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:38:20,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (6510.73) for latency ExtremeClogL1U23
2025-09-12 20:38:20,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 26 minutes, 12 seconds)
2025-09-12 20:50:44,064 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:50:44,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 20:55:36,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4754.86426 ± 1808.284
2025-09-12 20:55:36,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [6978.172, 6682.658, 2461.8018, 5182.8047, 6654.7295, 3810.1738, 2638.089, 6601.577, 4215.7476, 2322.8896]
2025-09-12 20:55:36,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:55:37,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 8 minutes, 53 seconds)
2025-09-12 21:08:02,070 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:08:02,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 21:13:00,673 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 5106.06348 ± 2448.352
2025-09-12 21:13:00,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [8189.9863, 6492.1333, 4250.07, 6339.6206, 7111.136, 1437.8438, 6327.197, 4039.6658, 6644.1895, 228.79219]
2025-09-12 21:13:00,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:13:00,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 51 minutes, 48 seconds)
2025-09-12 21:25:25,334 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:25:25,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 21:30:19,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 5809.12061 ± 2323.070
2025-09-12 21:30:19,367 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [464.52982, 8008.0234, 7470.9146, 6995.497, 5430.451, 7454.267, 7553.212, 2645.4758, 5589.583, 6479.253]
2025-09-12 21:30:19,367 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:30:19,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 34 minutes, 35 seconds)
2025-09-12 21:42:42,396 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:42:42,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 21:47:32,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 6561.50537 ± 921.228
2025-09-12 21:47:32,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [7025.2905, 6948.883, 6315.955, 7205.4185, 7483.3003, 6181.763, 6807.9385, 4058.7043, 7125.4683, 6462.332]
2025-09-12 21:47:32,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:47:32,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (6561.51) for latency ExtremeClogL1U23
2025-09-12 21:47:32,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 17 minutes, 17 seconds)
2025-09-12 21:59:56,365 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:59:56,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 22:04:49,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 5044.26758 ± 2059.562
2025-09-12 22:04:49,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [7408.673, 286.38998, 3009.1956, 6013.028, 4868.653, 6848.5146, 4093.8003, 6316.0, 6839.2617, 4759.1577]
2025-09-12 22:04:49,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:04:49,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1251 [DEBUG]: Training session finished
