2025-09-12 22:05:52,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc25-halfcheetah/ExtremeSparseL4U32-mbpac_memdelay
2025-09-12 22:05:52,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc25-halfcheetah/ExtremeSparseL4U32-mbpac_memdelay
2025-09-12 22:05:52,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x151fa26af290>}
2025-09-12 22:05:52,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1111 [DEBUG]: using device: cuda
2025-09-12 22:05:52,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-12 22:05:52,322 baseline-mbpac-noiseperc25-halfcheetah:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-12 22:05:52,322 baseline-mbpac-noiseperc25-halfcheetah:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-12 22:05:52,329 baseline-mbpac-noiseperc25-halfcheetah:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=6, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-09-12 22:05:53,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-12 22:05:53,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-12 22:17:16,408 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:17:16,416 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:22:19,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: -291.88586 ± 26.360
2025-09-12 22:22:19,110 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [-260.3357, -335.82635, -270.5606, -294.583, -264.72897, -297.5707, -319.98944, -317.96976, -300.66946, -256.62466]
2025-09-12 22:22:19,110 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:22:19,110 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (-291.89) for latency ExtremeSparseL4U32
2025-09-12 22:22:19,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 27 hours, 6 minutes, 14 seconds)
2025-09-12 22:33:06,948 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:33:06,971 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:38:08,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: -233.85605 ± 30.167
2025-09-12 22:38:08,462 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [-205.21016, -230.09583, -210.9352, -253.54271, -246.71146, -251.28815, -225.6718, -276.4106, -171.46817, -267.22626]
2025-09-12 22:38:08,462 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:38:08,462 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (-233.86) for latency ExtremeSparseL4U32
2025-09-12 22:38:08,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 26 hours, 20 minutes, 13 seconds)
2025-09-12 22:48:57,047 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:48:57,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:54:01,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: -134.00430 ± 43.788
2025-09-12 22:54:01,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [-56.162247, -103.257385, -177.73729, -149.43486, -125.29069, -205.97485, -82.73797, -174.84903, -117.12368, -147.47514]
2025-09-12 22:54:01,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:54:01,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (-134.00) for latency ExtremeSparseL4U32
2025-09-12 22:54:01,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 25 hours, 56 minutes, 7 seconds)
2025-09-12 23:04:50,296 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:04:50,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:10:01,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: -11.67390 ± 42.584
2025-09-12 23:10:01,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [34.5265, -55.339767, -68.49776, -7.2118163, 76.69628, -32.225224, 13.094078, -27.555077, -51.435585, 1.2093731]
2025-09-12 23:10:01,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:10:01,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (-11.67) for latency ExtremeSparseL4U32
2025-09-12 23:10:01,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 25 hours, 39 minutes, 2 seconds)
2025-09-12 23:20:53,473 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:20:53,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:26:04,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 64.15028 ± 52.986
2025-09-12 23:26:04,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [13.515778, 156.80603, 63.16095, 72.3433, 46.846264, 68.31951, 119.96227, 114.96458, 10.957731, -25.37356]
2025-09-12 23:26:04,905 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:26:04,905 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (64.15) for latency ExtremeSparseL4U32
2025-09-12 23:26:04,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 25 hours, 23 minutes, 36 seconds)
2025-09-12 23:36:55,414 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:36:55,425 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:41:57,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 93.48801 ± 59.089
2025-09-12 23:41:57,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [89.33732, -28.97189, 158.48088, 132.56653, 159.98494, 74.78927, 122.506546, 139.25711, 70.27528, 16.654156]
2025-09-12 23:41:57,408 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:41:57,408 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (93.49) for latency ExtremeSparseL4U32
2025-09-12 23:41:57,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 24 hours, 57 minutes, 11 seconds)
2025-09-12 23:52:49,303 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:52:49,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:58:01,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 456.08160 ± 61.290
2025-09-12 23:58:01,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [437.84918, 496.44012, 393.88635, 384.2142, 406.1771, 442.5204, 511.16397, 467.6127, 424.42087, 596.531]
2025-09-12 23:58:01,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:58:01,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (456.08) for latency ExtremeSparseL4U32
2025-09-12 23:58:01,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 24 hours, 45 minutes, 52 seconds)
2025-09-13 00:08:52,230 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:08:52,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:13:55,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 432.92276 ± 242.928
2025-09-13 00:13:55,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [191.66336, 696.4912, 460.08282, 367.04697, 856.85284, 490.1594, 119.66185, 330.4783, 695.2447, 121.54648]
2025-09-13 00:13:55,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:13:55,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 24 hours, 30 minutes, 16 seconds)
2025-09-13 00:24:44,447 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:24:44,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:29:54,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 763.34448 ± 181.186
2025-09-13 00:29:54,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [836.0954, 960.466, 874.93854, 805.319, 728.0417, 875.385, 271.96213, 742.2012, 685.43353, 853.6021]
2025-09-13 00:29:54,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:29:54,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (763.34) for latency ExtremeSparseL4U32
2025-09-13 00:29:54,516 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 24 hours, 13 minutes, 59 seconds)
2025-09-13 00:40:45,119 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:40:45,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:45:49,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 356.85248 ± 148.897
2025-09-13 00:45:49,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [389.03748, 413.0204, 223.44186, 307.07587, 744.88763, 277.6928, 223.9907, 237.11977, 314.212, 438.04636]
2025-09-13 00:45:49,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:45:49,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 23 hours, 55 minutes, 14 seconds)
2025-09-13 00:56:41,857 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:56:41,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:01:55,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 567.98846 ± 285.651
2025-09-13 01:01:55,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [454.4124, 414.63126, 200.40546, 798.2934, 1037.0707, 856.90283, 289.55753, 914.7303, 379.7934, 334.08734]
2025-09-13 01:01:55,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:01:55,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 23 hours, 43 minutes, 28 seconds)
2025-09-13 01:12:46,434 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:12:46,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:17:51,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 723.80750 ± 186.421
2025-09-13 01:17:51,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [467.79865, 381.43857, 731.4579, 841.4396, 563.9508, 948.1197, 934.03094, 878.12585, 800.74133, 690.9719]
2025-09-13 01:17:51,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:17:51,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 23 hours, 24 minutes, 58 seconds)
2025-09-13 01:28:41,233 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:28:41,243 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:33:49,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 805.09564 ± 265.529
2025-09-13 01:33:49,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1026.3601, 671.44403, 1066.4271, 694.41705, 290.81427, 796.32007, 1077.0587, 1078.0035, 439.61636, 910.4955]
2025-09-13 01:33:49,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:33:49,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (805.10) for latency ExtremeSparseL4U32
2025-09-13 01:33:49,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 23 hours, 10 minutes, 14 seconds)
2025-09-13 01:44:41,335 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:44:41,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:49:44,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 706.93719 ± 257.161
2025-09-13 01:49:44,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [856.95807, 809.7247, 701.025, 1054.0396, 928.95184, 222.80801, 441.15637, 699.74005, 952.83514, 402.1339]
2025-09-13 01:49:44,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:49:44,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 22 hours, 53 minutes, 2 seconds)
2025-09-13 02:00:35,802 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:00:35,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:05:43,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 824.38367 ± 247.882
2025-09-13 02:05:43,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1077.4684, 1006.67755, 579.0622, 1162.0183, 997.81366, 968.0384, 695.5823, 426.83014, 848.12354, 482.22205]
2025-09-13 02:05:43,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:05:43,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (824.38) for latency ExtremeSparseL4U32
2025-09-13 02:05:43,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 22 hours, 38 minutes, 32 seconds)
2025-09-13 02:16:35,051 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:16:35,071 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:21:49,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 900.94470 ± 226.611
2025-09-13 02:21:49,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1180.415, 852.7966, 971.3281, 833.3745, 853.96014, 1269.2567, 831.3601, 372.7705, 944.67395, 899.5118]
2025-09-13 02:21:49,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:21:49,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (900.94) for latency ExtremeSparseL4U32
2025-09-13 02:21:49,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 22 hours, 22 minutes, 13 seconds)
2025-09-13 02:32:41,218 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:32:41,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:37:43,391 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 988.51086 ± 149.249
2025-09-13 02:37:43,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [910.5704, 966.00354, 885.94464, 1017.1374, 1174.6522, 1081.0048, 714.31244, 1149.1559, 1167.2219, 819.10535]
2025-09-13 02:37:43,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:37:43,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (988.51) for latency ExtremeSparseL4U32
2025-09-13 02:37:43,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 22 hours, 5 minutes, 48 seconds)
2025-09-13 02:48:38,396 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:48:38,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:53:44,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1077.79419 ± 122.774
2025-09-13 02:53:44,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [978.7049, 1177.396, 1177.968, 1264.1488, 1139.9744, 826.6944, 1039.6373, 1136.1482, 1079.6904, 957.5803]
2025-09-13 02:53:44,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:53:44,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1077.79) for latency ExtremeSparseL4U32
2025-09-13 02:53:44,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 21 hours, 50 minutes, 37 seconds)
2025-09-13 03:04:43,094 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:04:43,113 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:09:42,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 972.00183 ± 218.550
2025-09-13 03:09:42,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1143.2428, 1038.4635, 1110.2499, 910.44617, 1009.4192, 878.6677, 939.9585, 1119.2391, 386.1869, 1184.1449]
2025-09-13 03:09:42,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:09:42,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 21 hours, 35 minutes, 34 seconds)
2025-09-13 03:20:43,425 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:20:43,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:25:49,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1021.53839 ± 266.136
2025-09-13 03:25:49,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [997.8666, 1243.63, 1210.7765, 1102.7678, 1210.6785, 378.57822, 1066.6582, 1016.7858, 1279.4958, 708.1451]
2025-09-13 03:25:49,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:25:49,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 21 hours, 21 minutes, 28 seconds)
2025-09-13 03:36:51,005 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:36:51,015 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:42:00,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1045.01843 ± 303.215
2025-09-13 03:42:00,620 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1122.0549, 451.6152, 1119.0143, 509.54535, 1086.1174, 1431.0406, 1244.985, 1134.5458, 1040.9077, 1310.3585]
2025-09-13 03:42:00,620 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:42:00,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 21 hours, 6 minutes, 59 seconds)
2025-09-13 03:53:05,566 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:53:05,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:58:09,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1086.75171 ± 345.403
2025-09-13 03:58:09,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1243.7002, 1130.4631, 1253.1692, 1136.3024, 1418.2919, 1183.8799, 1166.1345, 1162.7202, 1088.5571, 84.29899]
2025-09-13 03:58:09,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:58:09,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1086.75) for latency ExtremeSparseL4U32
2025-09-13 03:58:09,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 20 hours, 54 minutes, 48 seconds)
2025-09-13 04:08:52,571 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:08:52,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:13:55,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1035.76636 ± 218.365
2025-09-13 04:13:55,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1082.0162, 1414.572, 941.12616, 755.3607, 1160.1542, 1109.1686, 585.98096, 1063.471, 1170.3193, 1075.4954]
2025-09-13 04:13:55,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 04:13:55,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 20 hours, 34 minutes, 47 seconds)
2025-09-13 04:24:33,478 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:24:33,491 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:29:34,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1156.43530 ± 175.184
2025-09-13 04:29:34,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1287.3853, 1450.484, 1207.1603, 971.7388, 785.2377, 1269.0913, 1109.8491, 1079.1079, 1245.6718, 1158.6257]
2025-09-13 04:29:34,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 04:29:34,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1156.44) for latency ExtremeSparseL4U32
2025-09-13 04:29:34,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 20 hours, 13 minutes, 51 seconds)
2025-09-13 04:40:11,820 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:40:11,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:45:11,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1197.57935 ± 157.026
2025-09-13 04:45:11,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1503.4479, 995.4777, 1376.8153, 1121.1621, 1275.6521, 993.3688, 1137.9978, 1212.822, 1074.706, 1284.3439]
2025-09-13 04:45:11,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 04:45:11,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1197.58) for latency ExtremeSparseL4U32
2025-09-13 04:45:11,747 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 19 hours, 50 minutes, 34 seconds)
2025-09-13 04:55:49,528 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:55:49,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:00:44,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1198.91260 ± 221.338
2025-09-13 05:00:44,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1311.9645, 1097.9423, 1253.9275, 1621.4067, 1088.7308, 1044.5493, 733.6948, 1368.508, 1263.8026, 1204.599]
2025-09-13 05:00:44,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 05:00:44,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1198.91) for latency ExtremeSparseL4U32
2025-09-13 05:00:44,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 19 hours, 25 minutes, 12 seconds)
2025-09-13 05:11:21,565 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:11:21,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:16:23,532 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1181.39136 ± 171.384
2025-09-13 05:16:23,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1226.9161, 1020.6987, 1087.1007, 1025.0693, 1387.9915, 1329.9221, 1377.0776, 1355.885, 1132.7478, 870.5052]
2025-09-13 05:16:23,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 05:16:23,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 19 hours, 2 minutes, 11 seconds)
2025-09-13 05:27:01,221 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:27:01,230 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:32:02,071 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1332.11401 ± 154.238
2025-09-13 05:32:02,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1133.8474, 1499.9703, 1482.7559, 1312.0555, 1158.4603, 1177.3967, 1227.441, 1604.1428, 1419.1581, 1305.9121]
2025-09-13 05:32:02,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 05:32:02,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1332.11) for latency ExtremeSparseL4U32
2025-09-13 05:32:02,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 18 hours, 44 minutes, 49 seconds)
2025-09-13 05:42:40,347 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:42:40,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:47:37,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1210.39258 ± 355.570
2025-09-13 05:47:37,408 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1299.2517, 1202.9663, 1314.2908, 242.87453, 1156.9572, 1239.0719, 1193.1884, 1644.5507, 1548.4744, 1262.2996]
2025-09-13 05:47:37,408 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 05:47:37,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 18 hours, 28 minutes, 21 seconds)
2025-09-13 05:58:15,957 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:58:15,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:03:19,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1132.15356 ± 350.570
2025-09-13 06:03:19,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [881.35504, 1351.2626, 1367.0404, 1068.3916, 1463.279, 1122.3578, 1251.3992, 271.6649, 993.9052, 1550.8796]
2025-09-13 06:03:19,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 06:03:19,360 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 18 hours, 13 minutes, 46 seconds)
2025-09-13 06:13:57,489 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:13:57,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:18:59,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1159.24719 ± 298.915
2025-09-13 06:18:59,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1207.1589, 1165.9781, 1289.4426, 1331.8148, 1374.9204, 1155.2731, 1261.6791, 1244.7528, 1277.4862, 283.96536]
2025-09-13 06:18:59,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 06:18:59,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 17 hours, 59 minutes, 50 seconds)
2025-09-13 06:29:37,101 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:29:37,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:34:39,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1171.29712 ± 454.484
2025-09-13 06:34:39,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1088.4899, 1473.5115, 385.1351, 1639.8915, 1337.7876, 234.65462, 1337.0298, 1304.2172, 1363.0464, 1549.2065]
2025-09-13 06:34:39,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 06:34:39,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 17 hours, 44 minutes, 29 seconds)
2025-09-13 06:45:17,227 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:45:17,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:50:17,624 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1325.60034 ± 151.064
2025-09-13 06:50:17,625 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1267.655, 1557.5464, 1011.7566, 1377.109, 1417.5221, 1152.847, 1363.5765, 1376.8843, 1475.8682, 1255.2382]
2025-09-13 06:50:17,625 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 06:50:17,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 17 hours, 28 minutes, 40 seconds)
2025-09-13 07:00:56,750 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:00:56,759 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:05:56,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1389.16064 ± 213.807
2025-09-13 07:05:56,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1619.677, 1613.1362, 1196.6394, 1503.2665, 1500.8964, 1549.3704, 1061.4805, 1337.5835, 1010.40063, 1499.1571]
2025-09-13 07:05:56,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 07:05:56,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1389.16) for latency ExtremeSparseL4U32
2025-09-13 07:05:56,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 17 hours, 13 minutes, 48 seconds)
2025-09-13 07:16:35,931 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:16:35,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:21:30,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1382.20361 ± 325.922
2025-09-13 07:21:30,808 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1628.6979, 518.0552, 1541.5293, 1094.7175, 1514.3014, 1370.7727, 1429.5586, 1549.2078, 1673.0986, 1502.0963]
2025-09-13 07:21:30,808 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 07:21:30,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 16 hours, 56 minutes, 28 seconds)
2025-09-13 07:32:09,831 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:32:09,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:37:05,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1379.39355 ± 165.439
2025-09-13 07:37:05,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1322.1329, 1522.3433, 1546.1757, 1495.8477, 1289.0214, 1390.5022, 958.5918, 1378.1663, 1530.6725, 1360.4818]
2025-09-13 07:37:05,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 07:37:05,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 16 hours, 39 minutes, 45 seconds)
2025-09-13 07:47:43,342 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:47:43,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:52:43,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1312.55042 ± 239.042
2025-09-13 07:52:43,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1296.4857, 686.0955, 1594.8651, 1346.7324, 1437.8025, 1165.07, 1377.5111, 1493.5662, 1459.0349, 1268.3396]
2025-09-13 07:52:43,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 07:52:43,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 16 hours, 23 minutes, 32 seconds)
2025-09-13 08:03:22,355 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:03:22,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:08:16,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1365.08142 ± 353.818
2025-09-13 08:08:16,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1185.4003, 1544.7375, 1608.5638, 1616.7521, 1542.1584, 1421.9785, 1593.166, 1304.356, 1451.902, 381.79907]
2025-09-13 08:08:16,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 08:08:16,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 16 hours, 6 minutes, 55 seconds)
2025-09-13 08:18:57,943 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:18:57,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:23:52,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1152.02332 ± 340.414
2025-09-13 08:23:52,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1229.1608, 1126.8018, 1364.3108, 244.33871, 1245.9569, 1403.3978, 1477.2327, 1391.8499, 939.9207, 1097.2633]
2025-09-13 08:23:52,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 08:23:52,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 15 hours, 50 minutes, 40 seconds)
2025-09-13 08:34:28,614 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:34:28,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:39:24,639 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1353.33325 ± 164.008
2025-09-13 08:39:24,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1041.5067, 1337.3384, 1305.0225, 1441.6261, 1242.8225, 1175.4307, 1342.0352, 1558.7324, 1577.3107, 1511.5076]
2025-09-13 08:39:24,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 08:39:24,653 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 15 hours, 34 minutes, 46 seconds)
2025-09-13 08:50:03,418 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:50:03,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:55:07,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1229.35327 ± 306.337
2025-09-13 08:55:07,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [828.9134, 1018.9605, 1334.5664, 1614.2345, 1118.3362, 1214.5454, 753.2002, 1176.3855, 1491.3137, 1743.076]
2025-09-13 08:55:07,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 08:55:07,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 15 hours, 20 minutes, 39 seconds)
2025-09-13 09:05:46,202 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:05:46,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:10:42,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1175.80652 ± 505.183
2025-09-13 09:10:42,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1716.535, 975.4299, 283.30264, 779.6451, 414.202, 1506.4418, 1254.7261, 1565.214, 1587.0732, 1675.4962]
2025-09-13 09:10:42,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 09:10:42,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 15 hours, 4 minutes, 43 seconds)
2025-09-13 09:21:21,775 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:21:21,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:26:21,268 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1456.84875 ± 355.819
2025-09-13 09:26:21,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1559.5844, 480.5899, 1310.9343, 1398.5123, 1471.2417, 1556.2079, 1691.2148, 1710.8483, 1827.1371, 1562.2167]
2025-09-13 09:26:21,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 09:26:21,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1456.85) for latency ExtremeSparseL4U32
2025-09-13 09:26:21,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 14 hours, 50 minutes, 8 seconds)
2025-09-13 09:36:58,377 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:36:58,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:41:54,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1229.14209 ± 463.463
2025-09-13 09:41:54,336 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1551.3618, 1682.2917, 1644.107, 1325.6619, 1605.9221, 726.24164, 1573.7292, 719.0372, 1172.1307, 290.9364]
2025-09-13 09:41:54,336 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 09:41:54,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 14 hours, 34 minutes, 1 second)
2025-09-13 09:52:34,402 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:52:34,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:57:30,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1256.34961 ± 377.910
2025-09-13 09:57:30,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1156.2926, 793.8136, 1305.1704, 1551.8514, 1479.9052, 1568.3755, 359.50735, 1487.2737, 1586.7837, 1274.5231]
2025-09-13 09:57:30,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 09:57:30,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 14 hours, 19 minutes, 1 second)
2025-09-13 10:08:12,668 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:08:12,677 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:13:13,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1421.22717 ± 126.844
2025-09-13 10:13:13,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1371.5299, 1287.7772, 1329.6592, 1504.0424, 1622.1604, 1257.9441, 1275.6674, 1455.9532, 1533.6587, 1573.8792]
2025-09-13 10:13:13,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 10:13:13,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 14 hours, 3 minutes, 30 seconds)
2025-09-13 10:23:52,458 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:23:52,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:28:46,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1409.87683 ± 293.785
2025-09-13 10:28:47,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1188.2455, 1466.6812, 1725.7977, 705.4018, 1287.5006, 1559.1791, 1417.0693, 1366.6844, 1784.1064, 1598.1022]
2025-09-13 10:28:47,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 10:28:47,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 13 hours, 47 minutes, 31 seconds)
2025-09-13 10:39:26,445 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:39:26,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:44:26,867 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1485.54211 ± 398.209
2025-09-13 10:44:26,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1652.8054, 1553.2297, 1656.4888, 1653.0236, 327.07037, 1438.6405, 1635.9686, 1461.3438, 1722.6099, 1754.2408]
2025-09-13 10:44:26,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 10:44:26,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1485.54) for latency ExtremeSparseL4U32
2025-09-13 10:44:26,900 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 13 hours, 32 minutes, 10 seconds)
2025-09-13 10:55:06,860 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:55:06,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:00:07,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1595.24573 ± 84.876
2025-09-13 11:00:07,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1725.0674, 1489.1101, 1623.8239, 1628.7078, 1532.4613, 1665.8253, 1656.9635, 1454.105, 1656.1345, 1520.2582]
2025-09-13 11:00:07,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 11:00:07,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1595.25) for latency ExtremeSparseL4U32
2025-09-13 11:00:07,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 13 hours, 17 minutes, 52 seconds)
2025-09-13 11:10:46,262 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:10:46,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:15:44,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1334.16162 ± 381.511
2025-09-13 11:15:44,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [401.5814, 1741.8263, 1488.5536, 1257.3538, 1250.245, 1342.1759, 1571.3762, 992.5954, 1567.657, 1728.2521]
2025-09-13 11:15:44,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 11:15:44,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 13 hours, 2 minutes, 18 seconds)
2025-09-13 11:26:23,935 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:26:23,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:31:23,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1370.61658 ± 427.055
2025-09-13 11:31:23,004 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1473.6394, 1631.4978, 1701.0712, 1694.684, 1717.2208, 1583.1606, 1478.01, 967.85455, 299.52524, 1159.5017]
2025-09-13 11:31:23,004 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 11:31:23,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 12 hours, 45 minutes, 59 seconds)
2025-09-13 11:42:01,078 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:42:01,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:47:03,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1224.99829 ± 487.697
2025-09-13 11:47:03,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1540.9861, 1145.3423, 1394.9247, 1541.8383, 1543.6202, 1672.135, 1143.3738, 364.4276, 1639.0205, 264.31335]
2025-09-13 11:47:03,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 11:47:03,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 12 hours, 31 minutes, 27 seconds)
2025-09-13 11:57:43,963 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:57:43,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:02:40,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1433.26196 ± 317.161
2025-09-13 12:02:40,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1813.7921, 1624.1959, 1546.0209, 1417.9923, 697.9292, 1377.0471, 1585.8756, 1539.2703, 1030.5475, 1699.9496]
2025-09-13 12:02:40,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 12:02:40,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 12 hours, 15 minutes, 22 seconds)
2025-09-13 12:13:19,763 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:13:19,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:18:16,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1400.74438 ± 466.383
2025-09-13 12:18:16,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1642.879, 1392.7675, 1685.6942, 1525.2802, 1666.1035, 754.68445, 1492.1531, 1845.9637, 291.8428, 1710.0765]
2025-09-13 12:18:16,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 12:18:16,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 11 hours, 58 minutes, 56 seconds)
2025-09-13 12:28:54,877 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:28:54,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:33:51,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1280.51929 ± 519.335
2025-09-13 12:33:51,605 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1584.8225, 210.47333, 454.38138, 1253.8187, 1272.6132, 2021.4999, 1623.6412, 1403.5652, 1433.7654, 1546.6116]
2025-09-13 12:33:51,605 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 12:33:51,620 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 11 hours, 43 minutes, 7 seconds)
2025-09-13 12:44:31,137 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:44:31,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:49:30,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1519.57080 ± 173.477
2025-09-13 12:49:30,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1770.3903, 1384.1182, 1328.7513, 1339.8242, 1391.4568, 1577.1833, 1332.5001, 1660.5994, 1765.6481, 1645.2356]
2025-09-13 12:49:30,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 12:49:30,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 11 hours, 27 minutes, 26 seconds)
2025-09-13 13:00:09,180 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:00:09,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:05:13,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1594.29578 ± 275.290
2025-09-13 13:05:13,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1702.9634, 1905.1637, 1702.5878, 1720.0134, 1626.51, 868.4985, 1744.9955, 1394.9703, 1756.2657, 1520.9889]
2025-09-13 13:05:13,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 13:05:13,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 11 hours, 12 minutes, 9 seconds)
2025-09-13 13:15:53,055 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:15:53,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:20:54,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1500.46021 ± 267.425
2025-09-13 13:20:54,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1588.5123, 1531.6018, 1584.5231, 1525.7985, 1843.134, 1612.9995, 1767.1003, 821.8131, 1369.27, 1359.8507]
2025-09-13 13:20:54,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 13:20:54,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 10 hours, 57 minutes, 7 seconds)
2025-09-13 13:31:32,227 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:31:32,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:36:27,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1338.71301 ± 374.811
2025-09-13 13:36:27,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1649.8241, 1704.2762, 651.9599, 1658.612, 1349.8474, 1252.886, 1705.3998, 1303.9344, 1440.2382, 670.1514]
2025-09-13 13:36:27,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 13:36:27,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 10 hours, 41 minutes, 5 seconds)
2025-09-13 13:47:06,369 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:47:06,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:52:08,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1525.84216 ± 284.982
2025-09-13 13:52:08,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1427.836, 1706.565, 1698.9283, 757.8701, 1395.7986, 1729.7495, 1459.1447, 1718.9379, 1656.5428, 1707.0486]
2025-09-13 13:52:08,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 13:52:08,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 10 hours, 26 minutes, 15 seconds)
2025-09-13 14:02:47,930 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:02:47,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:07:43,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1601.44556 ± 157.718
2025-09-13 14:07:43,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1623.5957, 1726.1113, 1631.5001, 1383.7041, 1760.3623, 1387.5576, 1574.8674, 1529.4883, 1909.1405, 1488.128]
2025-09-13 14:07:43,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 14:07:43,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1601.45) for latency ExtremeSparseL4U32
2025-09-13 14:07:43,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 10 hours, 10 minutes, 8 seconds)
2025-09-13 14:18:23,606 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:18:23,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:23:20,490 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1499.40979 ± 142.842
2025-09-13 14:23:20,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1454.9419, 1691.3608, 1515.5209, 1531.7518, 1413.0757, 1270.4255, 1491.1691, 1651.2671, 1286.5411, 1688.0435]
2025-09-13 14:23:20,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 14:23:20,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 9 hours, 53 minutes, 44 seconds)
2025-09-13 14:34:01,600 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:34:01,618 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:38:57,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1501.51538 ± 478.134
2025-09-13 14:38:57,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1774.4775, 1748.1768, 801.86475, 1582.9481, 385.3961, 1895.7983, 1489.9745, 1824.219, 1854.8948, 1657.403]
2025-09-13 14:38:57,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 14:38:57,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 9 hours, 37 minutes, 32 seconds)
2025-09-13 14:49:35,666 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:49:35,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:54:34,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1501.02026 ± 339.853
2025-09-13 14:54:34,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1530.855, 1422.6134, 1352.2222, 1712.1407, 1691.0304, 1550.7375, 1733.3408, 575.48145, 1587.8628, 1853.9169]
2025-09-13 14:54:34,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 14:54:34,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 9 hours, 22 minutes, 24 seconds)
2025-09-13 15:05:15,414 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:05:15,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:10:10,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1600.17676 ± 175.133
2025-09-13 15:10:10,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1685.1727, 1643.048, 1726.2854, 1562.6443, 1785.3259, 1660.2004, 1632.8145, 1618.0082, 1109.655, 1578.6128]
2025-09-13 15:10:10,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 15:10:10,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 9 hours, 6 minutes, 10 seconds)
2025-09-13 15:20:47,876 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:20:47,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:25:43,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1562.41089 ± 363.283
2025-09-13 15:25:43,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1835.5247, 1549.4883, 1706.3505, 857.9702, 1795.7957, 1666.6376, 923.45856, 1990.3169, 1481.5848, 1816.9801]
2025-09-13 15:25:43,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 15:25:43,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 8 hours, 50 minutes, 24 seconds)
2025-09-13 15:36:22,817 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:36:22,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:41:19,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1551.18713 ± 381.670
2025-09-13 15:41:19,610 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1839.911, 1531.7957, 1787.0687, 1558.5607, 449.30374, 1778.6519, 1743.5874, 1551.237, 1640.3315, 1631.4241]
2025-09-13 15:41:19,610 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 15:41:19,620 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 8 hours, 34 minutes, 42 seconds)
2025-09-13 15:51:59,143 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:51:59,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:56:53,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1598.39783 ± 88.588
2025-09-13 15:56:53,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1733.4913, 1596.8949, 1578.7562, 1451.9009, 1568.1033, 1637.0707, 1618.3652, 1541.938, 1505.1584, 1752.2999]
2025-09-13 15:56:53,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 15:56:53,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 8 hours, 18 minutes, 49 seconds)
2025-09-13 16:07:31,414 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:07:31,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:12:29,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1292.71802 ± 501.068
2025-09-13 16:12:29,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1570.8025, 389.00845, 1714.2272, 1440.3137, 1537.5756, 1183.2812, 279.23447, 1681.1722, 1660.9473, 1470.6173]
2025-09-13 16:12:29,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 16:12:29,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 8 hours, 3 minutes, 6 seconds)
2025-09-13 16:23:09,177 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:23:09,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:28:10,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1263.54370 ± 547.988
2025-09-13 16:28:10,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1618.1399, 1571.5558, 166.4789, 992.8566, 1057.2606, 1654.6907, 1679.0077, 1682.6385, 431.01822, 1781.789]
2025-09-13 16:28:10,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 16:28:10,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 7 hours, 47 minutes, 59 seconds)
2025-09-13 16:38:49,223 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:38:49,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:43:46,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1683.69800 ± 160.909
2025-09-13 16:43:46,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1660.6312, 1879.1776, 1840.8146, 1416.3346, 1828.4521, 1441.84, 1661.3137, 1533.8087, 1779.6418, 1794.9663]
2025-09-13 16:43:46,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 16:43:46,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1683.70) for latency ExtremeSparseL4U32
2025-09-13 16:43:46,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 7 hours, 32 minutes, 38 seconds)
2025-09-13 16:54:24,538 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:54:24,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:59:28,479 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1105.31909 ± 473.162
2025-09-13 16:59:28,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1391.6904, 1486.809, 1778.4907, 694.55194, 815.20386, 1484.4166, 239.74274, 649.4762, 1538.2222, 974.58765]
2025-09-13 16:59:28,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 16:59:28,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 7 hours, 17 minutes, 37 seconds)
2025-09-13 17:10:08,374 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:10:08,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:15:03,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1658.07690 ± 103.031
2025-09-13 17:15:03,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1689.8402, 1797.2699, 1617.9646, 1632.2688, 1728.6873, 1554.1702, 1740.1984, 1529.1428, 1497.2319, 1793.9932]
2025-09-13 17:15:03,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 17:15:03,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 7 hours, 2 minutes, 4 seconds)
2025-09-13 17:25:41,368 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:25:41,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:30:35,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1230.21899 ± 545.995
2025-09-13 17:30:35,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1529.6024, 741.6673, 1826.7557, 1246.5383, 131.16138, 1583.9655, 1536.1322, 1637.715, 474.3185, 1594.3348]
2025-09-13 17:30:35,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 17:30:35,256 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 6 hours, 46 minutes, 6 seconds)
2025-09-13 17:41:13,965 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:41:13,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:46:17,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1493.89575 ± 290.640
2025-09-13 17:46:17,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1501.5698, 1193.9681, 1598.4753, 1075.5911, 1898.6515, 1870.6614, 1851.9127, 1221.5665, 1248.0674, 1478.4943]
2025-09-13 17:46:17,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 17:46:17,873 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 6 hours, 30 minutes, 38 seconds)
2025-09-13 17:56:57,364 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:56:57,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:02:01,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1470.69885 ± 427.136
2025-09-13 18:02:01,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1591.7074, 1820.1317, 1935.0076, 1487.6415, 1613.0708, 1678.3722, 590.4025, 750.4771, 1807.3883, 1432.789]
2025-09-13 18:02:01,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 18:02:01,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 6 hours, 15 minutes, 37 seconds)
2025-09-13 18:12:40,719 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:12:40,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:17:35,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1618.20081 ± 100.124
2025-09-13 18:17:35,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1651.0823, 1574.1514, 1503.7236, 1652.1198, 1654.5483, 1582.5405, 1534.1515, 1529.9321, 1874.0635, 1625.6938]
2025-09-13 18:17:35,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 18:17:35,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 5 hours, 59 minutes, 20 seconds)
2025-09-13 18:28:14,509 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:28:14,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:33:10,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1685.62366 ± 168.068
2025-09-13 18:33:10,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1709.452, 1514.5273, 1623.8793, 1745.5912, 1609.7343, 1793.2859, 1780.6859, 1908.4531, 1312.3175, 1858.3092]
2025-09-13 18:33:10,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 18:33:10,665 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1685.62) for latency ExtremeSparseL4U32
2025-09-13 18:33:10,680 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 5 hours, 43 minutes, 43 seconds)
2025-09-13 18:43:50,895 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:43:50,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:48:48,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1547.65002 ± 444.215
2025-09-13 18:48:48,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1397.4343, 1616.6053, 1746.0859, 1649.3586, 1457.0493, 1806.7235, 1661.5907, 2029.6497, 317.02866, 1794.9738]
2025-09-13 18:48:48,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 18:48:48,552 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 5 hours, 28 minutes, 31 seconds)
2025-09-13 18:59:25,427 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:59:25,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:04:25,851 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1499.68372 ± 451.596
2025-09-13 19:04:25,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1591.8671, 1522.7247, 1806.7072, 882.002, 1564.5769, 1721.2983, 417.90637, 1783.0288, 1869.731, 1836.9951]
2025-09-13 19:04:25,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 19:04:25,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 5 hours, 12 minutes, 31 seconds)
2025-09-13 19:15:05,404 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:15:05,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:20:02,593 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1402.47009 ± 477.178
2025-09-13 19:20:02,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [434.99954, 1676.6259, 575.2565, 1653.4268, 1882.4149, 1351.1979, 1340.1001, 1761.6306, 1722.8757, 1626.1729]
2025-09-13 19:20:02,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 19:20:02,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 4 hours, 56 minutes, 29 seconds)
2025-09-13 19:30:40,947 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:30:40,956 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:35:36,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1745.89551 ± 215.029
2025-09-13 19:35:36,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1943.965, 2038.6205, 1839.5142, 1652.7347, 1989.4115, 1341.8116, 1492.9905, 1864.4585, 1655.9452, 1639.5038]
2025-09-13 19:35:36,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 19:35:36,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1745.90) for latency ExtremeSparseL4U32
2025-09-13 19:35:36,637 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 40 minutes, 51 seconds)
2025-09-13 19:46:16,473 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:46:16,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:51:13,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1589.78381 ± 254.411
2025-09-13 19:51:13,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1823.1368, 1346.3363, 1485.8671, 1903.8746, 1696.4615, 1452.7402, 1610.3591, 1800.5933, 1022.755, 1755.7136]
2025-09-13 19:51:13,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 19:51:13,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 25 minutes, 22 seconds)
2025-09-13 20:01:53,739 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:01:53,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:06:55,183 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1740.51819 ± 161.302
2025-09-13 20:06:55,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1756.9061, 1584.9125, 1673.3363, 1978.6606, 1410.7762, 1869.5077, 1875.1102, 1711.106, 1890.9487, 1653.918]
2025-09-13 20:06:55,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 20:06:55,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 4 hours, 9 minutes, 57 seconds)
2025-09-13 20:17:34,494 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:17:34,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:22:29,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1664.75073 ± 236.101
2025-09-13 20:22:29,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1772.736, 1818.3009, 1705.5126, 1727.5333, 1765.128, 1900.5635, 1376.5299, 1691.4692, 1077.1521, 1812.5828]
2025-09-13 20:22:29,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 20:22:29,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 54 minutes, 9 seconds)
2025-09-13 20:33:06,525 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:33:06,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:38:02,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1639.45483 ± 254.904
2025-09-13 20:38:02,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1042.3795, 1460.1206, 1658.6333, 1791.9015, 1839.7255, 1986.0009, 1508.8439, 1653.6451, 1576.9414, 1876.3575]
2025-09-13 20:38:02,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 20:38:02,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 38 minutes, 22 seconds)
2025-09-13 20:48:40,120 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:48:40,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:53:36,211 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1689.73950 ± 181.550
2025-09-13 20:53:36,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1795.3463, 1613.3368, 1902.505, 1798.1942, 1744.907, 1545.2369, 1687.1423, 1364.7142, 1971.8009, 1474.2106]
2025-09-13 20:53:36,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 20:53:36,223 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 22 minutes, 46 seconds)
2025-09-13 21:04:13,773 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:04:13,783 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:09:14,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1729.18848 ± 188.998
2025-09-13 21:09:14,735 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1629.5651, 1621.5654, 1673.0688, 1541.7555, 1961.2047, 1826.2207, 1903.2673, 1346.1222, 1929.4182, 1859.6964]
2025-09-13 21:09:14,735 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 21:09:14,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 7 minutes, 14 seconds)
2025-09-13 21:19:53,455 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:19:53,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:24:48,638 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1620.23657 ± 262.449
2025-09-13 21:24:48,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1803.4286, 1880.2444, 972.1807, 1634.1019, 1925.6334, 1676.927, 1656.6691, 1623.2764, 1360.8844, 1669.0199]
2025-09-13 21:24:48,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 21:24:48,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 51 minutes, 21 seconds)
2025-09-13 21:35:29,158 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:35:29,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:40:29,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1726.34021 ± 183.369
2025-09-13 21:40:29,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1432.9191, 1861.136, 1917.6526, 1894.1698, 1529.3871, 1534.8895, 1962.6343, 1812.4741, 1560.4323, 1757.7079]
2025-09-13 21:40:29,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 21:40:29,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 36 minutes)
2025-09-13 21:51:10,391 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:51:10,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:56:07,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1826.06079 ± 214.041
2025-09-13 21:56:07,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1841.6069, 1999.3025, 1803.1588, 2047.8171, 1762.4677, 1919.8081, 1739.2809, 2067.0925, 1279.1702, 1800.9021]
2025-09-13 21:56:07,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 21:56:07,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1826.06) for latency ExtremeSparseL4U32
2025-09-13 21:56:07,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 20 minutes, 34 seconds)
2025-09-13 22:06:46,567 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:06:46,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:11:42,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1813.05957 ± 178.148
2025-09-13 22:11:42,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1790.61, 1987.8661, 1993.1509, 1860.5913, 1933.1401, 1458.4443, 1961.666, 1613.706, 1615.4893, 1915.9315]
2025-09-13 22:11:42,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 22:11:42,873 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 4 minutes, 58 seconds)
2025-09-13 22:22:21,579 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:22:21,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:27:18,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1588.90112 ± 399.446
2025-09-13 22:27:18,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1906.5369, 1787.2394, 1783.3588, 489.1762, 1789.4237, 1624.3073, 1675.5046, 1933.7257, 1421.5693, 1478.1694]
2025-09-13 22:27:18,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 22:27:18,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 49 minutes, 16 seconds)
2025-09-13 22:37:53,350 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:37:53,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:42:57,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1354.60217 ± 468.177
2025-09-13 22:42:57,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1785.5469, 1689.107, 1738.6056, 1469.3868, 818.1231, 1236.3478, 1018.5711, 334.64075, 1728.8777, 1726.8147]
2025-09-13 22:42:57,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 22:42:57,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 33 minutes, 47 seconds)
2025-09-13 22:53:33,602 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:53:33,611 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:58:37,219 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1652.91919 ± 274.090
2025-09-13 22:58:37,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1012.9048, 1700.4005, 1682.873, 1843.2218, 1356.09, 2066.6514, 1788.704, 1606.2505, 1670.6608, 1801.4354]
2025-09-13 22:58:37,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 22:58:37,230 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 18 minutes, 8 seconds)
2025-09-13 23:09:13,867 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:09:13,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:14:15,254 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1593.28748 ± 286.077
2025-09-13 23:14:15,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1183.9365, 1740.7521, 1800.4514, 1803.3196, 1112.1412, 1719.1833, 1241.4633, 1699.2769, 1989.8568, 1642.4938]
2025-09-13 23:14:15,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 23:14:15,268 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 2 minutes, 29 seconds)
2025-09-13 23:24:50,821 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:24:50,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:29:47,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1730.05725 ± 157.099
2025-09-13 23:29:47,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1754.9127, 1743.46, 1797.6908, 1831.3536, 1440.694, 1815.1329, 1930.9713, 1894.5521, 1595.6383, 1496.1665]
2025-09-13 23:29:47,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 23:29:47,183 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 46 minutes, 50 seconds)
2025-09-13 23:40:22,830 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:40:22,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:45:20,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1792.09497 ± 127.075
2025-09-13 23:45:20,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1673.4998, 1874.274, 1668.6151, 2017.679, 1622.9635, 1842.0599, 1983.922, 1758.7382, 1738.589, 1740.6079]
2025-09-13 23:45:20,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 23:45:20,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 31 minutes, 12 seconds)
2025-09-13 23:55:56,100 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:55:56,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:00:51,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1771.49121 ± 196.042
2025-09-14 00:00:51,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1924.0406, 1722.9865, 2091.7483, 1860.5104, 1727.1969, 1357.3086, 1957.3435, 1696.5442, 1586.6205, 1790.6117]
2025-09-14 00:00:51,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 00:00:51,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 15 minutes, 34 seconds)
2025-09-14 00:11:26,649 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:11:26,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:16:26,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1479.70361 ± 487.335
2025-09-14 00:16:26,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1608.9043, 767.86847, 1967.5891, 1584.4172, 1563.098, 1890.9158, 1615.6312, 349.23386, 1653.8619, 1795.5156]
2025-09-14 00:16:26,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 00:16:26,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1251 [DEBUG]: Training session finished
