2025-09-12 19:53:39,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc5-halfcheetah/ExtremeSparseL4U32-mbpac_memdelay
2025-09-12 19:53:39,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc5-halfcheetah/ExtremeSparseL4U32-mbpac_memdelay
2025-09-12 19:53:39,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x152898bb56d0>}
2025-09-12 19:53:39,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1111 [DEBUG]: using device: cuda
2025-09-12 19:53:39,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-12 19:53:39,456 baseline-mbpac-noiseperc5-halfcheetah:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-12 19:53:39,456 baseline-mbpac-noiseperc5-halfcheetah:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-12 19:53:39,464 baseline-mbpac-noiseperc5-halfcheetah:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=6, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-09-12 19:53:41,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-12 19:53:41,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-12 20:05:01,882 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:05:01,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 20:09:58,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: -362.69208 ± 47.372
2025-09-12 20:09:58,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [-426.45834, -354.9341, -377.02344, -380.44485, -384.21756, -437.14957, -324.8959, -286.7406, -360.06036, -294.9961]
2025-09-12 20:09:58,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:09:58,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (-362.69) for latency ExtremeSparseL4U32
2025-09-12 20:09:59,000 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 26 hours, 53 minutes, 25 seconds)
2025-09-12 20:20:40,743 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:20:40,753 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 20:25:34,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: -26.75596 ± 45.283
2025-09-12 20:25:34,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [23.537832, -74.121994, -38.338116, -54.631065, 42.72885, 32.581177, -75.22829, -40.969894, 0.16630545, -83.28436]
2025-09-12 20:25:34,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:25:34,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (-26.76) for latency ExtremeSparseL4U32
2025-09-12 20:25:34,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 26 hours, 2 minutes, 28 seconds)
2025-09-12 20:36:15,632 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:36:15,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 20:41:09,359 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1091.55811 ± 289.472
2025-09-12 20:41:09,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1333.6643, 1110.5913, 1174.2085, 1252.276, 1076.0211, 1113.9546, 1277.0361, 1143.0912, 254.51726, 1180.2194]
2025-09-12 20:41:09,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:41:09,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (1091.56) for latency ExtremeSparseL4U32
2025-09-12 20:41:09,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 25 hours, 34 minutes, 52 seconds)
2025-09-12 20:51:51,691 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:51:51,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 20:56:46,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1889.57251 ± 579.445
2025-09-12 20:56:46,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [724.6309, 2223.0227, 2280.6987, 2101.539, 2125.3403, 1883.8062, 2179.7131, 789.96893, 2203.6965, 2383.3064]
2025-09-12 20:56:46,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:56:46,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (1889.57) for latency ExtremeSparseL4U32
2025-09-12 20:56:46,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 25 hours, 14 minutes, 8 seconds)
2025-09-12 21:07:19,446 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:07:19,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 21:12:07,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1927.03149 ± 667.021
2025-09-12 21:12:07,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1443.1638, 1615.9358, 2615.4856, 1176.9902, 2716.547, 2125.4834, 2137.6106, 570.4591, 2533.5544, 2335.0837]
2025-09-12 21:12:07,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:12:07,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (1927.03) for latency ExtremeSparseL4U32
2025-09-12 21:12:07,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 24 hours, 50 minutes, 19 seconds)
2025-09-12 21:22:37,345 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:22:37,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 21:27:21,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1848.77539 ± 684.519
2025-09-12 21:27:21,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1416.3451, 843.47473, 2221.992, 566.9734, 2683.6785, 2377.5872, 1837.6704, 2393.859, 2479.9958, 1666.1782]
2025-09-12 21:27:21,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:27:21,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 24 hours, 14 minutes, 48 seconds)
2025-09-12 21:37:51,733 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:37:51,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 21:42:38,256 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2331.63794 ± 505.032
2025-09-12 21:42:38,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1508.3749, 2856.621, 2207.8425, 2794.2368, 2714.7026, 2700.4915, 1866.7924, 2301.9617, 1540.9027, 2824.454]
2025-09-12 21:42:38,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:42:38,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (2331.64) for latency ExtremeSparseL4U32
2025-09-12 21:42:38,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 23 hours, 53 minutes, 23 seconds)
2025-09-12 21:53:07,677 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:53:07,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 21:57:57,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2094.94385 ± 748.529
2025-09-12 21:57:57,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [947.0313, 1085.1166, 1852.5848, 2963.7324, 2813.4287, 1547.3245, 2786.7136, 2746.4983, 1498.2606, 2708.7478]
2025-09-12 21:57:57,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:57:57,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 23 hours, 33 minutes, 11 seconds)
2025-09-12 22:08:29,653 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:08:29,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:13:16,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1750.41479 ± 849.958
2025-09-12 22:13:16,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2670.7478, 2792.982, 996.6858, 2642.5713, 2017.8228, 1116.5514, 392.45105, 1282.4124, 963.5752, 2628.3506]
2025-09-12 22:13:16,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:13:16,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 23 hours, 12 minutes, 12 seconds)
2025-09-12 22:23:47,879 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:23:47,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:28:41,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2295.60571 ± 902.279
2025-09-12 22:28:41,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [982.1299, 1696.6923, 2911.2485, 1621.4758, 692.89484, 3146.3809, 3143.2605, 2808.0715, 2873.0325, 3080.8684]
2025-09-12 22:28:41,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:28:41,173 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 22 hours, 58 minutes, 6 seconds)
2025-09-12 22:39:13,889 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:39:13,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:44:02,905 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2549.50732 ± 800.743
2025-09-12 22:44:02,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2477.0627, 575.54675, 2979.8525, 3046.976, 3385.4192, 3031.153, 2253.3499, 1758.2466, 3132.0085, 2855.4617]
2025-09-12 22:44:02,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:44:02,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (2549.51) for latency ExtremeSparseL4U32
2025-09-12 22:44:02,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 22 hours, 44 minutes, 56 seconds)
2025-09-12 22:54:32,367 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:54:32,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:59:20,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2785.77808 ± 694.885
2025-09-12 22:59:20,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3277.8757, 2003.054, 2125.051, 1335.3661, 2972.221, 3275.2017, 3133.9895, 3722.607, 3183.977, 2828.4375]
2025-09-12 22:59:20,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:59:20,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (2785.78) for latency ExtremeSparseL4U32
2025-09-12 22:59:20,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 22 hours, 29 minutes, 58 seconds)
2025-09-12 23:09:50,045 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:09:50,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:14:36,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1904.49780 ± 1151.230
2025-09-12 23:14:36,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3476.945, 1094.1339, 1627.8857, 632.6722, 1080.4552, 1010.8742, 2312.352, 3704.333, 3376.0518, 729.27484]
2025-09-12 23:14:36,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:14:36,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 22 hours, 13 minutes, 41 seconds)
2025-09-12 23:25:06,834 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:25:06,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:29:53,851 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2514.68726 ± 1019.412
2025-09-12 23:29:53,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1536.7856, 3298.353, 3150.5544, 3275.8286, 3283.6274, 3184.4058, 3616.931, 1681.8529, 1630.7041, 487.83005]
2025-09-12 23:29:53,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:29:53,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 21 hours, 57 minutes, 59 seconds)
2025-09-12 23:40:23,616 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:40:23,620 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:45:09,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2428.80518 ± 1193.587
2025-09-12 23:45:09,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1612.9377, 1384.4744, 3715.8518, 3389.2642, 601.85486, 3264.6187, 2682.8743, 3542.1855, 3509.734, 584.2554]
2025-09-12 23:45:09,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:45:09,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 21 hours, 40 minutes)
2025-09-12 23:55:39,523 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:55:39,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:00:34,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2702.12183 ± 805.740
2025-09-13 00:00:34,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3481.791, 1547.8577, 3424.7334, 2262.7861, 1481.6373, 3341.7258, 3248.8274, 3183.5195, 3343.9658, 1704.3741]
2025-09-13 00:00:34,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:00:34,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 21 hours, 25 minutes, 32 seconds)
2025-09-13 00:11:06,668 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:11:06,680 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:15:58,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2146.54639 ± 1188.692
2025-09-13 00:15:58,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1917.876, 2649.8406, -12.410414, 1774.9417, 808.78906, 1426.151, 3638.251, 1992.5243, 3772.859, 3496.6406]
2025-09-13 00:15:58,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:15:58,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 21 hours, 12 minutes, 14 seconds)
2025-09-13 00:26:29,022 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:26:29,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:31:20,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2658.58789 ± 1268.090
2025-09-13 00:31:20,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3795.2622, 3906.677, 3868.812, 3512.3638, 1698.3157, 1029.2555, 3140.1753, 186.46408, 1979.8292, 3468.7222]
2025-09-13 00:31:20,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:31:20,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 20 hours, 58 minutes, 21 seconds)
2025-09-13 00:41:51,048 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:41:51,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:46:37,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2505.42529 ± 1243.979
2025-09-13 00:46:37,267 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1457.9249, 1456.0897, 3471.9397, 1315.5581, 3172.43, 3615.5144, 1881.4717, 4092.7476, 4063.3533, 527.22217]
2025-09-13 00:46:37,267 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:46:37,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 20 hours, 42 minutes, 55 seconds)
2025-09-13 00:57:08,365 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:57:08,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:01:53,545 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2909.17090 ± 902.374
2025-09-13 01:01:53,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1223.6962, 3362.4912, 3469.3188, 3212.0576, 1098.7236, 3808.9429, 3254.0774, 2862.914, 3424.539, 3374.9465]
2025-09-13 01:01:53,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:01:53,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (2909.17) for latency ExtremeSparseL4U32
2025-09-13 01:01:53,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 20 hours, 27 minutes, 46 seconds)
2025-09-13 01:12:25,874 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:12:25,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:17:12,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3040.42383 ± 1007.906
2025-09-13 01:17:12,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3292.94, 3546.4707, 3325.0835, 3265.1243, 368.27756, 3364.5308, 3852.6243, 3834.9834, 2058.6687, 3495.5344]
2025-09-13 01:17:12,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:17:12,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (3040.42) for latency ExtremeSparseL4U32
2025-09-13 01:17:12,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 20 hours, 10 minutes, 47 seconds)
2025-09-13 01:27:44,278 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:27:44,282 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:32:31,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2751.76465 ± 921.487
2025-09-13 01:32:31,398 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1767.428, 3279.1885, 3492.7446, 1635.883, 3566.9956, 3153.1177, 3740.675, 3605.0442, 2106.613, 1169.9569]
2025-09-13 01:32:31,398 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:32:31,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 19 hours, 54 minutes, 2 seconds)
2025-09-13 01:43:02,476 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:43:02,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:47:54,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2811.04248 ± 1128.914
2025-09-13 01:47:54,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3650.9734, 3628.1562, 3395.1494, 3776.8542, 383.5991, 3315.12, 1084.7158, 3475.4253, 3202.506, 2197.9243]
2025-09-13 01:47:54,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:47:54,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 19 hours, 39 minutes, 14 seconds)
2025-09-13 01:58:27,489 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:58:27,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:03:16,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2704.97339 ± 1049.242
2025-09-13 02:03:16,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2269.1863, 2041.7512, 3577.6165, 3393.8164, 3707.2676, 1755.9202, 3252.2412, 3387.218, 259.12363, 3405.5955]
2025-09-13 02:03:16,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:03:16,516 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 19 hours, 25 minutes, 8 seconds)
2025-09-13 02:13:45,763 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:13:45,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:18:34,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2717.04395 ± 934.534
2025-09-13 02:18:34,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3241.0757, 1940.172, 3474.756, 2225.951, 534.1777, 3305.8228, 3072.2368, 2241.4895, 3312.344, 3822.414]
2025-09-13 02:18:34,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:18:34,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 19 hours, 10 minutes, 13 seconds)
2025-09-13 02:29:06,115 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:29:06,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:33:59,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2324.10107 ± 1128.348
2025-09-13 02:33:59,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3357.9277, 497.10654, 2026.3352, 3499.038, 3650.8926, 3087.2915, 1605.6864, 1604.2423, 689.7432, 3222.7478]
2025-09-13 02:33:59,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:33:59,805 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 18 hours, 56 minutes, 34 seconds)
2025-09-13 02:44:31,638 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:44:31,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:49:19,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2793.40479 ± 1146.886
2025-09-13 02:49:19,620 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3627.6562, 3570.7583, 1326.5234, 2642.6716, 3921.9333, 3920.2937, 1694.1821, 4198.972, 930.09937, 2100.9587]
2025-09-13 02:49:19,620 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:49:19,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 18 hours, 41 minutes, 20 seconds)
2025-09-13 02:59:49,923 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:59:49,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:04:42,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2676.27295 ± 1259.621
2025-09-13 03:04:42,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2420.6387, 4153.0483, 4078.943, 347.27548, 3523.4026, 2859.0886, 4014.9133, 2328.55, 853.8198, 2183.0466]
2025-09-13 03:04:42,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:04:42,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 18 hours, 25 minutes, 49 seconds)
2025-09-13 03:15:15,910 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:15:15,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:20:09,279 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3574.40088 ± 724.623
2025-09-13 03:20:09,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3966.0264, 3834.0503, 3788.098, 3785.7085, 3822.0823, 1648.857, 4143.642, 2847.9246, 3829.4878, 4078.1301]
2025-09-13 03:20:09,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:20:09,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (3574.40) for latency ExtremeSparseL4U32
2025-09-13 03:20:09,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 18 hours, 11 minutes, 41 seconds)
2025-09-13 03:30:40,478 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:30:40,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:35:37,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3249.18311 ± 1179.718
2025-09-13 03:35:37,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3791.6956, 4071.0461, 3814.425, 3640.7668, 3858.3875, 1492.8151, 3967.8013, 451.63293, 3336.493, 4066.7695]
2025-09-13 03:35:37,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:35:37,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 17 hours, 58 minutes, 38 seconds)
2025-09-13 03:46:12,071 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:46:12,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:50:59,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3198.18457 ± 883.643
2025-09-13 03:50:59,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3871.9597, 1700.8171, 3727.5933, 3798.2683, 3491.1, 1470.1228, 2633.3428, 3514.804, 3760.8333, 4013.0032]
2025-09-13 03:50:59,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:50:59,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 17 hours, 42 minutes, 35 seconds)
2025-09-13 04:01:30,270 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:01:30,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:06:22,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2728.44800 ± 1248.481
2025-09-13 04:06:22,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1753.088, 3703.9373, 3946.4087, 1320.0834, 3698.3618, 2049.032, 3789.9097, 535.37286, 2160.9263, 4327.361]
2025-09-13 04:06:22,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 04:06:22,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 17 hours, 27 minutes, 47 seconds)
2025-09-13 04:16:54,089 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:16:54,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:21:44,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2956.86426 ± 1338.570
2025-09-13 04:21:44,467 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3788.655, 3796.634, 4052.3062, 585.54004, 3641.18, 467.32687, 3673.0916, 4102.0195, 3474.2166, 1987.6718]
2025-09-13 04:21:44,467 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 04:21:44,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 17 hours, 12 minutes, 16 seconds)
2025-09-13 04:32:15,309 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:32:15,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:37:07,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3270.41650 ± 986.509
2025-09-13 04:37:07,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3973.2717, 755.98047, 2058.2163, 3858.0535, 3673.1814, 3544.0713, 3606.3533, 3879.084, 3833.2524, 3522.7026]
2025-09-13 04:37:07,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 04:37:07,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 16 hours, 55 minutes, 57 seconds)
2025-09-13 04:47:38,130 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:47:38,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:52:25,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3068.81274 ± 1064.934
2025-09-13 04:52:25,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3655.1938, 1645.4769, 3788.2544, 3243.6953, 3667.0095, 681.7633, 3965.3525, 2467.921, 4079.5032, 3493.9543]
2025-09-13 04:52:25,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 04:52:25,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 16 hours, 38 minutes, 33 seconds)
2025-09-13 05:02:58,183 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:02:58,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:07:48,398 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2624.09351 ± 1243.698
2025-09-13 05:07:48,399 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1615.718, 4116.8667, 1174.1437, 3877.7737, 3734.6316, 508.65463, 2892.14, 2179.8245, 4112.765, 2028.4183]
2025-09-13 05:07:48,399 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 05:07:48,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 16 hours, 23 minutes, 10 seconds)
2025-09-13 05:18:18,174 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:18:18,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:23:11,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3242.39502 ± 1103.529
2025-09-13 05:23:11,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4133.6333, 3875.9602, 3047.5476, 2974.0286, 3472.4407, 3953.3472, 3990.4658, 3523.171, 132.86966, 3320.487]
2025-09-13 05:23:11,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 05:23:11,359 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 16 hours, 7 minutes, 55 seconds)
2025-09-13 05:33:40,669 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:33:40,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:38:26,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3679.60034 ± 607.115
2025-09-13 05:38:26,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3711.9265, 3773.8945, 3831.007, 4154.398, 4096.696, 3528.879, 4316.9272, 2067.3865, 3322.3994, 3992.4888]
2025-09-13 05:38:26,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 05:38:26,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (3679.60) for latency ExtremeSparseL4U32
2025-09-13 05:38:26,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 15 hours, 51 minutes, 4 seconds)
2025-09-13 05:48:57,984 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:48:58,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:53:43,209 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2069.71997 ± 1292.929
2025-09-13 05:53:43,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2316.6711, 1991.3362, 107.08088, 664.6479, 1182.8519, 4032.257, 1206.1559, 3353.217, 1839.5657, 4003.4136]
2025-09-13 05:53:43,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 05:53:43,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 15 hours, 34 minutes, 30 seconds)
2025-09-13 06:04:13,464 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:04:13,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:09:04,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2409.52393 ± 1455.585
2025-09-13 06:09:04,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3773.8176, 565.85406, 548.1573, 669.13324, 3856.7803, 1215.2559, 2083.433, 3630.6313, 3794.4521, 3957.7244]
2025-09-13 06:09:04,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 06:09:04,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 15 hours, 19 minutes, 43 seconds)
2025-09-13 06:19:37,644 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:19:37,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:24:28,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2163.40894 ± 1398.714
2025-09-13 06:24:28,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4013.4827, 1364.2986, 298.758, 2838.698, 730.06555, 3409.5173, 203.3866, 3206.0789, 1721.6252, 3848.18]
2025-09-13 06:24:28,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 06:24:28,928 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 15 hours, 4 minutes, 46 seconds)
2025-09-13 06:35:02,927 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:35:02,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:39:55,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2638.07178 ± 1187.546
2025-09-13 06:39:55,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1288.912, 2880.1316, 3015.6821, 2012.9066, 1823.3126, 244.84067, 3565.419, 3986.0825, 3836.8342, 3726.5962]
2025-09-13 06:39:55,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 06:39:55,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 14 hours, 50 minutes, 3 seconds)
2025-09-13 06:50:25,106 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:50:25,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:55:17,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3173.29443 ± 1515.349
2025-09-13 06:55:17,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4036.8232, 4256.998, 355.3285, 1060.4224, 1234.6023, 4251.0737, 4177.6855, 4147.9746, 4202.702, 4009.337]
2025-09-13 06:55:17,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 06:55:17,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 14 hours, 36 minutes, 1 second)
2025-09-13 07:05:46,536 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:05:46,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:10:36,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3317.58203 ± 638.533
2025-09-13 07:10:36,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3875.5125, 3910.6633, 4148.0654, 3604.7356, 2554.7266, 3290.313, 3009.1497, 2151.692, 2765.5203, 3865.443]
2025-09-13 07:10:36,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 07:10:36,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 14 hours, 21 minutes, 8 seconds)
2025-09-13 07:21:06,783 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:21:06,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:25:56,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2690.55151 ± 1322.590
2025-09-13 07:25:56,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [249.88574, 4030.4653, 3800.8262, 3465.154, 2008.3485, 3152.6929, 308.5816, 3484.2988, 3649.989, 2755.2742]
2025-09-13 07:25:56,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 07:25:56,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 14 hours, 5 minutes, 37 seconds)
2025-09-13 07:36:28,392 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:36:28,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:41:17,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3201.71533 ± 1095.175
2025-09-13 07:41:17,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4023.5718, 2703.4768, 2067.4766, 3859.5742, 447.42123, 3956.6772, 3759.1768, 3595.2722, 3738.7502, 3865.755]
2025-09-13 07:41:17,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 07:41:17,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 13 hours, 49 minutes, 32 seconds)
2025-09-13 07:51:48,030 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:51:48,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:56:37,926 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3298.05396 ± 1113.943
2025-09-13 07:56:37,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1243.1575, 3767.9868, 3758.5496, 3910.5793, 1066.353, 4005.6086, 3817.0295, 3210.5269, 4508.088, 3692.66]
2025-09-13 07:56:37,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 07:56:37,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 13 hours, 33 minutes, 9 seconds)
2025-09-13 08:07:07,889 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:07:07,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:11:55,223 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3091.28955 ± 1363.335
2025-09-13 08:11:55,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3934.5073, 3903.7495, 4064.4827, 1082.4431, 4049.9395, 1376.5215, 3814.3252, 4029.7515, 624.65265, 4032.5261]
2025-09-13 08:11:55,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 08:11:55,256 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 13 hours, 17 minutes)
2025-09-13 08:22:26,822 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:22:26,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:27:19,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3171.35620 ± 1093.452
2025-09-13 08:27:19,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3653.2114, 2307.701, 3961.3335, 2501.087, 594.65247, 2615.4956, 4094.2463, 4063.3467, 3786.0984, 4136.3936]
2025-09-13 08:27:19,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 08:27:19,201 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 13 hours, 2 minutes, 27 seconds)
2025-09-13 08:37:50,056 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:37:50,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:42:39,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3165.69360 ± 1126.450
2025-09-13 08:42:39,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4081.4182, 3825.384, 3572.0356, 1530.3496, 3739.3696, 588.84015, 3972.5435, 3422.2815, 2865.048, 4059.6648]
2025-09-13 08:42:39,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 08:42:39,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 12 hours, 47 minutes, 8 seconds)
2025-09-13 08:53:11,898 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:53:11,926 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:57:59,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2421.99072 ± 1552.434
2025-09-13 08:57:59,091 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [638.52075, 3605.113, 3887.5015, 4217.3237, 359.46423, 3779.2854, 2520.1992, 297.84363, 1129.0519, 3785.6033]
2025-09-13 08:57:59,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 08:57:59,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 12 hours, 31 minutes, 35 seconds)
2025-09-13 09:08:29,792 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:08:29,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:13:18,219 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3232.91602 ± 941.271
2025-09-13 09:13:18,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4002.1016, 1434.9205, 2547.537, 3670.801, 4026.41, 3777.623, 3419.9336, 3934.5737, 3868.609, 1646.6506]
2025-09-13 09:13:18,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 09:13:18,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 12 hours, 16 minutes, 2 seconds)
2025-09-13 09:23:48,531 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:23:48,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:28:44,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3217.53271 ± 894.233
2025-09-13 09:28:44,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [985.97253, 3754.9937, 3564.92, 3810.958, 3405.0913, 2556.6594, 4013.0674, 4060.8948, 3432.0493, 2590.722]
2025-09-13 09:28:44,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 09:28:44,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 12 hours, 2 minutes, 11 seconds)
2025-09-13 09:39:16,454 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:39:16,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:44:03,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2834.05176 ± 1323.733
2025-09-13 09:44:03,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3878.3398, 4011.9097, 3945.322, 1911.976, 4332.912, 3401.2222, 1726.2943, 3571.0527, 473.36655, 1088.1227]
2025-09-13 09:44:03,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 09:44:03,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 11 hours, 45 minutes, 56 seconds)
2025-09-13 09:54:33,399 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:54:33,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:59:20,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2845.58789 ± 1058.005
2025-09-13 09:59:20,071 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [942.5885, 3636.8635, 707.9569, 3343.1506, 3973.4338, 2922.2048, 3487.4446, 3405.8691, 3199.5793, 2836.7874]
2025-09-13 09:59:20,071 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 09:59:20,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 11 hours, 30 minutes, 2 seconds)
2025-09-13 10:09:51,386 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:09:51,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:14:37,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2544.08691 ± 1586.461
2025-09-13 10:14:37,110 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4271.708, 3586.798, 869.1022, 3891.9067, 3054.599, 983.6928, 220.12418, 4095.3206, 3933.7153, 533.9043]
2025-09-13 10:14:37,110 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 10:14:37,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 11 hours, 14 minutes, 22 seconds)
2025-09-13 10:25:07,603 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:25:07,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:29:58,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3557.24097 ± 1120.084
2025-09-13 10:29:58,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4037.0525, 3992.6309, 3850.313, 4269.104, 3927.8066, 3995.2524, 3916.359, 3433.5437, 3900.9568, 249.38899]
2025-09-13 10:29:58,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 10:29:58,382 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 10 hours, 59 minutes, 21 seconds)
2025-09-13 10:40:31,907 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:40:31,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:45:27,282 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2147.11475 ± 1580.641
2025-09-13 10:45:27,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3885.3647, 354.15372, 535.87683, 3676.422, 4259.796, 50.071785, 1094.5033, 1230.0513, 3831.8704, 2553.0376]
2025-09-13 10:45:27,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 10:45:27,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 10 hours, 44 minutes, 19 seconds)
2025-09-13 10:55:58,545 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:55:58,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:00:45,913 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3266.72607 ± 940.171
2025-09-13 11:00:45,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3873.3682, 3851.1382, 3808.618, 3913.8901, 2244.4512, 3862.6257, 4038.4277, 1618.3688, 1697.5543, 3758.8176]
2025-09-13 11:00:45,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 11:00:45,956 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 10 hours, 29 minutes, 3 seconds)
2025-09-13 11:11:15,496 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:11:15,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:16:01,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3090.93408 ± 1240.389
2025-09-13 11:16:01,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3510.5735, 3286.3474, 3855.8906, 465.56012, 3339.4243, 3631.6482, 3981.915, 3805.107, 876.9993, 4155.8774]
2025-09-13 11:16:01,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 11:16:02,007 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 10 hours, 13 minutes, 35 seconds)
2025-09-13 11:26:32,908 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:26:32,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:31:20,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2755.01099 ± 1291.409
2025-09-13 11:31:20,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3915.7844, 2828.1091, 501.78354, 1002.378, 3966.1802, 2629.74, 4338.9053, 1918.7992, 2213.0361, 4235.394]
2025-09-13 11:31:20,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 11:31:20,383 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 9 hours, 58 minutes, 25 seconds)
2025-09-13 11:41:51,895 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:41:51,899 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:46:39,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3121.40674 ± 1305.365
2025-09-13 11:46:39,131 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4134.4766, 4142.519, 4234.1914, 425.56104, 2350.1145, 3851.6025, 4373.3535, 2508.6316, 1404.4249, 3789.1938]
2025-09-13 11:46:39,131 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 11:46:39,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 9 hours, 42 minutes, 45 seconds)
2025-09-13 11:57:12,305 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:57:12,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:02:02,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3209.01758 ± 1048.561
2025-09-13 12:02:02,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4040.5798, 3216.8337, 3838.9656, 3567.9138, 3317.1184, 1324.7041, 3751.2092, 4150.0938, 1052.6464, 3830.1108]
2025-09-13 12:02:02,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 12:02:02,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 9 hours, 26 minutes, 46 seconds)
2025-09-13 12:12:36,621 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:12:36,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:17:24,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3134.13696 ± 1487.731
2025-09-13 12:17:24,690 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3753.0752, 397.83676, 4083.5886, 3849.2498, 4054.8857, 2988.554, 4173.9907, 3768.6504, 73.75251, 4197.787]
2025-09-13 12:17:24,690 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 12:17:24,699 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 9 hours, 11 minutes, 50 seconds)
2025-09-13 12:27:57,723 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:27:57,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:32:51,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3066.19409 ± 1437.498
2025-09-13 12:32:51,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4275.557, 231.9159, 4006.786, 4126.653, 4283.32, 1590.342, 4167.0254, 3860.4883, 2963.1438, 1156.71]
2025-09-13 12:32:51,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 12:32:51,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 8 hours, 57 minutes, 44 seconds)
2025-09-13 12:43:21,847 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:43:21,851 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:48:08,851 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2912.12329 ± 1396.095
2025-09-13 12:48:08,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4162.346, 3742.8489, 2373.4404, 3879.8523, 2061.1816, 642.09827, 3713.118, 4196.283, 337.01675, 4013.0452]
2025-09-13 12:48:08,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 12:48:08,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 8 hours, 42 minutes, 17 seconds)
2025-09-13 12:58:39,683 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:58:39,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:03:34,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3621.65942 ± 826.857
2025-09-13 13:03:34,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3906.6047, 3949.596, 4414.6294, 3890.5034, 3972.2437, 3982.535, 4039.1355, 2946.2092, 1374.5914, 3740.5466]
2025-09-13 13:03:34,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 13:03:34,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 8 hours, 27 minutes, 38 seconds)
2025-09-13 13:14:07,957 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:14:07,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:18:56,218 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3683.01636 ± 871.192
2025-09-13 13:18:56,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4060.5435, 3915.704, 1138.8519, 3526.9756, 3909.065, 4214.66, 3892.5276, 3813.1458, 4245.2964, 4113.3955]
2025-09-13 13:18:56,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 13:18:56,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (3683.02) for latency ExtremeSparseL4U32
2025-09-13 13:18:56,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 8 hours, 12 minutes, 5 seconds)
2025-09-13 13:29:27,327 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:29:27,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:34:18,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3576.13721 ± 1040.239
2025-09-13 13:34:18,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4056.696, 3723.1965, 507.10754, 4083.5144, 4304.2017, 3999.8438, 3910.4456, 3743.3428, 3675.895, 3757.1282]
2025-09-13 13:34:18,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 13:34:18,947 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 7 hours, 56 minutes, 48 seconds)
2025-09-13 13:44:51,938 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:44:51,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:49:39,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2731.49414 ± 1373.633
2025-09-13 13:49:39,059 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3971.6267, 387.61792, 3575.4646, 3225.2266, 3558.8965, 1341.5862, 1988.6473, 4188.778, 878.044, 4199.0527]
2025-09-13 13:49:39,059 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 13:49:39,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 7 hours, 40 minutes, 46 seconds)
2025-09-13 14:00:11,077 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:00:11,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:04:58,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3230.33594 ± 1295.964
2025-09-13 14:04:58,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3941.4075, 4316.473, 487.7696, 3967.931, 4475.1367, 3762.4148, 1702.1085, 3619.5105, 4162.5024, 1868.1025]
2025-09-13 14:04:58,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 14:04:58,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 7 hours, 25 minutes, 34 seconds)
2025-09-13 14:15:31,056 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:15:31,059 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:20:20,928 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2653.30908 ± 1352.914
2025-09-13 14:20:20,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2448.8796, 3859.0151, 3753.1758, 1443.7732, 3921.7986, 3573.5833, 217.61961, 529.93933, 3661.9624, 3123.3445]
2025-09-13 14:20:20,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 14:20:20,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 7 hours, 9 minutes, 58 seconds)
2025-09-13 14:30:50,285 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:30:50,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:35:44,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2168.00098 ± 1750.060
2025-09-13 14:35:44,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4500.4995, 193.03171, 959.3642, 203.92555, 576.00793, 3996.3533, 4036.633, 3259.1624, 344.21774, 3610.817]
2025-09-13 14:35:44,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 14:35:44,997 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 6 hours, 54 minutes, 47 seconds)
2025-09-13 14:46:13,978 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:46:13,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:51:03,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2303.85938 ± 1387.199
2025-09-13 14:51:03,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2143.5261, 871.6832, 3865.401, 1068.3026, 1627.0989, 4066.1562, 1294.3798, 3837.0732, 3894.4656, 370.5078]
2025-09-13 14:51:03,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 14:51:03,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 6 hours, 39 minutes, 1 second)
2025-09-13 15:01:32,436 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:01:32,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:06:20,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2864.30957 ± 1422.404
2025-09-13 15:06:20,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3550.1907, 642.7023, 3729.9731, 200.9509, 4074.1567, 4148.13, 3999.8477, 3669.0173, 3129.2405, 1498.8864]
2025-09-13 15:06:20,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 15:06:20,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 6 hours, 23 minutes, 26 seconds)
2025-09-13 15:16:49,349 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:16:49,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:21:43,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3293.30591 ± 1034.434
2025-09-13 15:21:43,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1786.094, 4090.3596, 4097.4287, 3606.9722, 3706.4202, 3802.8862, 816.93555, 3514.3875, 3823.4475, 3688.1252]
2025-09-13 15:21:43,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 15:21:43,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 6 hours, 8 minutes, 22 seconds)
2025-09-13 15:32:15,851 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:32:15,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:37:03,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2472.67261 ± 1573.818
2025-09-13 15:37:03,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [223.06876, 1143.415, 870.8524, 3941.9524, 3944.2954, 4350.784, 1152.1199, 1312.5424, 4338.0156, 3449.6794]
2025-09-13 15:37:03,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 15:37:03,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 5 hours, 52 minutes, 53 seconds)
2025-09-13 15:47:30,871 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:47:30,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:52:19,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3444.62061 ± 1137.014
2025-09-13 15:52:19,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1604.3915, 4195.24, 4066.401, 4360.563, 3691.3286, 2896.8418, 4199.572, 4338.654, 1061.7611, 4031.4521]
2025-09-13 15:52:19,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 15:52:19,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 5 hours, 36 minutes, 53 seconds)
2025-09-13 16:02:50,145 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:02:50,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:07:38,072 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3514.81567 ± 1103.285
2025-09-13 16:07:38,072 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3423.2847, 3743.3145, 4163.7524, 1348.9009, 1410.133, 4288.6973, 4014.9778, 4496.7144, 4158.2314, 4100.154]
2025-09-13 16:07:38,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 16:07:38,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 5 hours, 21 minutes, 38 seconds)
2025-09-13 16:18:07,009 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:18:07,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:23:02,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3291.04102 ± 1212.413
2025-09-13 16:23:02,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [461.65634, 3961.2646, 3899.8206, 2392.752, 3975.8074, 4392.1655, 3802.6692, 3955.148, 1905.0581, 4164.067]
2025-09-13 16:23:02,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 16:23:02,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 5 hours, 6 minutes, 48 seconds)
2025-09-13 16:33:35,460 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:33:35,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:38:28,580 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2613.39062 ± 1134.071
2025-09-13 16:38:28,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3579.203, 4463.2725, 891.73987, 4036.8547, 1939.3749, 896.76025, 2300.9111, 2652.172, 2736.528, 2637.0894]
2025-09-13 16:38:28,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 16:38:28,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 4 hours, 51 minutes, 40 seconds)
2025-09-13 16:48:59,138 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:48:59,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:53:45,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3377.31519 ± 1150.152
2025-09-13 16:53:45,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4210.698, 4309.6167, 4425.114, 2570.8945, 4290.0205, 3398.3076, 3975.6548, 1031.7332, 3901.0713, 1660.042]
2025-09-13 16:53:45,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 16:53:45,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 36 minutes, 5 seconds)
2025-09-13 17:04:14,675 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:04:14,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:09:07,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2344.17529 ± 1597.913
2025-09-13 17:09:07,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4186.008, 3458.975, 104.07643, 2422.266, 4180.399, 4026.847, 97.538734, 2941.943, 1610.1864, 413.51178]
2025-09-13 17:09:07,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 17:09:07,709 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 21 minutes, 9 seconds)
2025-09-13 17:19:37,283 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:19:37,295 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:24:30,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2669.12793 ± 1256.954
2025-09-13 17:24:30,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1697.5896, 3570.2136, 3241.6008, 3734.074, 1638.5507, 1671.261, 2992.842, 3972.101, 4091.82, 81.22519]
2025-09-13 17:24:30,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 17:24:30,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 4 hours, 5 minutes, 58 seconds)
2025-09-13 17:34:58,694 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:34:58,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:39:50,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2817.69141 ± 1372.374
2025-09-13 17:39:50,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4111.906, 3698.296, 3856.4536, 4184.444, 2258.3792, 282.29248, 3207.6162, 2025.4427, 643.70245, 3908.3816]
2025-09-13 17:39:50,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 17:39:50,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 50 minutes, 25 seconds)
2025-09-13 17:50:22,042 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:50:22,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:55:18,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3383.76807 ± 977.250
2025-09-13 17:55:18,071 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3618.622, 3655.5918, 3151.0361, 4136.083, 3947.9614, 1041.1741, 2126.2283, 4057.9756, 3868.1711, 4234.834]
2025-09-13 17:55:18,071 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 17:55:18,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 35 minutes, 6 seconds)
2025-09-13 18:05:45,961 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:05:45,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:10:41,204 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3883.51685 ± 829.068
2025-09-13 18:10:41,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4246.679, 4097.4155, 1527.9199, 4691.794, 4218.2407, 4071.247, 4249.19, 3735.3918, 4285.2627, 3712.026]
2025-09-13 18:10:41,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 18:10:41,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (3883.52) for latency ExtremeSparseL4U32
2025-09-13 18:10:41,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 20 minutes, 1 second)
2025-09-13 18:21:11,196 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:21:11,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:26:01,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3109.16919 ± 1722.714
2025-09-13 18:26:01,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3993.2683, 4220.5317, 4.432637, 281.39566, 4373.4688, 4039.0366, 4534.822, 1301.8594, 4227.1416, 4115.7354]
2025-09-13 18:26:01,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 18:26:01,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 4 minutes, 33 seconds)
2025-09-13 18:36:28,185 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:36:28,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:41:22,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2214.37646 ± 1544.231
2025-09-13 18:41:22,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [340.6603, 482.78754, 1514.1332, 60.821976, 2075.2542, 3999.3467, 3666.8884, 3424.1199, 2041.4747, 4538.2764]
2025-09-13 18:41:22,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 18:41:22,382 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 49 minutes, 7 seconds)
2025-09-13 18:51:54,057 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:51:54,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:56:45,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3019.43457 ± 1611.875
2025-09-13 18:56:45,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [752.2297, 3148.4238, 4486.792, 681.909, 4192.6865, 3513.227, 4317.82, 4364.4424, 465.1857, 4271.6294]
2025-09-13 18:56:45,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 18:56:45,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 33 minutes, 48 seconds)
2025-09-13 19:07:14,794 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:07:14,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:12:09,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3378.30396 ± 1233.687
2025-09-13 19:12:09,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4112.7026, 541.8547, 4181.5845, 2604.1763, 1898.6469, 4469.162, 4140.4443, 4377.9263, 3423.2246, 4033.3174]
2025-09-13 19:12:09,375 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 19:12:09,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 18 minutes, 20 seconds)
2025-09-13 19:22:38,744 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:22:38,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:27:26,899 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3426.22534 ± 1410.813
2025-09-13 19:27:26,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3829.463, 4301.4424, 4214.1064, 4161.503, 4350.5376, 1002.6826, 4175.597, 277.3587, 3914.496, 4035.0662]
2025-09-13 19:27:26,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 19:27:26,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 2 minutes, 49 seconds)
2025-09-13 19:37:54,392 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:37:54,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:42:48,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2379.91650 ± 1619.056
2025-09-13 19:42:48,491 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4153.1997, 771.31445, 1966.6885, 3928.8342, 2742.3562, 654.1707, 350.60825, 4417.8145, 600.2634, 4213.9146]
2025-09-13 19:42:48,491 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 19:42:48,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 47 minutes, 29 seconds)
2025-09-13 19:53:17,480 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:53:17,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:58:13,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3389.02026 ± 879.766
2025-09-13 19:58:13,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4078.7617, 3744.7407, 4009.2085, 3381.765, 3660.4138, 2479.3682, 3252.8274, 3995.5925, 4119.3105, 1168.2133]
2025-09-13 19:58:13,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 19:58:13,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 32 minutes, 12 seconds)
2025-09-13 20:08:48,205 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:08:48,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:13:37,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3708.76880 ± 737.306
2025-09-13 20:13:37,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4153.457, 3919.5178, 4085.6255, 4233.417, 2374.1877, 4530.8315, 3272.148, 3873.2888, 2372.89, 4272.326]
2025-09-13 20:13:37,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 20:13:37,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 16 minutes, 52 seconds)
2025-09-13 20:24:07,566 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:24:07,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:29:00,794 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3443.86963 ± 1235.967
2025-09-13 20:29:00,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2173.7412, 4264.2734, 4180.8286, 4459.3794, 2373.0344, 4104.104, 554.9765, 4052.6628, 4414.7437, 3860.9514]
2025-09-13 20:29:00,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 20:29:00,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 1 minute, 29 seconds)
2025-09-13 20:39:29,541 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:39:29,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:44:16,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3582.42188 ± 774.905
2025-09-13 20:44:16,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3398.0212, 4024.8352, 4194.991, 4202.419, 2336.0603, 4339.6816, 4040.5073, 3290.198, 3976.4949, 2021.0061]
2025-09-13 20:44:16,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 20:44:16,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 46 minutes, 5 seconds)
2025-09-13 20:54:45,082 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:54:45,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:59:30,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3537.28833 ± 1195.136
2025-09-13 20:59:30,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1919.3684, 4193.7554, 4051.096, 3553.0466, 4103.26, 4338.3535, 4109.995, 4152.0747, 4353.4463, 598.4879]
2025-09-13 20:59:30,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 20:59:30,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 30 minutes, 40 seconds)
2025-09-13 21:09:58,148 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:09:58,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:14:49,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2435.41016 ± 1477.174
2025-09-13 21:14:49,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2316.097, 4319.105, 260.56012, 3605.1643, 460.1923, 1005.00256, 3752.6343, 1881.5067, 2282.1018, 4471.739]
2025-09-13 21:14:49,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 21:14:49,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 15 minutes, 19 seconds)
2025-09-13 21:25:21,900 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:25:21,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:30:13,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3069.24976 ± 925.176
2025-09-13 21:30:13,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2210.9963, 3925.133, 3390.2954, 3447.0256, 3612.4868, 4136.358, 1373.2328, 2504.1626, 4098.524, 1994.2838]
2025-09-13 21:30:13,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 21:30:13,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1251 [DEBUG]: Training session finished
