2025-09-12 19:53:45,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc0-halfcheetah/ExtremeSparseL4U32-mbpac_memdelay
2025-09-12 19:53:45,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc0-halfcheetah/ExtremeSparseL4U32-mbpac_memdelay
2025-09-12 19:53:45,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x149b0f2c5550>}
2025-09-12 19:53:45,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1111 [DEBUG]: using device: cuda
2025-09-12 19:53:45,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-12 19:53:45,483 baseline-mbpac-noiseperc0-halfcheetah:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
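The policy module tree printed above can be approximated in plain PyTorch. The `NNGaussianPolicy`, `NNTanhRefit`, and the class name `GaussianPolicy` below are project-specific or invented here; note the 384-dim input matches the GRU hidden size printed later rather than the raw 17-dim observation, suggesting the policy acts on the recurrent belief state. Plain `tanh` stands in for the `NNTanhRefit` squash/rescale layer, whose exact transform is not recoverable from the log.

```python
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    """Sketch of the printed policy: shared 384->256->256 trunk feeding
    separate mu and log_std heads for the 6-dim action. tanh squashing is
    an assumption standing in for the project's NNTanhRefit layer."""
    def __init__(self, in_dim=384, act_dim=6, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Flatten(start_dim=1),
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu = nn.Linear(hidden, act_dim)
        self.log_std = nn.Linear(hidden, act_dim)

    def forward(self, x):
        h = self.trunk(x)
        mu, log_std = self.mu(h), self.log_std(h)
        # reparameterised sample, squashed into (-1, 1)
        a = torch.tanh(mu + log_std.exp() * torch.randn_like(mu))
        return a, mu, log_std
```

This is a sketch under the stated assumptions, not the run's actual implementation.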
2025-09-12 19:53:45,483 baseline-mbpac-noiseperc0-halfcheetah:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
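The critic's `in_features=23` is consistent with concatenating the 17-dim HalfCheetah observation with the 6-dim action on the last axis, which appears to be what `NNLayerConcat2` does. A hedged stand-in (the class name `ConcatQ` is invented here):

```python
import torch
import torch.nn as nn

class ConcatQ(nn.Module):
    """Sketch of the printed critic: flatten state and action, concatenate
    (17 + 6 = 23 inputs), then a 256x256 ReLU MLP emitting a squeezed
    scalar Q-value, mirroring the trailing NNLayerSqueeze."""
    def __init__(self, obs_dim=17, act_dim=6, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s, a):
        x = torch.cat([s.flatten(1), a.flatten(1)], dim=-1)
        return self.net(x).squeeze(-1)
```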
2025-09-12 19:53:45,490 baseline-mbpac-noiseperc0-halfcheetah:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=6, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
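The world-model dump suggests a latent-dynamics layout: `net_embed_state` maps the 17-dim observation to the GRU's 384-dim hidden size (plausibly to initialise it), `net_embed_action` maps the 6-dim action to the GRU's 256-dim input, and the emitter heads predict mu/log_std over the next 17-dim observation. The wiring below is an inference from these dimensions, not confirmed by the log; `SiLU` stands in for the project's `NNLayerClipSiLU`, and the class name `PredictiveRecurrent` is invented here.

```python
import torch
import torch.nn as nn

class PredictiveRecurrent(nn.Module):
    """Sketch of the printed NNPredictiveRecurrent: state embedding
    initialises the GRU hidden state, embedded actions drive the GRU,
    and a Gaussian head predicts the next observation. The hidden-state
    initialisation is an assumption inferred from layer sizes."""
    def __init__(self, obs_dim=17, act_dim=6, hidden=384, embed=256):
        super().__init__()
        self.embed_state = nn.Sequential(
            nn.Linear(obs_dim, embed), nn.SiLU(),
            nn.Linear(embed, embed), nn.SiLU(),
            nn.Linear(embed, hidden),
        )
        self.embed_action = nn.Sequential(
            nn.Linear(act_dim, embed), nn.SiLU(),
            nn.Linear(embed, embed),
        )
        self.rnn = nn.GRU(embed, hidden, batch_first=True)
        self.mu = nn.Linear(hidden, obs_dim)
        self.log_std = nn.Linear(hidden, obs_dim)

    def forward(self, obs, actions):
        # obs: (B, obs_dim); actions: (B, T, act_dim)
        h0 = self.embed_state(obs).unsqueeze(0)            # (1, B, hidden)
        out, _ = self.rnn(self.embed_action(actions), h0)  # (B, T, hidden)
        return self.mu(out), self.log_std(out)
```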
2025-09-12 19:53:46,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-12 19:53:46,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-12 20:05:20,041 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:05:20,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 20:10:29,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: -140.34914 ± 33.789
2025-09-12 20:10:29,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [-108.951775, -138.1934, -159.74435, -163.8909, -177.5985, -112.9275, -74.79334, -123.672844, -151.91907, -191.79973]
2025-09-12 20:10:29,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:10:29,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (-140.35) for latency ExtremeSparseL4U32
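The "Total Reward: -140.34914 ± 33.789" summary appears to be the mean and population standard deviation (ddof=0) of the ten per-episode returns; a quick stdlib check against the iteration-1 numbers:

```python
from statistics import mean, pstdev

# Per-episode returns copied from the iteration-1 evaluation above.
rewards = [-108.951775, -138.1934, -159.74435, -163.8909, -177.5985,
           -112.9275, -74.79334, -123.672844, -151.91907, -191.79973]

avg = mean(rewards)
spread = pstdev(rewards)  # population std reproduces the logged ± term
print(f"Total Reward: {avg:.5f} ± {spread:.3f}")
```

`pstdev` (not the sample `stdev`) matches the log, consistent with numpy's default `std(ddof=0)`.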
2025-09-12 20:10:29,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 27 hours, 35 minutes, 48 seconds)
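The time-remaining estimate is consistent with mean elapsed time per completed iteration scaled by the iterations left: iteration 1 ran from 19:53:46.311 to 20:10:29.829 (about 1003.5 s), and 99 remaining iterations gives the logged 27 h 35 m 48 s. A minimal sketch (the function name `fmt_eta` and the exact computation are assumptions, not the run's code):

```python
def fmt_eta(seconds):
    """Render a duration in the log's 'H hours, M minutes, S seconds' style."""
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"{h} hours, {m} minutes, {s} seconds"

elapsed = 1003.518          # seconds for iteration 1, from the timestamps above
done, total = 1, 100
eta = elapsed / done * (total - done)
print(fmt_eta(eta))          # -> "27 hours, 35 minutes, 48 seconds"
```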
2025-09-12 20:21:30,866 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:21:30,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 20:26:38,095 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 31.24345 ± 79.476
2025-09-12 20:26:38,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [-49.685387, -73.872444, 104.12026, 90.759674, 5.000357, -104.72664, 27.196852, 123.95697, 74.423805, 115.26106]
2025-09-12 20:26:38,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:26:38,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (31.24) for latency ExtremeSparseL4U32
2025-09-12 20:26:38,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 26 hours, 50 minutes, 17 seconds)
2025-09-12 20:37:38,635 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:37:38,639 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 20:42:44,276 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 626.06769 ± 185.002
2025-09-12 20:42:44,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [587.57825, 626.9075, 738.4476, 762.3703, 166.27617, 423.41888, 728.4243, 762.0374, 680.52124, 784.6951]
2025-09-12 20:42:44,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:42:44,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (626.07) for latency ExtremeSparseL4U32
2025-09-12 20:42:44,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 26 hours, 23 minutes, 14 seconds)
2025-09-12 20:53:46,426 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:53:46,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 20:58:56,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1334.00098 ± 104.876
2025-09-12 20:58:56,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1342.4641, 1207.1436, 1348.9136, 1435.5902, 1329.9712, 1173.4241, 1189.2163, 1433.7161, 1485.5826, 1393.9869]
2025-09-12 20:58:56,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:58:56,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (1334.00) for latency ExtremeSparseL4U32
2025-09-12 20:58:56,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 26 hours, 4 minutes, 13 seconds)
2025-09-12 21:09:58,114 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:09:58,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 21:14:59,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1511.73596 ± 78.304
2025-09-12 21:14:59,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1601.2554, 1585.6168, 1486.8624, 1382.1433, 1451.0948, 1650.7051, 1516.9264, 1488.809, 1429.6942, 1524.252]
2025-09-12 21:14:59,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:14:59,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (1511.74) for latency ExtremeSparseL4U32
2025-09-12 21:14:59,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 25 hours, 43 minutes, 16 seconds)
2025-09-12 21:25:49,800 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:25:49,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 21:30:49,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1867.04321 ± 631.423
2025-09-12 21:30:49,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2262.4019, 2034.1149, 2003.1903, 1943.1859, 2351.986, 25.179398, 2215.7183, 1874.3193, 2022.2363, 1938.0988]
2025-09-12 21:30:49,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:30:49,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (1867.04) for latency ExtremeSparseL4U32
2025-09-12 21:30:49,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 25 hours, 10 minutes, 3 seconds)
2025-09-12 21:41:37,149 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:41:37,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 21:46:43,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2279.66504 ± 269.940
2025-09-12 21:46:43,728 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2471.1519, 2361.5186, 2376.7988, 2426.8586, 1576.2914, 2544.844, 2009.1472, 2328.9602, 2392.3496, 2308.7305]
2025-09-12 21:46:43,728 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:46:43,728 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (2279.67) for latency ExtremeSparseL4U32
2025-09-12 21:46:43,735 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 24 hours, 49 minutes, 44 seconds)
2025-09-12 21:57:32,989 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:57:32,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:02:36,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2549.91895 ± 175.555
2025-09-12 22:02:36,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2781.0593, 2584.29, 2253.5305, 2500.4846, 2526.0352, 2430.0747, 2497.4976, 2727.4387, 2365.713, 2833.0645]
2025-09-12 22:02:36,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:02:36,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (2549.92) for latency ExtremeSparseL4U32
2025-09-12 22:02:36,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 24 hours, 29 minutes, 37 seconds)
2025-09-12 22:13:27,397 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:13:27,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:18:29,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2617.54956 ± 249.000
2025-09-12 22:18:29,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2468.7642, 2802.4807, 2815.803, 2827.8267, 2058.9412, 2931.9739, 2377.3982, 2572.8103, 2604.9268, 2714.5713]
2025-09-12 22:18:29,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:18:29,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (2617.55) for latency ExtremeSparseL4U32
2025-09-12 22:18:29,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 24 hours, 7 minutes, 49 seconds)
2025-09-12 22:29:21,593 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:29:21,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:34:20,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2435.36108 ± 696.260
2025-09-12 22:34:20,917 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [394.84134, 2546.7344, 2719.645, 2578.2122, 2691.026, 2470.268, 2719.2131, 2977.7148, 2793.6934, 2462.262]
2025-09-12 22:34:20,917 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:34:20,928 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 23 hours, 48 minutes, 20 seconds)
2025-09-12 22:45:14,223 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:45:14,227 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:50:20,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2564.60352 ± 597.560
2025-09-12 22:50:20,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1366.2117, 2616.7834, 3081.4673, 3206.2737, 1494.1758, 2889.4814, 2590.7305, 2950.8198, 2712.1208, 2737.971]
2025-09-12 22:50:20,243 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:50:20,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 23 hours, 35 minutes, 25 seconds)
2025-09-12 23:01:10,870 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:01:10,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:06:09,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2658.17114 ± 252.176
2025-09-12 23:06:09,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2897.8867, 2632.2285, 2428.0835, 2854.9404, 2861.9844, 2735.8677, 2985.5554, 2491.5562, 2581.021, 2112.5884]
2025-09-12 23:06:09,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:06:09,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (2658.17) for latency ExtremeSparseL4U32
2025-09-12 23:06:09,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 23 hours, 18 minutes, 1 second)
2025-09-12 23:16:59,108 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:16:59,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:21:59,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2877.48169 ± 156.014
2025-09-12 23:21:59,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2879.9382, 3008.8965, 2928.6465, 2792.884, 2669.9502, 2825.3582, 2926.3562, 3188.053, 2616.6218, 2938.1138]
2025-09-12 23:21:59,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:21:59,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (2877.48) for latency ExtremeSparseL4U32
2025-09-12 23:21:59,894 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 23 hours, 1 minute, 22 seconds)
2025-09-12 23:32:50,135 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:32:50,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:37:50,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2646.40894 ± 761.057
2025-09-12 23:37:50,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2928.3826, 3143.4756, 2920.6377, 2932.298, 2545.0422, 3190.585, 427.5541, 2802.582, 2700.0222, 2873.5088]
2025-09-12 23:37:50,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:37:50,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 22 hours, 44 minutes, 42 seconds)
2025-09-12 23:48:41,528 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:48:41,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:53:41,592 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2801.11548 ± 768.976
2025-09-12 23:53:41,608 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3395.6172, 3098.1836, 3162.43, 2815.8667, 2862.6333, 3210.9458, 2928.5798, 2941.7285, 548.809, 3046.3618]
2025-09-12 23:53:41,608 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:53:41,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 22 hours, 28 minutes, 51 seconds)
2025-09-13 00:04:31,858 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:04:31,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:09:35,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2875.76904 ± 284.350
2025-09-13 00:09:35,866 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2418.3372, 2440.8428, 2603.462, 2742.238, 3127.9148, 2965.5032, 3180.1338, 3069.804, 3180.2222, 3029.2327]
2025-09-13 00:09:35,866 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:09:35,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 22 hours, 11 minutes, 34 seconds)
2025-09-13 00:20:25,475 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:20:25,479 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:25:33,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2896.02954 ± 230.718
2025-09-13 00:25:33,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3227.5808, 2932.7644, 2907.7917, 2870.0247, 2839.457, 2890.0476, 2947.118, 2854.5312, 3174.0474, 2316.9324]
2025-09-13 00:25:33,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:25:33,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (2896.03) for latency ExtremeSparseL4U32
2025-09-13 00:25:33,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 21 hours, 58 minutes, 5 seconds)
2025-09-13 00:36:24,550 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:36:24,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:41:27,008 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2774.68994 ± 772.614
2025-09-13 00:41:27,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3019.8496, 506.50223, 2765.7017, 2991.3037, 3108.6672, 3241.8164, 3210.8083, 3103.3638, 2728.51, 3070.3796]
2025-09-13 00:41:27,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:41:27,015 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 21 hours, 43 minutes)
2025-09-13 00:52:15,493 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:52:15,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:57:20,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2970.51294 ± 183.477
2025-09-13 00:57:20,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2861.7178, 2925.5103, 3146.8271, 2572.6948, 3168.9836, 2813.5325, 3197.995, 3084.4204, 2918.5308, 3014.9155]
2025-09-13 00:57:20,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:57:20,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (2970.51) for latency ExtremeSparseL4U32
2025-09-13 00:57:20,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 21 hours, 27 minutes, 49 seconds)
2025-09-13 01:08:10,298 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:08:10,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:13:14,360 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3037.91431 ± 219.180
2025-09-13 01:13:14,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3083.7927, 2997.958, 3257.2598, 2983.087, 3126.178, 2796.5378, 2986.361, 3295.932, 2561.839, 3290.197]
2025-09-13 01:13:14,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:13:14,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (3037.91) for latency ExtremeSparseL4U32
2025-09-13 01:13:14,375 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 21 hours, 12 minutes, 43 seconds)
2025-09-13 01:24:05,889 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:24:05,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:29:09,981 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3077.86084 ± 321.813
2025-09-13 01:29:09,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2209.094, 3069.1575, 3273.0015, 2893.0837, 3192.7065, 3265.6047, 3033.1138, 3159.9526, 3420.3445, 3262.5498]
2025-09-13 01:29:09,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:29:09,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (3077.86) for latency ExtremeSparseL4U32
2025-09-13 01:29:10,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 20 hours, 57 minutes, 11 seconds)
2025-09-13 01:40:00,388 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:40:00,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:44:58,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3191.64868 ± 288.241
2025-09-13 01:44:58,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3377.3823, 3388.346, 3047.608, 3162.1628, 2712.269, 3536.04, 3427.312, 3259.9277, 3360.5095, 2644.9304]
2025-09-13 01:44:58,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:44:58,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (3191.65) for latency ExtremeSparseL4U32
2025-09-13 01:44:58,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 20 hours, 38 minutes, 47 seconds)
2025-09-13 01:55:50,245 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:55:50,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:00:57,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3196.60889 ± 255.801
2025-09-13 02:00:57,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3499.2388, 2827.8853, 2710.8872, 3238.6235, 3512.397, 3235.1052, 3250.8496, 3395.7043, 3005.329, 3290.0706]
2025-09-13 02:00:57,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:00:57,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (3196.61) for latency ExtremeSparseL4U32
2025-09-13 02:00:57,592 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 20 hours, 24 minutes, 26 seconds)
2025-09-13 02:11:48,513 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:11:48,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:16:49,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3187.82593 ± 608.327
2025-09-13 02:16:49,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3247.1396, 1480.9562, 3427.5088, 3013.9722, 3535.149, 3743.4192, 3339.5916, 3613.0237, 3068.909, 3408.5923]
2025-09-13 02:16:49,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:16:49,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 20 hours, 8 minutes, 15 seconds)
2025-09-13 02:27:41,417 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:27:41,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:32:41,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3301.90234 ± 187.384
2025-09-13 02:32:41,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3359.8071, 2965.4072, 3247.1333, 3388.5972, 3218.347, 3597.4922, 3133.907, 3428.2866, 3543.215, 3136.8328]
2025-09-13 02:32:41,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:32:41,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (3301.90) for latency ExtremeSparseL4U32
2025-09-13 02:32:41,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 19 hours, 51 minutes, 52 seconds)
2025-09-13 02:43:30,911 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:43:30,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:48:36,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3281.84497 ± 282.095
2025-09-13 02:48:36,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3350.5305, 3515.007, 3070.3115, 3257.5244, 3699.6047, 2868.588, 3355.5647, 3658.92, 2844.3389, 3198.0579]
2025-09-13 02:48:36,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:48:36,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 19 hours, 35 minutes, 38 seconds)
2025-09-13 02:59:25,607 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:59:25,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:04:24,732 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3213.89087 ± 325.673
2025-09-13 03:04:24,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2556.129, 3125.4963, 3575.3848, 3610.8157, 3188.9429, 3133.2612, 2862.7659, 3615.0198, 3376.4028, 3094.6887]
2025-09-13 03:04:24,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:04:24,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 19 hours, 19 minutes, 46 seconds)
2025-09-13 03:15:14,967 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:15:14,970 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:20:14,479 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3340.91089 ± 239.432
2025-09-13 03:20:14,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3169.4055, 3131.561, 3188.8218, 3803.598, 3097.933, 3387.7412, 3423.1333, 3663.6948, 3472.9043, 3070.3164]
2025-09-13 03:20:14,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:20:14,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (3340.91) for latency ExtremeSparseL4U32
2025-09-13 03:20:14,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 19 hours, 1 minute, 39 seconds)
2025-09-13 03:31:06,516 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:31:06,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:36:10,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3352.58838 ± 268.342
2025-09-13 03:36:10,808 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3591.7444, 3257.062, 3553.6184, 3187.4128, 3211.3499, 3595.091, 2993.6223, 3696.484, 3555.0737, 2884.4226]
2025-09-13 03:36:10,808 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:36:10,808 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (3352.59) for latency ExtremeSparseL4U32
2025-09-13 03:36:10,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 18 hours, 46 minutes, 47 seconds)
2025-09-13 03:47:02,296 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:47:02,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:52:05,692 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3461.76318 ± 251.562
2025-09-13 03:52:05,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3640.4993, 3181.7925, 3089.6252, 3782.0112, 3603.3418, 3536.342, 3215.903, 3227.0305, 3507.5618, 3833.5242]
2025-09-13 03:52:05,694 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:52:05,694 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (3461.76) for latency ExtremeSparseL4U32
2025-09-13 03:52:05,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 18 hours, 31 minutes, 33 seconds)
2025-09-13 04:03:05,692 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:03:05,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:08:03,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3320.51123 ± 267.662
2025-09-13 04:08:03,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3120.8467, 3692.906, 2956.422, 3266.9714, 3395.706, 3315.2646, 2929.2983, 3418.2808, 3305.2378, 3804.181]
2025-09-13 04:08:03,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 04:08:03,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 18 hours, 16 minutes, 27 seconds)
2025-09-13 04:18:45,870 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:18:45,873 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:23:45,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3509.62817 ± 386.082
2025-09-13 04:23:45,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2958.8823, 4120.5615, 3533.4604, 3166.1162, 3442.4592, 3965.7498, 3107.2456, 3157.137, 3798.0737, 3846.593]
2025-09-13 04:23:45,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 04:23:45,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (3509.63) for latency ExtremeSparseL4U32
2025-09-13 04:23:45,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 17 hours, 59 minutes, 6 seconds)
2025-09-13 04:34:26,443 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:34:26,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:39:27,713 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3631.27295 ± 210.769
2025-09-13 04:39:27,719 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3372.561, 3836.2695, 3524.8584, 3833.4832, 3985.1016, 3684.5361, 3375.4226, 3691.2554, 3668.7861, 3340.4585]
2025-09-13 04:39:27,719 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 04:39:27,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (3631.27) for latency ExtremeSparseL4U32
2025-09-13 04:39:27,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 17 hours, 41 minutes, 33 seconds)
2025-09-13 04:50:09,101 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:50:09,110 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:55:04,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3441.76294 ± 259.847
2025-09-13 04:55:04,480 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4008.6863, 3074.3025, 3421.2268, 3529.7236, 3282.7302, 3531.3662, 3639.18, 3485.1897, 3080.5178, 3364.7036]
2025-09-13 04:55:04,480 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 04:55:04,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 17 hours, 21 minutes, 24 seconds)
2025-09-13 05:05:44,778 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:05:44,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:10:41,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3477.32104 ± 226.778
2025-09-13 05:10:41,268 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3650.1543, 3423.9907, 3822.708, 3366.329, 3601.8, 3052.0142, 3362.994, 3578.417, 3202.9517, 3711.8523]
2025-09-13 05:10:41,268 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 05:10:41,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 17 hours, 1 minute, 42 seconds)
2025-09-13 05:21:23,741 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:21:23,746 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:26:20,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3435.13232 ± 585.847
2025-09-13 05:26:20,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3814.0552, 3882.0056, 3259.3547, 3907.651, 1880.4105, 3885.8801, 3448.798, 3801.8386, 3244.2786, 3227.0493]
2025-09-13 05:26:20,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 05:26:20,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 16 hours, 42 minutes, 5 seconds)
2025-09-13 05:37:00,968 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:37:00,971 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:42:00,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3437.51416 ± 279.281
2025-09-13 05:42:00,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3596.4924, 3177.2083, 3658.04, 3467.9492, 3570.8743, 3451.371, 3686.947, 2843.3552, 3780.0627, 3142.841]
2025-09-13 05:42:00,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 05:42:00,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 16 hours, 25 minutes, 54 seconds)
2025-09-13 05:52:40,227 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:52:40,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:57:36,971 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3518.40308 ± 315.383
2025-09-13 05:57:36,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3592.7444, 3620.5356, 3573.351, 3219.2139, 3577.571, 4048.6445, 3490.483, 2765.8818, 3669.7334, 3625.87]
2025-09-13 05:57:36,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 05:57:36,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 16 hours, 9 minutes, 6 seconds)
2025-09-13 06:08:17,287 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:08:17,295 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:13:13,625 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3588.21216 ± 286.539
2025-09-13 06:13:13,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3824.161, 2911.6636, 3854.0667, 3543.6147, 3672.1572, 3860.6042, 3518.6711, 3877.2375, 3379.6833, 3440.2634]
2025-09-13 06:13:13,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 06:13:13,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 15 hours, 53 minutes, 27 seconds)
2025-09-13 06:23:55,873 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:23:55,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:28:57,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3749.41089 ± 185.521
2025-09-13 06:28:57,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3541.0696, 3875.1558, 3660.0713, 3799.3672, 3645.755, 3672.4736, 4201.712, 3557.3635, 3682.6318, 3858.5068]
2025-09-13 06:28:57,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 06:28:57,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (3749.41) for latency ExtremeSparseL4U32
2025-09-13 06:28:57,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 15 hours, 39 minutes, 13 seconds)
2025-09-13 06:39:40,410 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:39:40,413 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:44:41,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3651.31641 ± 217.436
2025-09-13 06:44:41,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3645.0608, 3803.0322, 3656.6606, 3592.9814, 3811.5134, 3652.0144, 3990.3206, 3378.6243, 3791.687, 3191.2695]
2025-09-13 06:44:41,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 06:44:41,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 15 hours, 24 minutes, 24 seconds)
2025-09-13 06:55:22,819 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:55:22,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:00:19,786 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3463.82959 ± 276.022
2025-09-13 07:00:19,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3597.0942, 3270.0981, 3483.76, 3579.3125, 2894.5884, 3538.2031, 3788.1292, 3864.023, 3150.9443, 3472.1465]
2025-09-13 07:00:19,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 07:00:19,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 15 hours, 8 minutes, 34 seconds)
2025-09-13 07:11:00,396 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:11:00,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:15:55,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3870.92334 ± 281.903
2025-09-13 07:15:55,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3595.1926, 3347.0674, 3507.8245, 4258.0854, 3938.0361, 4178.3857, 3894.0325, 3896.4502, 4057.5374, 4036.6194]
2025-09-13 07:15:55,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 07:15:55,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (3870.92) for latency ExtremeSparseL4U32
2025-09-13 07:15:55,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 14 hours, 52 minutes, 41 seconds)
2025-09-13 07:26:34,987 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:26:34,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:31:37,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3520.39380 ± 314.261
2025-09-13 07:31:37,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3120.912, 3650.6514, 3939.4294, 3372.5867, 3288.3735, 2957.2073, 3831.5232, 3754.6973, 3470.7793, 3817.776]
2025-09-13 07:31:37,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 07:31:37,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 14 hours, 37 minutes, 59 seconds)
2025-09-13 07:42:18,640 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:42:18,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:47:17,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3567.39331 ± 183.376
2025-09-13 07:47:17,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3667.2715, 3509.8445, 3692.1453, 3451.4731, 3531.657, 3169.0586, 3467.8906, 3614.699, 3668.253, 3901.6409]
2025-09-13 07:47:17,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 07:47:17,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 14 hours, 21 minutes, 37 seconds)
2025-09-13 07:58:00,438 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:58:00,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:02:57,638 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3675.55981 ± 280.116
2025-09-13 08:02:57,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3468.0789, 3471.5771, 3703.454, 4185.675, 3876.3445, 3248.8416, 3448.199, 3897.4358, 3493.2908, 3962.7048]
2025-09-13 08:02:57,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 08:02:57,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 14 hours, 5 minutes, 23 seconds)
2025-09-13 08:13:40,304 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:13:40,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:18:36,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3787.94336 ± 247.916
2025-09-13 08:18:36,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3636.524, 3953.9973, 3906.5046, 3441.9902, 3536.314, 3988.5742, 4132.4575, 4130.6426, 3553.6333, 3598.7957]
2025-09-13 08:18:36,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 08:18:36,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 13 hours, 49 minutes, 45 seconds)
2025-09-13 08:29:17,479 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:29:17,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:34:13,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3716.26440 ± 280.520
2025-09-13 08:34:13,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4027.963, 3647.1697, 3582.611, 4007.121, 3660.9697, 3828.1096, 3874.713, 4035.7092, 3290.349, 3207.9297]
2025-09-13 08:34:13,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 08:34:13,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 13 hours, 34 minutes, 21 seconds)
2025-09-13 08:44:53,012 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:44:53,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:49:55,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3802.21729 ± 234.163
2025-09-13 08:49:55,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3756.302, 3785.8896, 4034.817, 4117.7773, 3998.8271, 3539.4956, 3955.37, 3922.6753, 3511.2395, 3399.7832]
2025-09-13 08:49:55,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 08:49:55,095 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 13 hours, 18 minutes, 38 seconds)
2025-09-13 09:00:36,301 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:00:36,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:05:38,997 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3679.09521 ± 375.070
2025-09-13 09:05:39,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4019.8164, 4026.3015, 3992.0728, 2922.8953, 4021.4312, 3102.2498, 3535.8171, 3605.2493, 3821.2776, 3743.8457]
2025-09-13 09:05:39,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 09:05:39,015 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 13 hours, 3 minutes, 37 seconds)
2025-09-13 09:16:20,129 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:16:20,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:21:17,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3791.30713 ± 178.639
2025-09-13 09:21:17,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3996.8796, 3718.3718, 3959.5828, 3508.842, 3433.8564, 3923.6833, 3800.0388, 3901.5027, 3867.3892, 3802.9214]
2025-09-13 09:21:17,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 09:21:17,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 12 hours, 47 minutes, 37 seconds)
2025-09-13 09:32:00,682 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:32:00,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:37:01,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3636.20117 ± 337.557
2025-09-13 09:37:01,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3381.5735, 3636.651, 3975.9897, 3127.6865, 4200.2993, 3259.6821, 3834.131, 3530.1763, 4007.8188, 3408.0022]
2025-09-13 09:37:01,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 09:37:01,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 12 hours, 32 minutes, 51 seconds)
2025-09-13 09:47:43,181 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:47:43,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:52:42,805 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3836.39526 ± 339.932
2025-09-13 09:52:42,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3983.1821, 3119.8057, 3706.2043, 4092.2773, 3837.391, 3877.89, 4138.8247, 3333.6184, 4063.512, 4211.246]
2025-09-13 09:52:42,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 09:52:42,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 12 hours, 17 minutes, 47 seconds)
2025-09-13 10:03:23,829 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:03:23,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:08:21,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3716.59058 ± 285.079
2025-09-13 10:08:21,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3766.853, 3891.2532, 3850.9944, 3493.071, 3970.2622, 3649.4585, 4316.125, 3412.923, 3411.5789, 3403.3855]
2025-09-13 10:08:21,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 10:08:21,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 12 hours, 1 minute, 42 seconds)
2025-09-13 10:19:02,208 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:19:02,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:23:56,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3703.26489 ± 476.672
2025-09-13 10:23:56,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3987.482, 4182.731, 3075.955, 2870.9219, 3524.2178, 3987.9666, 3359.6592, 3597.0544, 4437.56, 4009.1028]
2025-09-13 10:23:56,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 10:23:56,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 11 hours, 44 minutes, 37 seconds)
2025-09-13 10:34:38,316 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:34:38,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:39:34,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3776.84619 ± 294.945
2025-09-13 10:39:34,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3582.9268, 4292.336, 3140.9346, 3926.1191, 3793.3796, 3894.816, 3793.9192, 4019.7551, 3803.7778, 3520.4976]
2025-09-13 10:39:34,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 10:39:34,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 11 hours, 28 minutes, 52 seconds)
2025-09-13 10:50:16,925 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:50:16,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:55:17,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3611.00391 ± 386.074
2025-09-13 10:55:17,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2619.4292, 4130.9126, 3667.4868, 3793.1814, 3964.4766, 3725.7078, 3397.5598, 3531.7358, 3706.9546, 3572.5945]
2025-09-13 10:55:17,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 10:55:17,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 11 hours, 13 minutes, 5 seconds)
2025-09-13 11:06:01,058 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:06:01,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:10:58,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3706.78760 ± 174.113
2025-09-13 11:10:58,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3667.624, 3513.7668, 3644.6602, 3769.5752, 4033.6814, 3841.841, 3941.8691, 3505.7805, 3554.3687, 3594.709]
2025-09-13 11:10:58,359 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 11:10:58,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 10 hours, 57 minutes, 22 seconds)
2025-09-13 11:21:40,438 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:21:40,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:26:42,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3767.05127 ± 283.314
2025-09-13 11:26:42,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3708.4255, 4024.3096, 3742.2498, 3625.2776, 3336.5266, 3871.393, 4112.5537, 3840.2156, 3266.819, 4142.7373]
2025-09-13 11:26:42,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 11:26:42,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 10 hours, 42 minutes, 28 seconds)
2025-09-13 11:37:22,561 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:37:22,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:42:24,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3922.36133 ± 176.150
2025-09-13 11:42:24,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4000.0613, 4122.481, 3926.468, 4141.6797, 3935.5413, 3715.5112, 4135.598, 3901.973, 3703.9937, 3640.309]
2025-09-13 11:42:24,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 11:42:24,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (3922.36) for latency ExtremeSparseL4U32
2025-09-13 11:42:24,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 10 hours, 27 minutes, 40 seconds)
2025-09-13 11:53:04,172 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:53:04,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:58:07,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3734.87891 ± 245.731
2025-09-13 11:58:07,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3604.9932, 3784.7366, 3487.2874, 3750.0886, 3645.7644, 3901.6755, 3723.1335, 3885.5283, 4262.25, 3303.33]
2025-09-13 11:58:07,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 11:58:07,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 10 hours, 12 minutes, 40 seconds)
2025-09-13 12:08:47,856 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:08:47,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:13:44,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3856.28467 ± 203.224
2025-09-13 12:13:44,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4226.1284, 3571.7415, 3714.2378, 4076.1865, 3822.1003, 3691.708, 3682.3933, 3907.91, 4098.1616, 3772.2825]
2025-09-13 12:13:44,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 12:13:44,490 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 9 hours, 56 minutes, 9 seconds)
2025-09-13 12:24:27,480 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:24:27,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:29:30,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3601.93896 ± 288.212
2025-09-13 12:29:30,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3305.2903, 3599.4248, 3253.0874, 3936.0923, 3437.686, 3584.5054, 4098.9556, 3276.0718, 3568.965, 3959.3118]
2025-09-13 12:29:30,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 12:29:30,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 9 hours, 41 minutes, 9 seconds)
2025-09-13 12:40:12,686 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:40:12,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:45:14,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3546.76367 ± 772.985
2025-09-13 12:45:14,391 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1318.7421, 4019.116, 3988.9243, 3399.0002, 3820.5989, 4019.8376, 3623.6646, 3995.6462, 3793.6028, 3488.5012]
2025-09-13 12:45:14,391 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 12:45:14,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 9 hours, 25 minutes, 22 seconds)
2025-09-13 12:55:58,877 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:55:58,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:01:01,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4001.40112 ± 199.735
2025-09-13 13:01:01,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4068.6904, 4091.164, 4478.168, 3893.813, 3882.326, 3994.9133, 4049.4075, 3712.414, 3789.871, 4053.247]
2025-09-13 13:01:01,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 13:01:01,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (4001.40) for latency ExtremeSparseL4U32
2025-09-13 13:01:01,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 9 hours, 10 minutes, 19 seconds)
2025-09-13 13:11:42,273 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:11:42,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:16:41,625 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3785.81201 ± 351.593
2025-09-13 13:16:41,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3521.185, 3604.6973, 3989.9387, 3499.737, 4215.2935, 3687.7275, 4131.756, 3130.5527, 3775.0754, 4302.156]
2025-09-13 13:16:41,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 13:16:41,638 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 8 hours, 54 minutes, 17 seconds)
2025-09-13 13:27:24,077 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:27:24,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:32:24,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3716.51514 ± 346.284
2025-09-13 13:32:24,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4029.1826, 3589.8457, 3909.8643, 2875.865, 3531.4468, 3657.5505, 3930.4224, 4015.0166, 3539.076, 4086.8828]
2025-09-13 13:32:24,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 13:32:24,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 8 hours, 39 minutes, 10 seconds)
2025-09-13 13:43:06,629 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:43:06,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:48:02,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3850.61914 ± 363.772
2025-09-13 13:48:02,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4320.742, 3125.431, 3540.3147, 3595.3738, 3929.0193, 4081.7886, 3571.5684, 4012.3394, 4328.9883, 4000.6292]
2025-09-13 13:48:02,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 13:48:02,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 8 hours, 22 minutes, 39 seconds)
2025-09-13 13:58:45,494 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:58:45,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:03:42,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3534.82178 ± 1103.162
2025-09-13 14:03:42,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3978.2563, 4291.193, 3808.4731, 319.81747, 4063.764, 3846.5574, 3228.5642, 3835.0684, 4082.2737, 3894.2493]
2025-09-13 14:03:42,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 14:03:42,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 8 hours, 6 minutes, 27 seconds)
2025-09-13 14:14:26,336 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:14:26,339 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:19:22,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3898.17773 ± 184.354
2025-09-13 14:19:22,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3644.4575, 4274.349, 3813.5088, 3718.0706, 3835.002, 3981.222, 3871.7375, 3925.8096, 3765.5815, 4152.039]
2025-09-13 14:19:22,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 14:19:22,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 7 hours, 50 minutes, 10 seconds)
2025-09-13 14:30:05,942 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:30:05,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:35:02,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4005.69067 ± 134.422
2025-09-13 14:35:02,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4183.7183, 3731.4744, 4132.3105, 4091.0, 3899.5525, 3965.3706, 3877.404, 4102.764, 4106.67, 3966.6416]
2025-09-13 14:35:02,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 14:35:02,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (4005.69) for latency ExtremeSparseL4U32
2025-09-13 14:35:02,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 7 hours, 34 minutes, 22 seconds)
2025-09-13 14:45:44,541 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:45:44,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:50:39,666 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3855.69214 ± 257.102
2025-09-13 14:50:39,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3796.5042, 3997.4841, 3885.0254, 3892.1462, 3396.8296, 3600.9482, 4256.911, 4114.1235, 3555.9448, 4061.0046]
2025-09-13 14:50:39,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 14:50:39,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 7 hours, 18 minutes, 14 seconds)
2025-09-13 15:01:18,656 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:01:18,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:06:12,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3783.18164 ± 319.091
2025-09-13 15:06:12,779 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3598.4834, 3604.6562, 3428.8755, 4051.3633, 3666.3164, 4392.893, 4034.3135, 3255.0786, 3908.087, 3891.749]
2025-09-13 15:06:12,779 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 15:06:12,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 7 hours, 2 minutes, 4 seconds)
2025-09-13 15:16:54,453 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:16:54,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:21:57,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3833.84033 ± 417.551
2025-09-13 15:21:57,194 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3256.9065, 3315.9153, 3796.1682, 4319.347, 3287.526, 4465.7915, 4041.6116, 3850.5542, 4233.8394, 3770.7407]
2025-09-13 15:21:57,194 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 15:21:57,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 6 hours, 46 minutes, 54 seconds)
2025-09-13 15:32:41,110 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:32:41,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:37:42,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3784.40479 ± 226.469
2025-09-13 15:37:42,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3474.6538, 4202.912, 3648.3215, 3799.9905, 4063.9563, 3448.9888, 3758.3806, 3926.086, 3832.7253, 3688.033]
2025-09-13 15:37:42,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 15:37:42,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 6 hours, 31 minutes, 38 seconds)
2025-09-13 15:48:23,172 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:48:23,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:53:25,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4089.89136 ± 179.541
2025-09-13 15:53:25,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4113.0244, 3846.7678, 3950.0593, 4320.851, 4248.9355, 4106.6895, 4115.869, 3738.6755, 4256.3247, 4201.7188]
2025-09-13 15:53:25,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 15:53:25,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (4089.89) for latency ExtremeSparseL4U32
2025-09-13 15:53:25,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 6 hours, 16 minutes, 18 seconds)
2025-09-13 16:04:09,052 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:04:09,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:09:14,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3827.65869 ± 240.394
2025-09-13 16:09:14,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4124.8013, 4154.834, 3840.3103, 3861.1619, 3781.7925, 3511.4424, 3334.4614, 3767.9211, 3914.8513, 3985.012]
2025-09-13 16:09:14,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 16:09:14,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 6 hours, 1 minute, 28 seconds)
2025-09-13 16:19:59,769 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:19:59,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:25:01,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3944.11792 ± 259.262
2025-09-13 16:25:01,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4109.7837, 4028.6345, 3924.043, 3814.5461, 4488.5723, 4164.513, 3606.7551, 3950.2307, 3774.2178, 3579.8845]
2025-09-13 16:25:01,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 16:25:01,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 5 hours, 46 minutes, 47 seconds)
2025-09-13 16:35:44,411 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:35:44,416 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:40:44,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3617.68481 ± 219.323
2025-09-13 16:40:44,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3487.6055, 3562.128, 3990.9617, 3772.227, 3342.9375, 3719.4348, 3635.186, 3577.9001, 3859.8953, 3228.5737]
2025-09-13 16:40:44,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 16:40:44,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 5 hours, 30 minutes, 53 seconds)
2025-09-13 16:51:27,187 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:51:27,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:56:33,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4038.30029 ± 211.310
2025-09-13 16:56:33,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4102.0293, 4160.642, 3825.6907, 3875.264, 3618.4595, 4244.268, 4042.1968, 4112.0894, 4003.0718, 4399.2935]
2025-09-13 16:56:33,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 16:56:33,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 5 hours, 15 minutes, 24 seconds)
2025-09-13 17:07:16,684 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:07:16,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:12:19,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4120.79736 ± 278.603
2025-09-13 17:12:19,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3690.184, 4124.73, 4452.1816, 4237.322, 3559.8254, 4418.843, 4168.592, 4359.058, 4055.007, 4142.2305]
2025-09-13 17:12:19,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 17:12:19,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (4120.80) for latency ExtremeSparseL4U32
2025-09-13 17:12:19,110 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 4 hours, 59 minutes, 46 seconds)
2025-09-13 17:23:01,658 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:23:01,663 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:28:06,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3657.96558 ± 906.280
2025-09-13 17:28:06,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4150.3784, 4164.5503, 4329.344, 4156.43, 3247.7834, 1146.1575, 3421.4614, 4309.477, 3819.3943, 3834.6797]
2025-09-13 17:28:06,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 17:28:06,367 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 43 minutes, 54 seconds)
2025-09-13 17:38:50,455 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:38:50,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:43:52,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3982.78247 ± 308.764
2025-09-13 17:43:52,029 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3644.836, 4358.8335, 4239.293, 4200.4907, 3782.7031, 3749.0874, 4129.702, 3384.1262, 4311.295, 4027.4565]
2025-09-13 17:43:52,029 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 17:43:52,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 28 minutes, 3 seconds)
2025-09-13 17:54:34,858 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:54:34,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:59:30,960 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3781.86060 ± 333.440
2025-09-13 17:59:30,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3999.0146, 3817.7065, 3996.1885, 4189.5537, 3107.6082, 3555.8992, 4170.3535, 3343.8774, 3821.3552, 3817.0522]
2025-09-13 17:59:30,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 17:59:30,970 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 4 hours, 12 minutes, 5 seconds)
2025-09-13 18:10:14,341 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:10:14,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:15:11,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3971.11768 ± 131.417
2025-09-13 18:15:11,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3844.855, 4112.074, 3794.867, 4024.3044, 3803.9963, 4113.5176, 3857.094, 4038.165, 3958.4788, 4163.8257]
2025-09-13 18:15:11,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 18:15:11,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 55 minutes, 54 seconds)
2025-09-13 18:25:54,994 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:25:54,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:30:52,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3740.55664 ± 288.984
2025-09-13 18:30:52,071 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3581.2234, 3948.8335, 3544.5535, 4172.4233, 4177.045, 3250.5005, 3445.362, 3640.1082, 3824.3142, 3821.2024]
2025-09-13 18:30:52,071 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 18:30:52,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 39 minutes, 56 seconds)
2025-09-13 18:41:36,374 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:41:36,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:46:40,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3850.81445 ± 193.900
2025-09-13 18:46:40,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3684.3796, 4125.616, 3907.0713, 3918.1077, 4187.0405, 3731.132, 3975.8315, 3585.1057, 3742.1956, 3651.6648]
2025-09-13 18:46:40,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 18:46:40,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 24 minutes, 15 seconds)
2025-09-13 18:57:19,668 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:57:19,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:02:15,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3606.61719 ± 907.615
2025-09-13 19:02:15,598 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3903.7139, 3679.5437, 2928.474, 4121.8213, 3998.6067, 3924.5117, 4254.228, 4001.7605, 4154.538, 1098.9727]
2025-09-13 19:02:15,598 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 19:02:15,606 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 8 minutes, 8 seconds)
2025-09-13 19:12:57,290 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:12:57,295 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:17:56,579 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4203.12646 ± 279.180
2025-09-13 19:17:56,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4207.29, 4041.2065, 4409.904, 4542.2334, 3960.8423, 4089.81, 4434.4717, 4288.682, 4476.74, 3580.0854]
2025-09-13 19:17:56,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 19:17:56,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (4203.13) for latency ExtremeSparseL4U32
2025-09-13 19:17:56,598 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 52 minutes, 32 seconds)
2025-09-13 19:28:36,725 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:28:36,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:33:35,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3903.41553 ± 288.144
2025-09-13 19:33:35,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4239.407, 3926.4138, 4100.565, 3449.147, 3731.9172, 3773.879, 4070.412, 3528.6428, 3811.688, 4402.0864]
2025-09-13 19:33:35,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 19:33:35,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 36 minutes, 48 seconds)
2025-09-13 19:44:15,331 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:44:15,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:49:16,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3737.35229 ± 333.106
2025-09-13 19:49:16,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4117.908, 4195.437, 3705.2947, 3191.5999, 3633.445, 3328.7239, 3368.0823, 4032.0789, 3930.8337, 3870.1226]
2025-09-13 19:49:16,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 19:49:16,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 21 minutes, 7 seconds)
2025-09-13 19:59:57,413 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:59:57,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:04:54,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3853.93750 ± 296.816
2025-09-13 20:04:54,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4251.724, 4101.5786, 3865.0254, 3663.8232, 3180.2214, 3886.1743, 3556.8887, 3996.3254, 4079.472, 3958.1392]
2025-09-13 20:04:54,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 20:04:54,324 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 5 minutes, 10 seconds)
2025-09-13 20:15:37,674 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:15:37,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:20:42,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3907.44800 ± 398.040
2025-09-13 20:20:42,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4213.025, 4078.3484, 3664.8667, 4208.674, 3156.4497, 4238.5796, 3309.8818, 4404.8306, 3769.6182, 4030.2083]
2025-09-13 20:20:42,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 20:20:42,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 49 minutes, 49 seconds)
2025-09-13 20:31:27,609 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:31:27,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:36:32,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3839.39209 ± 154.309
2025-09-13 20:36:32,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3609.821, 3786.4524, 4042.661, 3913.9307, 3924.2502, 3948.4104, 3918.0557, 3790.433, 3935.0234, 3524.8823]
2025-09-13 20:36:32,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 20:36:32,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 34 minutes, 19 seconds)
2025-09-13 20:47:16,910 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:47:16,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:52:17,000 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3853.89380 ± 425.765
2025-09-13 20:52:17,007 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4679.379, 4176.9604, 3404.5361, 3357.3806, 3436.6086, 3442.2278, 3782.8477, 3978.342, 4298.905, 3981.7507]
2025-09-13 20:52:17,007 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 20:52:17,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 18 minutes, 41 seconds)
2025-09-13 21:02:59,304 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:02:59,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:08:01,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4041.43359 ± 322.260
2025-09-13 21:08:01,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3857.977, 3668.0154, 4453.6772, 4338.4414, 4082.079, 3493.4182, 4345.2837, 4340.193, 3701.9316, 4133.3193]
2025-09-13 21:08:01,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 21:08:01,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 2 minutes, 59 seconds)
2025-09-13 21:18:43,320 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:18:43,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:23:38,704 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4145.52441 ± 136.906
2025-09-13 21:23:38,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3944.9194, 4365.1665, 4057.6755, 4235.856, 4109.8477, 4232.925, 4277.6465, 3908.9653, 4149.791, 4172.4517]
2025-09-13 21:23:38,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 21:23:38,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 47 minutes, 14 seconds)
2025-09-13 21:34:20,726 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:34:20,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:39:16,211 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4054.56299 ± 384.473
2025-09-13 21:39:16,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3339.7634, 4042.5781, 4616.3174, 3561.5125, 4176.293, 4242.3994, 3739.0972, 4019.0444, 4484.674, 4323.95]
2025-09-13 21:39:16,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 21:39:16,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 31 minutes, 25 seconds)
2025-09-13 21:49:58,022 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:49:58,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:54:55,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3989.13086 ± 345.335
2025-09-13 21:54:55,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4154.549, 3626.8123, 4051.724, 4438.5513, 3636.7769, 4172.459, 4348.4854, 4017.155, 3276.2766, 4168.5186]
2025-09-13 21:54:55,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 21:54:55,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 15 minutes, 40 seconds)
2025-09-13 22:05:35,621 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:05:35,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:10:37,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3957.42920 ± 827.048
2025-09-13 22:10:37,599 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3948.1477, 4693.1646, 1668.1715, 4088.8281, 4441.566, 3521.8625, 4313.9395, 4198.394, 4102.0557, 4598.165]
2025-09-13 22:10:37,599 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 22:10:37,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1251 [DEBUG]: Training session finished
