2025-09-12 20:09:32,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc10-halfcheetah/ExtremeSparseL4U32-mbpac_memdelay
2025-09-12 20:09:32,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc10-halfcheetah/ExtremeSparseL4U32-mbpac_memdelay
2025-09-12 20:09:32,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x145dea340550>}
2025-09-12 20:09:32,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1111 [DEBUG]: using device: cuda
2025-09-12 20:09:32,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-12 20:09:32,175 baseline-mbpac-noiseperc10-halfcheetah:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-12 20:09:32,176 baseline-mbpac-noiseperc10-halfcheetah:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-12 20:09:32,183 baseline-mbpac-noiseperc10-halfcheetah:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=6, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-09-12 20:09:33,248 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-12 20:09:33,248 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-12 20:21:08,087 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:21:08,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 20:26:15,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: -330.25119 ± 15.783
2025-09-12 20:26:15,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [-322.48245, -334.8942, -359.26743, -316.29913, -348.37363, -342.64212, -317.26498, -334.10544, -322.30756, -304.87506]
2025-09-12 20:26:15,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:26:15,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (-330.25) for latency ExtremeSparseL4U32
2025-09-12 20:26:15,653 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 27 hours, 33 minutes, 58 seconds)
2025-09-12 20:36:56,352 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:36:56,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 20:41:51,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3.09461 ± 48.861
2025-09-12 20:41:51,416 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [-39.239098, 22.826067, 39.047283, -53.02852, -4.922853, -9.634551, -81.62591, 83.49573, 60.256584, 13.771327]
2025-09-12 20:41:51,416 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:41:51,416 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (3.09) for latency ExtremeSparseL4U32
2025-09-12 20:41:51,436 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 26 hours, 22 minutes, 51 seconds)
2025-09-12 20:52:32,024 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:52:32,028 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 20:57:28,842 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 255.91133 ± 93.934
2025-09-12 20:57:28,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [253.37708, 302.34058, 318.47232, 247.7747, 96.73448, 200.0495, 468.24097, 166.93372, 270.1164, 235.07358]
2025-09-12 20:57:28,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:57:28,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (255.91) for latency ExtremeSparseL4U32
2025-09-12 20:57:28,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 25 hours, 49 minutes, 38 seconds)
2025-09-12 21:08:10,766 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:08:10,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 21:13:10,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1966.05005 ± 691.640
2025-09-12 21:13:10,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2125.5637, 2576.84, 2416.6206, 2005.4661, 833.3177, 423.01175, 2261.7173, 2395.5598, 2254.7124, 2367.6892]
2025-09-12 21:13:10,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:13:10,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (1966.05) for latency ExtremeSparseL4U32
2025-09-12 21:13:10,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 25 hours, 26 minutes, 43 seconds)
2025-09-12 21:23:55,746 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:23:55,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 21:29:03,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2379.30615 ± 236.682
2025-09-12 21:29:03,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2472.9946, 2583.177, 2770.9973, 2002.395, 2609.2063, 2324.5796, 2074.8086, 2135.9377, 2458.1646, 2360.8018]
2025-09-12 21:29:03,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:29:03,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (2379.31) for latency ExtremeSparseL4U32
2025-09-12 21:29:03,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 25 hours, 10 minutes, 31 seconds)
2025-09-12 21:39:48,525 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:39:48,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 21:44:45,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2421.57812 ± 752.378
2025-09-12 21:44:45,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2593.0432, 2593.2163, 182.67662, 2818.229, 2561.7625, 2661.5144, 2569.5479, 2854.287, 2707.8914, 2673.6135]
2025-09-12 21:44:45,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:44:45,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (2421.58) for latency ExtremeSparseL4U32
2025-09-12 21:44:45,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 24 hours, 35 minutes, 38 seconds)
2025-09-12 21:55:31,602 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:55:31,606 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:00:36,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2643.48462 ± 725.093
2025-09-12 22:00:36,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2748.025, 2956.4397, 2928.34, 2774.8855, 2978.716, 504.40256, 2835.635, 2702.808, 2824.4355, 3181.1558]
2025-09-12 22:00:36,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:00:36,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (2643.48) for latency ExtremeSparseL4U32
2025-09-12 22:00:36,639 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 24 hours, 24 minutes, 48 seconds)
2025-09-12 22:11:22,193 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:11:22,204 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:16:21,639 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2748.94971 ± 360.230
2025-09-12 22:16:21,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2871.0798, 3012.825, 2519.3748, 3215.0818, 2746.9214, 2916.5964, 2909.5664, 2884.0195, 1830.4568, 2583.5767]
2025-09-12 22:16:21,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:16:21,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (2748.95) for latency ExtremeSparseL4U32
2025-09-12 22:16:21,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 24 hours, 11 minutes, 23 seconds)
2025-09-12 22:27:08,206 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:27:08,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:32:04,779 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2780.28394 ± 370.010
2025-09-12 22:32:04,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1779.6969, 3062.0884, 2662.5718, 2888.305, 2643.063, 2750.9731, 3008.7964, 3170.873, 2932.978, 2903.4932]
2025-09-12 22:32:04,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:32:04,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (2780.28) for latency ExtremeSparseL4U32
2025-09-12 22:32:04,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 23 hours, 56 minutes, 12 seconds)
2025-09-12 22:42:50,731 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:42:50,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:47:57,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2579.96777 ± 734.066
2025-09-12 22:47:57,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2833.2734, 3087.8196, 2780.893, 2956.6096, 452.13498, 2963.78, 2915.5571, 2352.03, 2715.2866, 2742.2922]
2025-09-12 22:47:57,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:47:57,384 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 23 hours, 40 minutes, 12 seconds)
2025-09-12 22:58:44,666 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:58:44,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:03:42,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2520.14868 ± 789.198
2025-09-12 23:03:42,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2976.3018, 344.5521, 3000.3813, 2720.8308, 2894.802, 1901.5792, 2665.889, 2748.6943, 3034.625, 2913.83]
2025-09-12 23:03:42,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:03:42,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 23 hours, 25 minutes, 20 seconds)
2025-09-12 23:14:28,003 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:14:28,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:19:27,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2716.89209 ± 616.008
2025-09-12 23:19:27,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3056.377, 2684.4067, 3069.0027, 2227.3076, 1031.8422, 2958.5605, 3047.4788, 3091.6023, 3053.9102, 2948.4316]
2025-09-12 23:19:27,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:19:27,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 23 hours, 7 minutes, 42 seconds)
2025-09-12 23:30:12,116 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:30:12,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:35:17,719 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2465.82080 ± 792.780
2025-09-12 23:35:17,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [533.13324, 2856.966, 3017.7324, 1365.7474, 2637.0625, 2853.3755, 2837.4653, 2736.6167, 2690.6235, 3129.488]
2025-09-12 23:35:17,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:35:17,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 22 hours, 53 minutes, 27 seconds)
2025-09-12 23:46:03,631 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:46:03,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:51:01,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2783.01685 ± 431.612
2025-09-12 23:51:01,598 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3050.0361, 2830.1138, 2766.3447, 1540.1119, 3000.0938, 2782.363, 3068.1902, 2827.4592, 2852.1692, 3113.2844]
2025-09-12 23:51:01,598 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:51:01,598 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (2783.02) for latency ExtremeSparseL4U32
2025-09-12 23:51:01,605 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 22 hours, 37 minutes, 53 seconds)
2025-09-13 00:01:47,150 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:01:47,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:06:45,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2837.45166 ± 205.781
2025-09-13 00:06:45,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2948.3022, 2923.8708, 2718.0803, 2967.0967, 2781.4006, 2893.4944, 2288.0547, 2875.5815, 2897.2717, 3081.3652]
2025-09-13 00:06:45,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:06:45,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (2837.45) for latency ExtremeSparseL4U32
2025-09-13 00:06:45,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 22 hours, 19 minutes, 30 seconds)
2025-09-13 00:17:32,393 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:17:32,399 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:22:33,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2926.56226 ± 113.917
2025-09-13 00:22:33,107 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2970.394, 2801.3591, 2925.8875, 3015.4507, 3117.4304, 3088.814, 2759.155, 2887.8762, 2858.4553, 2840.8008]
2025-09-13 00:22:33,107 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:22:33,107 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (2926.56) for latency ExtremeSparseL4U32
2025-09-13 00:22:33,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 22 hours, 4 minutes, 38 seconds)
2025-09-13 00:33:20,978 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:33:20,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:38:19,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2815.46558 ± 116.778
2025-09-13 00:38:19,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2619.3843, 2764.417, 2647.5652, 2856.6738, 2859.108, 2861.437, 2964.2542, 2899.737, 2716.819, 2965.2605]
2025-09-13 00:38:19,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:38:19,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 21 hours, 49 minutes, 12 seconds)
2025-09-13 00:49:05,959 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:49:05,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:54:05,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2964.43286 ± 109.929
2025-09-13 00:54:05,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3009.9182, 3026.0544, 2837.6194, 2888.8755, 3124.6147, 2989.5098, 3123.8142, 2789.2021, 2863.94, 2990.7786]
2025-09-13 00:54:05,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:54:05,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (2964.43) for latency ExtremeSparseL4U32
2025-09-13 00:54:05,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 21 hours, 32 minutes, 16 seconds)
2025-09-13 01:04:53,899 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:04:53,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:09:51,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2522.52026 ± 860.856
2025-09-13 01:09:51,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2609.59, 2637.5085, 2880.9363, 2899.981, 91.3023, 3036.1284, 2048.9775, 3067.7593, 2903.602, 3049.419]
2025-09-13 01:09:51,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:09:51,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 21 hours, 17 minutes, 10 seconds)
2025-09-13 01:20:39,623 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:20:39,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:25:39,956 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2592.85767 ± 875.477
2025-09-13 01:25:39,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2748.5496, 3002.8367, 2957.5918, -21.174746, 2786.315, 2845.043, 2854.2666, 2811.959, 2941.299, 3001.8909]
2025-09-13 01:25:39,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:25:39,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 21 hours, 2 minutes, 38 seconds)
2025-09-13 01:36:26,559 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:36:26,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:41:35,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2960.17212 ± 55.035
2025-09-13 01:41:35,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3042.0647, 2948.974, 2985.9744, 3000.651, 2992.2126, 3011.7607, 2947.7092, 2925.6794, 2850.5137, 2896.1804]
2025-09-13 01:41:35,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:41:35,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 20 hours, 48 minutes, 47 seconds)
2025-09-13 01:52:21,909 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:52:21,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:57:24,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2688.14893 ± 565.682
2025-09-13 01:57:24,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2566.0803, 2866.3162, 3001.4634, 2956.775, 2923.674, 1035.5325, 3026.0151, 2935.6804, 2751.829, 2818.121]
2025-09-13 01:57:24,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:57:24,666 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 20 hours, 33 minutes, 43 seconds)
2025-09-13 02:08:12,891 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:08:12,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:13:11,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2343.31201 ± 1005.010
2025-09-13 02:13:11,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2729.7427, 2853.6848, 2667.3577, 3077.3796, 461.64767, 2574.4524, 2976.5854, 3080.2463, 259.7281, 2752.2947]
2025-09-13 02:13:11,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:13:11,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 20 hours, 18 minutes, 14 seconds)
2025-09-13 02:23:58,999 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:23:59,004 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:28:58,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2786.32812 ± 553.605
2025-09-13 02:28:58,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2929.0725, 2998.3057, 3021.1272, 2986.9758, 2978.194, 2908.272, 2865.9692, 1138.9902, 3129.5952, 2906.7793]
2025-09-13 02:28:58,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:28:58,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 20 hours, 2 minutes, 35 seconds)
2025-09-13 02:39:46,697 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:39:46,703 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:44:54,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2431.38892 ± 894.289
2025-09-13 02:44:54,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2990.0178, 2681.957, 2966.7598, 2989.8157, 2358.3984, 332.38596, 2926.272, 1104.34, 2961.743, 3002.2014]
2025-09-13 02:44:54,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:44:54,276 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 19 hours, 48 minutes, 34 seconds)
2025-09-13 02:55:41,261 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:55:41,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:00:45,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2804.88257 ± 187.044
2025-09-13 03:00:45,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2846.9534, 2421.908, 2781.5312, 2891.9814, 2813.403, 2995.8342, 3161.0525, 2762.4702, 2676.1746, 2697.52]
2025-09-13 03:00:45,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:00:45,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 19 hours, 31 minutes, 38 seconds)
2025-09-13 03:11:33,671 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:11:33,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:16:37,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2200.77026 ± 887.691
2025-09-13 03:16:37,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [539.5857, 923.7984, 2702.3367, 1871.2811, 2873.3828, 2901.217, 2837.7847, 3046.6597, 2884.2659, 1427.3916]
2025-09-13 03:16:37,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:16:37,180 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 19 hours, 16 minutes, 26 seconds)
2025-09-13 03:27:24,905 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:27:24,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:32:23,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2945.89355 ± 111.302
2025-09-13 03:32:23,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2824.0017, 3186.1724, 3040.6262, 2968.7493, 2876.21, 2784.758, 2882.567, 2949.6348, 2920.8271, 3025.3884]
2025-09-13 03:32:23,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:32:23,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 19 hours, 21 seconds)
2025-09-13 03:43:10,723 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:43:10,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:48:10,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2781.73926 ± 490.421
2025-09-13 03:48:10,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3066.3052, 3184.374, 3021.0586, 2950.5437, 2858.9736, 1616.1326, 2064.6165, 3146.9998, 2906.7754, 3001.6125]
2025-09-13 03:48:10,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:48:10,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 18 hours, 44 minutes, 30 seconds)
2025-09-13 03:58:55,829 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:58:55,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:04:03,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2741.12720 ± 621.969
2025-09-13 04:04:03,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2869.1602, 2941.9668, 3039.2656, 2618.2893, 3077.2097, 2896.0159, 2987.556, 2868.3354, 926.0039, 3187.469]
2025-09-13 04:04:03,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 04:04:03,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 18 hours, 28 minutes, 3 seconds)
2025-09-13 04:14:51,625 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:14:51,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:19:57,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2880.21973 ± 115.429
2025-09-13 04:19:57,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2907.9011, 2709.1375, 3014.6572, 2976.8096, 2918.1172, 2650.1575, 2934.138, 2823.1401, 2865.257, 3002.8806]
2025-09-13 04:19:57,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 04:19:57,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 18 hours, 12 minutes, 55 seconds)
2025-09-13 04:30:45,159 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:30:45,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:35:47,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2340.93701 ± 1049.484
2025-09-13 04:35:47,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [-90.58749, 3017.6511, 2824.868, 3112.5793, 930.7533, 1834.006, 2927.108, 2796.368, 2837.6316, 3218.9946]
2025-09-13 04:35:47,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 04:35:47,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 17 hours, 56 minutes, 38 seconds)
2025-09-13 04:46:34,960 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:46:34,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:51:34,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2908.52148 ± 155.232
2025-09-13 04:51:34,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2934.2852, 3066.5386, 2968.0332, 2867.1997, 2978.4827, 2940.1184, 3040.6553, 3009.4744, 2512.6091, 2767.8174]
2025-09-13 04:51:34,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 04:51:34,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 17 hours, 41 minutes, 2 seconds)
2025-09-13 05:02:20,464 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:02:20,470 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:07:22,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2893.09497 ± 163.245
2025-09-13 05:07:22,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3064.7632, 2731.3596, 3007.116, 2912.9236, 3048.135, 2866.8987, 2550.531, 3080.0627, 2752.1565, 2917.003]
2025-09-13 05:07:22,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 05:07:22,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 17 hours, 25 minutes, 22 seconds)
2025-09-13 05:18:11,246 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:18:11,256 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:23:13,824 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2449.81128 ± 682.684
2025-09-13 05:23:13,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2919.6924, 2861.6562, 2978.761, 2650.9858, 839.68274, 1951.3225, 3051.295, 1790.8519, 3009.1643, 2444.7004]
2025-09-13 05:23:13,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 05:23:13,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 17 hours, 9 minutes, 19 seconds)
2025-09-13 05:34:02,205 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:34:02,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:39:02,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2803.09863 ± 432.169
2025-09-13 05:39:02,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3119.7703, 2990.6003, 3022.6436, 2950.6975, 2904.7468, 2641.6938, 2962.399, 1553.9694, 2969.3796, 2915.0884]
2025-09-13 05:39:02,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 05:39:02,254 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 16 hours, 52 minutes, 17 seconds)
2025-09-13 05:49:50,310 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:49:50,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:54:51,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2726.81616 ± 521.349
2025-09-13 05:54:51,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2998.2175, 1196.6073, 3012.2163, 2814.7246, 2962.4568, 2698.7563, 2825.4075, 2990.1018, 3003.9468, 2765.7266]
2025-09-13 05:54:51,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 05:54:51,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 16 hours, 36 minutes, 25 seconds)
2025-09-13 06:05:39,359 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:05:39,369 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:10:38,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2655.78540 ± 821.197
2025-09-13 06:10:38,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [213.79117, 2768.9336, 2995.715, 3070.2024, 2935.7402, 2743.4746, 3070.423, 2826.8528, 2961.4, 2971.3198]
2025-09-13 06:10:38,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 06:10:38,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 16 hours, 20 minutes, 27 seconds)
2025-09-13 06:21:26,964 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:21:26,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:26:25,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2908.92310 ± 85.986
2025-09-13 06:26:25,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2962.8467, 2860.342, 3036.5576, 2884.3096, 2874.0361, 2915.6829, 2979.3542, 2699.8345, 2930.2473, 2946.0186]
2025-09-13 06:26:25,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 06:26:25,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 16 hours, 4 minutes, 32 seconds)
2025-09-13 06:37:14,683 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:37:14,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:42:18,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2648.62500 ± 840.036
2025-09-13 06:42:18,113 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3178.4224, 2800.4177, 2991.872, 2840.1396, 3056.0918, 2706.481, 2805.2788, 2986.1208, 2961.3303, 160.0958]
2025-09-13 06:42:18,113 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 06:42:18,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 15 hours, 48 minutes, 51 seconds)
2025-09-13 06:53:07,600 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:53:07,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:58:09,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2589.59497 ± 794.123
2025-09-13 06:58:09,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3198.4863, 2779.9934, 3047.9336, 2780.317, 3156.6577, 501.5343, 3030.3416, 1855.6439, 2479.736, 3065.3044]
2025-09-13 06:58:09,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 06:58:09,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 15 hours, 33 minutes, 37 seconds)
2025-09-13 07:09:01,288 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:09:01,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:14:00,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2805.38794 ± 336.031
2025-09-13 07:14:00,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2949.4338, 1826.3832, 2912.909, 2848.4497, 3002.2307, 2899.3276, 2775.2598, 3049.6685, 2816.6694, 2973.5464]
2025-09-13 07:14:00,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 07:14:00,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 15 hours, 18 minutes)
2025-09-13 07:24:49,081 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:24:49,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:29:55,618 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2880.48193 ± 202.093
2025-09-13 07:29:55,625 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2920.3215, 3047.272, 2766.7676, 2514.7683, 2615.674, 3155.7944, 2961.2239, 2805.048, 2865.4778, 3152.468]
2025-09-13 07:29:55,625 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 07:29:55,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 15 hours, 3 minutes, 50 seconds)
2025-09-13 07:40:45,537 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:40:45,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:45:48,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2943.82642 ± 169.060
2025-09-13 07:45:48,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2997.4102, 3109.8892, 2828.2058, 2974.5127, 3159.7146, 2808.0662, 3117.7366, 2919.057, 2563.1956, 2960.4763]
2025-09-13 07:45:48,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 07:45:48,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 14 hours, 49 minutes, 1 second)
2025-09-13 07:56:35,667 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:56:35,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:01:38,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2249.78467 ± 1097.721
2025-09-13 08:01:38,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1401.2701, 3152.503, 3030.589, 2743.8528, 270.41153, 2887.5835, 3039.132, 273.17618, 3057.9968, 2641.33]
2025-09-13 08:01:38,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 08:01:38,059 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 14 hours, 32 minutes, 39 seconds)
2025-09-13 08:12:28,630 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:12:28,637 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:17:33,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2364.39014 ± 806.423
2025-09-13 08:17:33,304 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3004.1157, 1891.0902, 3043.2578, 716.4167, 3082.3447, 3019.4746, 2667.308, 1618.252, 3039.7651, 1561.8759]
2025-09-13 08:17:33,304 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 08:17:33,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 14 hours, 17 minutes, 29 seconds)
2025-09-13 08:28:22,235 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:28:22,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:33:28,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2951.66235 ± 148.304
2025-09-13 08:33:28,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2686.6018, 2777.7134, 2978.9568, 3060.4988, 2916.76, 3248.5374, 3012.8918, 2871.9058, 2928.434, 3034.3218]
2025-09-13 08:33:28,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 08:33:28,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 14 hours, 2 minutes, 19 seconds)
2025-09-13 08:44:16,714 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:44:16,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:49:25,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3004.50562 ± 316.488
2025-09-13 08:49:25,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2096.5466, 3105.1184, 3014.801, 3075.3423, 3132.233, 2915.2964, 3205.9265, 3140.4446, 3272.7385, 3086.6091]
2025-09-13 08:49:25,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 08:49:25,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (3004.51) for latency ExtremeSparseL4U32
2025-09-13 08:49:25,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 13 hours, 46 minutes, 43 seconds)
2025-09-13 09:00:15,089 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:00:15,095 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:05:14,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2802.49341 ± 841.384
2025-09-13 09:05:14,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [303.15897, 3100.8157, 3200.862, 3233.4666, 2960.5867, 3068.9316, 3225.5667, 2892.3645, 2915.0432, 3124.1384]
2025-09-13 09:05:14,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 09:05:14,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 13 hours, 30 minutes, 11 seconds)
2025-09-13 09:16:02,141 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:16:02,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:21:04,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2587.87744 ± 946.385
2025-09-13 09:21:04,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3291.3718, 3052.7874, 3106.9941, 2732.748, 1769.0078, 2957.3088, 7.8597174, 2896.9275, 3060.2979, 3003.474]
2025-09-13 09:21:04,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 09:21:04,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 13 hours, 14 minutes, 27 seconds)
2025-09-13 09:31:53,822 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:31:53,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:36:52,201 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2757.12231 ± 884.727
2025-09-13 09:36:52,218 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2983.949, 3315.9917, 2929.755, 3152.0183, 3261.7395, 2988.3784, 151.19054, 2908.0615, 2728.9316, 3151.208]
2025-09-13 09:36:52,218 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 09:36:52,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 12 hours, 57 minutes, 17 seconds)
2025-09-13 09:47:40,058 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:47:40,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:52:39,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2971.14307 ± 495.517
2025-09-13 09:52:39,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3345.7166, 3004.2817, 3152.4226, 1552.0514, 2905.5527, 3266.6765, 2898.566, 3131.3196, 3309.1506, 3145.6914]
2025-09-13 09:52:39,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 09:52:39,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 12 hours, 40 minutes, 16 seconds)
2025-09-13 10:03:29,217 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:03:29,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:08:36,754 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2704.41455 ± 897.013
2025-09-13 10:08:36,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3104.37, 2755.4329, 3103.2268, 3244.266, 2283.242, 2949.0117, 3227.8594, 2965.7334, 146.95805, 3264.0425]
2025-09-13 10:08:36,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 10:08:36,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 12 hours, 24 minutes, 24 seconds)
2025-09-13 10:19:25,871 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:19:25,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:24:30,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2824.40161 ± 643.741
2025-09-13 10:24:30,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3274.5562, 3086.666, 1052.198, 3025.5142, 3132.0244, 2320.0154, 2976.8862, 3283.036, 2975.4695, 3117.6504]
2025-09-13 10:24:30,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 10:24:30,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 12 hours, 9 minutes, 15 seconds)
2025-09-13 10:35:20,423 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:35:20,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:40:18,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2928.62744 ± 449.812
2025-09-13 10:40:18,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3202.758, 3195.5645, 3081.9927, 1631.0792, 3296.3025, 3056.0933, 2964.6204, 3016.6138, 2852.9563, 2988.2954]
2025-09-13 10:40:18,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 10:40:18,589 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 11 hours, 53 minutes, 4 seconds)
2025-09-13 10:51:07,540 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:51:07,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:56:05,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2914.68433 ± 409.123
2025-09-13 10:56:05,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3368.4497, 1867.2847, 3144.33, 2985.8433, 2964.2263, 2883.03, 2642.9702, 3182.8423, 3304.0222, 2803.8462]
2025-09-13 10:56:05,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 10:56:05,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 11 hours, 37 minutes, 12 seconds)
2025-09-13 11:06:56,542 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:06:56,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:11:57,608 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3006.05298 ± 240.896
2025-09-13 11:11:57,610 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3062.4521, 3120.9756, 3137.9639, 2363.472, 3027.1965, 3177.035, 3180.1948, 2905.749, 3217.1714, 2868.3184]
2025-09-13 11:11:57,610 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 11:11:57,610 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (3006.05) for latency ExtremeSparseL4U32
2025-09-13 11:11:57,637 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 11 hours, 21 minutes, 57 seconds)
2025-09-13 11:22:46,584 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:22:46,589 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:27:51,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3011.79468 ± 400.163
2025-09-13 11:27:51,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3121.0125, 3091.6619, 3180.8323, 3003.917, 3176.2483, 3087.5447, 3257.202, 1828.4381, 3167.6567, 3203.434]
2025-09-13 11:27:51,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 11:27:51,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (3011.79) for latency ExtremeSparseL4U32
2025-09-13 11:27:51,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 11 hours, 5 minutes, 38 seconds)
2025-09-13 11:38:40,218 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:38:40,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:43:45,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2695.16821 ± 701.651
2025-09-13 11:43:45,552 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3016.1367, 3288.9404, 2913.986, 3032.3105, 2725.0886, 3019.532, 1941.4525, 887.0143, 2850.4324, 3276.7888]
2025-09-13 11:43:45,552 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 11:43:45,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 10 hours, 49 minutes, 52 seconds)
2025-09-13 11:54:35,709 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:54:35,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:59:40,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2472.96338 ± 1017.499
2025-09-13 11:59:40,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2191.7278, 2898.6758, 441.9076, 3162.3591, 650.5966, 3059.95, 3201.7952, 2578.6357, 3211.0085, 3332.9773]
2025-09-13 11:59:40,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 11:59:40,391 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 10 hours, 34 minutes, 54 seconds)
2025-09-13 12:10:28,357 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:10:28,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:15:31,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3172.88354 ± 173.057
2025-09-13 12:15:31,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3210.5613, 3315.0728, 2795.13, 3148.461, 3338.0486, 3357.9553, 3314.6357, 3156.4065, 3147.7817, 2944.7847]
2025-09-13 12:15:31,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 12:15:31,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (3172.88) for latency ExtremeSparseL4U32
2025-09-13 12:15:31,916 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 10 hours, 19 minutes, 34 seconds)
2025-09-13 12:26:20,081 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:26:20,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:31:26,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2554.77124 ± 895.977
2025-09-13 12:31:26,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [250.72018, 2996.8767, 3115.5715, 3004.4438, 3064.1968, 2958.9922, 1450.4182, 2941.752, 2887.1577, 2877.5825]
2025-09-13 12:31:26,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 12:31:26,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 10 hours, 4 minutes)
2025-09-13 12:42:15,783 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:42:15,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:47:15,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2711.59082 ± 903.253
2025-09-13 12:47:15,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3125.3364, 2879.1501, 2932.3455, 2887.5356, 16.040028, 2984.9001, 2960.2502, 3120.7668, 3121.4033, 3088.1814]
2025-09-13 12:47:15,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 12:47:15,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 9 hours, 47 minutes, 36 seconds)
2025-09-13 12:58:05,370 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:58:05,375 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:03:13,653 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2633.68921 ± 1005.806
2025-09-13 13:03:13,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3195.2424, 1260.223, 3255.4094, 2882.7993, 3117.734, 3125.09, 3250.5488, 137.62749, 3162.4556, 2949.7642]
2025-09-13 13:03:13,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 13:03:13,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 9 hours, 32 minutes, 10 seconds)
2025-09-13 13:14:05,212 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:14:05,218 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:19:09,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3028.48901 ± 336.940
2025-09-13 13:19:09,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3071.932, 2990.118, 3334.7522, 3333.5244, 3041.5837, 3188.9424, 3285.8555, 2959.204, 2969.5383, 2109.439]
2025-09-13 13:19:09,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 13:19:09,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 9 hours, 16 minutes, 22 seconds)
2025-09-13 13:29:59,013 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:29:59,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:34:58,201 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2770.14062 ± 560.757
2025-09-13 13:34:58,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3160.0261, 3087.3584, 3068.8787, 2719.4487, 1570.358, 3053.5173, 1839.9619, 3106.242, 2775.9106, 3319.7039]
2025-09-13 13:34:58,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 13:34:58,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 9 hours, 10 seconds)
2025-09-13 13:45:46,158 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:45:46,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:50:46,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2562.95972 ± 920.721
2025-09-13 13:50:46,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2752.9802, 3105.077, 3348.498, 2939.5857, 284.67358, 2813.443, 1340.5193, 2920.8823, 2976.7192, 3147.2214]
2025-09-13 13:50:46,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 13:50:46,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 8 hours, 43 minutes, 36 seconds)
2025-09-13 14:01:35,696 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:01:35,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:06:33,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3149.42114 ± 182.176
2025-09-13 14:06:33,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3120.9504, 3225.83, 2785.3103, 3007.7878, 3125.9414, 3390.152, 2940.9883, 3275.4673, 3274.7385, 3347.0452]
2025-09-13 14:06:33,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 14:06:33,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 8 hours, 27 minutes, 31 seconds)
2025-09-13 14:17:22,266 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:17:22,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:22:27,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2437.75659 ± 1204.752
2025-09-13 14:22:27,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [195.79727, 3142.7708, 2908.8906, 3325.752, -67.96302, 2804.6196, 2746.6035, 2815.3447, 3240.1677, 3265.5825]
2025-09-13 14:22:27,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 14:22:27,663 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 8 hours, 11 minutes, 14 seconds)
2025-09-13 14:33:17,167 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:33:17,183 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:38:19,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2503.25098 ± 1194.725
2025-09-13 14:38:19,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3192.6006, 3122.09, 3310.3428, 2820.549, 176.44989, 3174.4048, 3182.3845, 83.32009, 2892.5068, 3077.8616]
2025-09-13 14:38:19,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 14:38:19,131 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 7 hours, 54 minutes, 59 seconds)
2025-09-13 14:49:09,041 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:49:09,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:54:09,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2927.45947 ± 609.327
2025-09-13 14:54:09,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3094.0645, 3152.13, 3349.1843, 1648.6587, 3389.8318, 3160.8774, 3416.0957, 3072.0767, 1817.2997, 3174.378]
2025-09-13 14:54:09,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 14:54:09,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 7 hours, 39 minutes, 19 seconds)
2025-09-13 15:05:00,670 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:05:00,694 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:10:04,618 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2601.92920 ± 1030.439
2025-09-13 15:10:04,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3101.7139, 966.18445, 3274.287, 2939.473, 3431.6133, 3108.1777, 3207.656, 265.76758, 3195.2952, 2529.126]
2025-09-13 15:10:04,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 15:10:04,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 7 hours, 24 minutes, 6 seconds)
2025-09-13 15:20:54,452 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:20:54,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:26:01,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3168.84619 ± 354.413
2025-09-13 15:26:01,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3287.3113, 3428.4197, 2229.2026, 3179.6643, 3579.6768, 2989.9045, 3421.6814, 3060.8167, 3261.846, 3249.9387]
2025-09-13 15:26:01,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 15:26:01,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 7 hours, 9 minutes, 3 seconds)
2025-09-13 15:36:52,426 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:36:52,436 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:41:50,709 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2834.53589 ± 624.851
2025-09-13 15:41:50,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2854.6055, 2689.7314, 3184.3354, 2735.6472, 2885.9485, 3187.0012, 3187.7246, 3288.226, 3261.4543, 1070.6853]
2025-09-13 15:41:50,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 15:41:50,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 6 hours, 52 minutes, 47 seconds)
2025-09-13 15:52:41,530 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:52:41,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:57:43,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2763.21997 ± 791.712
2025-09-13 15:57:43,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3228.5212, 3294.9358, 3355.262, 2055.901, 3211.4868, 3006.205, 2960.8123, 3185.1003, 657.63824, 2676.3352]
2025-09-13 15:57:43,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 15:57:43,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 6 hours, 36 minutes, 59 seconds)
2025-09-13 16:08:34,207 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:08:34,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:13:39,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2807.69824 ± 850.980
2025-09-13 16:13:39,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3149.2805, 3065.2961, 3107.473, 3365.2554, 3177.3223, 3001.5125, 278.62277, 2958.5073, 2970.1914, 3003.521]
2025-09-13 16:13:39,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 16:13:39,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 6 hours, 21 minutes, 32 seconds)
2025-09-13 16:24:28,125 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:24:28,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:29:29,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2887.54395 ± 950.803
2025-09-13 16:29:29,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [94.96301, 3413.5999, 3350.147, 3194.1917, 3133.5605, 3324.8196, 3246.8787, 3140.3254, 3298.1995, 2678.7546]
2025-09-13 16:29:29,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 16:29:29,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 6 hours, 5 minutes, 17 seconds)
2025-09-13 16:40:20,336 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:40:20,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:45:19,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2956.40674 ± 501.347
2025-09-13 16:45:19,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1743.0254, 2297.651, 3319.39, 3056.671, 3179.6392, 3234.6462, 3316.4282, 3010.7456, 2997.9253, 3407.9453]
2025-09-13 16:45:19,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 16:45:19,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 5 hours, 48 minutes, 56 seconds)
2025-09-13 16:56:09,761 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:56:09,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:01:17,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3002.25293 ± 221.990
2025-09-13 17:01:17,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2960.017, 2655.9148, 3293.5042, 2857.0098, 3218.8267, 3055.3643, 2633.8748, 3227.762, 3165.3892, 2954.868]
2025-09-13 17:01:17,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 17:01:17,416 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 5 hours, 33 minutes, 40 seconds)
2025-09-13 17:12:09,249 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:12:09,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:17:08,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3168.35596 ± 273.875
2025-09-13 17:17:08,066 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2520.185, 3163.455, 2981.7808, 2954.3994, 3161.5942, 3359.399, 3294.1062, 3349.3115, 3395.0322, 3504.2983]
2025-09-13 17:17:08,066 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 17:17:08,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 5 hours, 17 minutes, 39 seconds)
2025-09-13 17:27:58,791 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:27:58,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:33:02,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2507.02295 ± 878.897
2025-09-13 17:33:02,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [972.67334, 3320.361, 3037.0845, 3253.569, 3125.4314, 3019.319, 1568.483, 2015.6628, 3396.2988, 1361.346]
2025-09-13 17:33:02,677 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 17:33:02,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 5 hours, 1 minute, 41 seconds)
2025-09-13 17:43:53,715 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:43:53,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:48:57,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2849.06763 ± 814.936
2025-09-13 17:48:57,617 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3036.7688, 3290.5654, 3203.3147, 2089.7925, 3040.0432, 647.49554, 3297.4658, 3431.298, 3227.9229, 3226.0068]
2025-09-13 17:48:57,617 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 17:48:57,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 46 minutes, 5 seconds)
2025-09-13 17:59:48,319 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:59:48,324 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:04:56,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2640.11597 ± 864.321
2025-09-13 18:04:56,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3222.4407, 2717.4392, 239.14105, 2551.8018, 3183.8186, 3038.0837, 2226.2498, 3332.1787, 2767.7803, 3122.2258]
2025-09-13 18:04:56,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 18:04:56,593 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 30 minutes, 42 seconds)
2025-09-13 18:15:49,104 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:15:49,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:20:52,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3239.61255 ± 199.783
2025-09-13 18:20:52,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3292.6577, 3230.615, 2906.378, 3451.7615, 3040.6118, 3160.831, 3538.4976, 3520.232, 3104.2927, 3150.248]
2025-09-13 18:20:52,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 18:20:52,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (3239.61) for latency ExtremeSparseL4U32
2025-09-13 18:20:52,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 4 hours, 14 minutes, 40 seconds)
2025-09-13 18:31:44,310 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:31:44,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:36:51,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2561.24756 ± 1194.669
2025-09-13 18:36:51,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1107.8622, 3210.833, 3227.9705, 3378.5405, 3508.3164, 104.44065, 1162.1827, 3380.4663, 3077.4326, 3454.4297]
2025-09-13 18:36:51,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 18:36:51,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 59 minutes, 11 seconds)
2025-09-13 18:47:42,898 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:47:42,905 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:52:49,539 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2962.85986 ± 373.765
2025-09-13 18:52:49,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1944.3655, 2930.6765, 2969.964, 3209.1877, 3041.6445, 3319.732, 2785.3435, 3215.0024, 3224.8408, 2987.8425]
2025-09-13 18:52:49,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 18:52:49,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 43 minutes, 23 seconds)
2025-09-13 19:03:42,260 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:03:42,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:08:41,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2353.75439 ± 1185.120
2025-09-13 19:08:41,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3042.9607, 3375.8079, 3257.6865, 3347.8806, 1981.5192, 901.2678, 3034.5637, 82.8944, 3446.5684, 1066.3943]
2025-09-13 19:08:41,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 19:08:41,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 27 minutes, 17 seconds)
2025-09-13 19:19:33,405 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:19:33,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:24:38,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2885.07373 ± 764.019
2025-09-13 19:24:38,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3340.3762, 3331.9407, 3136.2046, 3024.868, 3240.687, 3421.7231, 3381.8384, 1730.4584, 1075.7001, 3166.941]
2025-09-13 19:24:38,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 19:24:38,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 11 minutes, 15 seconds)
2025-09-13 19:35:29,283 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:35:29,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:40:32,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2854.68384 ± 762.302
2025-09-13 19:40:32,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3083.219, 3354.4512, 3051.2356, 3215.3567, 3138.7068, 740.44226, 3104.172, 2949.9326, 2363.759, 3545.5625]
2025-09-13 19:40:32,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 19:40:33,005 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 55 minutes, 16 seconds)
2025-09-13 19:51:25,727 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:51:25,735 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:56:32,488 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3211.14624 ± 195.652
2025-09-13 19:56:32,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2847.1018, 3379.3206, 3288.0618, 3091.2986, 3197.0984, 2975.2285, 3486.318, 3272.6511, 3121.6868, 3452.6953]
2025-09-13 19:56:32,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 19:56:32,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 39 minutes, 21 seconds)
2025-09-13 20:07:23,474 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:07:23,480 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:12:30,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3099.63330 ± 545.679
2025-09-13 20:12:30,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3415.2722, 3293.347, 3273.0708, 3336.624, 3145.0105, 3155.2986, 3160.6558, 3486.6433, 1494.3341, 3236.0762]
2025-09-13 20:12:30,735 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 20:12:30,746 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 23 minutes, 26 seconds)
2025-09-13 20:23:24,984 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:23:24,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:28:32,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3115.42578 ± 521.920
2025-09-13 20:28:32,661 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3165.6108, 3120.114, 3596.1594, 3395.051, 3456.9272, 2880.803, 3236.6545, 1656.9146, 3244.3965, 3401.628]
2025-09-13 20:28:32,661 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 20:28:32,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 7 minutes, 45 seconds)
2025-09-13 20:39:27,512 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:39:27,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:44:26,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2858.10156 ± 479.541
2025-09-13 20:44:26,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3299.9397, 3103.5237, 3300.871, 2393.71, 3099.3804, 3204.2756, 2333.1985, 1807.2577, 3125.5767, 2913.2842]
2025-09-13 20:44:26,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 20:44:26,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 51 minutes, 44 seconds)
2025-09-13 20:55:18,052 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:55:18,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:00:19,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3026.10010 ± 391.585
2025-09-13 21:00:19,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1903.0482, 3002.806, 3217.4038, 3188.6975, 3168.9048, 3323.6038, 3019.3562, 2990.2812, 3122.3564, 3324.5457]
2025-09-13 21:00:19,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 21:00:19,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 35 minutes, 43 seconds)
2025-09-13 21:11:11,440 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:11:11,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:16:11,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2660.49268 ± 1004.750
2025-09-13 21:16:11,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3093.1372, 1899.0027, 3085.0776, 1471.8052, 3260.6506, 3622.1187, 3150.7874, 3265.2266, 358.51492, 3398.605]
2025-09-13 21:16:11,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 21:16:11,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 19 minutes, 39 seconds)
2025-09-13 21:27:02,622 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:27:02,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:32:04,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2780.01611 ± 1167.573
2025-09-13 21:32:04,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3450.8877, 3214.6995, 3176.4966, 3618.3203, 3430.6165, 841.5963, 135.48872, 3316.8862, 3546.4707, 3068.6963]
2025-09-13 21:32:04,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 21:32:04,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 3 minutes, 39 seconds)
2025-09-13 21:42:57,112 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:42:57,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:47:56,136 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2646.14795 ± 943.365
2025-09-13 21:47:56,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3330.6519, 2834.9634, 3289.9727, 569.04584, 1078.8317, 3403.7249, 3236.126, 2682.8384, 2945.3872, 3089.9348]
2025-09-13 21:47:56,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 21:47:56,145 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 47 minutes, 38 seconds)
2025-09-13 21:58:45,496 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:58:45,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:03:44,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2954.76343 ± 758.078
2025-09-13 22:03:44,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1005.06854, 3421.3516, 3339.0303, 3075.9966, 3273.651, 3371.9585, 3151.913, 3605.581, 3218.7212, 2084.3625]
2025-09-13 22:03:44,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 22:03:44,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 31 minutes, 43 seconds)
2025-09-13 22:14:36,471 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:14:36,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:19:41,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2889.87549 ± 813.776
2025-09-13 22:19:41,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [738.4456, 3166.111, 3449.3313, 3453.5046, 3328.403, 3019.834, 3168.6785, 3148.914, 3372.283, 2053.2476]
2025-09-13 22:19:41,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 22:19:41,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 15 minutes, 52 seconds)
2025-09-13 22:30:35,067 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:30:35,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:35:35,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3047.49170 ± 748.807
2025-09-13 22:35:35,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3158.6885, 3296.9124, 3183.7097, 3523.863, 850.20593, 3407.939, 3066.7197, 3326.4934, 3116.853, 3543.5308]
2025-09-13 22:35:35,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 22:35:35,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1251 [DEBUG]: Training session finished
