2025-09-12 20:45:46,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc15-halfcheetah/ExtremeSparseL4U32-mbpac_memdelay
2025-09-12 20:45:46,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc15-halfcheetah/ExtremeSparseL4U32-mbpac_memdelay
2025-09-12 20:45:46,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x14a7e281eb90>}
2025-09-12 20:45:46,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1111 [DEBUG]: using device: cuda
2025-09-12 20:45:46,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-12 20:45:46,580 baseline-mbpac-noiseperc15-halfcheetah:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-12 20:45:46,580 baseline-mbpac-noiseperc15-halfcheetah:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-12 20:45:46,587 baseline-mbpac-noiseperc15-halfcheetah:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=6, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
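The three module dumps fit together as follows. The model embeds 17-dimensional states and 6-dimensional actions. The GRU hidden size of 384 matches the policy's `in_features=384`. The critic's `in_features=23` equals 17 + 6. The arithmetic sketch below counts trainable weights for the plain `Linear` stacks of the policy and critic, plus the `GRU(256, 384)` using PyTorch's standard GRU parameter layout. It assumes that the custom wrappers (`NNTanhRefit`, `NNLayerConcat2`, `NNLayerSqueeze`) hold no trainable parameters. The dump does not confirm this, and `linear_params`/`gru_params` are hypothetical helper names.

```python
def linear_params(in_f: int, out_f: int, bias: bool = True) -> int:
    """Weights plus optional bias of a fully connected layer."""
    return in_f * out_f + (out_f if bias else 0)

def gru_params(input_size: int, hidden: int) -> int:
    """Single-layer GRU: 3 gates, each with input/hidden weights and two biases."""
    return 3 * hidden * (input_size + hidden) + 6 * hidden

# Policy: common head 384 -> 256 -> 256, then separate mu and log_std heads 256 -> 6.
pi = linear_params(384, 256) + linear_params(256, 256) + 2 * linear_params(256, 6)

# Critic: concatenated (state 17, action 6) = 23 -> 256 -> 256 -> 1.
q = linear_params(17 + 6, 256) + linear_params(256, 256) + linear_params(256, 1)

# Recurrent core of the model: GRU(256, 384).
rec = gru_params(256, 384)

print(pi, q, rec)
```

The counts only cover the layers listed explicitly in the dump. The model's emitter and embedding stacks could be added the same way with `linear_params`.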
2025-09-12 20:45:47,661 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-12 20:45:47,661 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-12 20:57:11,913 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:57:11,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 21:02:13,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: -276.77142 ± 25.685
2025-09-12 21:02:13,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [-276.1135, -284.22427, -233.2375, -235.46103, -272.21265, -262.80682, -295.81403, -316.40836, -289.70874, -301.72714]
2025-09-12 21:02:13,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:02:13,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (-276.77) for latency ExtremeSparseL4U32
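The `Total Reward` line appears to summarise the ten per-episode returns in `All rewards` as mean ± standard deviation. Recomputing iteration 1 suggests that the trainer uses the population standard deviation (ddof=0, as with NumPy's default `np.std`) rather than the sample one. This is inferred from the logged numbers, not read from the trainer source. The sketch below uses only the standard library, and `summarize` is a hypothetical name.

```python
import statistics

# Per-episode returns logged for iteration 1 ("All rewards").
rewards = [-276.1135, -284.22427, -233.2375, -235.46103, -272.21265,
           -262.80682, -295.81403, -316.40836, -289.70874, -301.72714]

def summarize(returns):
    """Mean and population std (ddof=0), assumed to match the 'Total Reward' line."""
    return statistics.fmean(returns), statistics.pstdev(returns)

mean, std = summarize(rewards)
print(f"Total Reward: {mean:.5f} ± {std:.3f}")
```

The sample standard deviation (`statistics.stdev`, ddof=1) would give roughly 27.07 for the same list, not the logged 25.685. The last printed digit of the mean can also differ slightly from the log, because the trainer probably accumulates in float32.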
2025-09-12 21:02:13,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 27 hours, 6 minutes, 2 seconds)
2025-09-12 21:13:04,344 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:13:04,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 21:18:10,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: -209.09634 ± 51.977
2025-09-12 21:18:10,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [-177.81105, -229.3833, -202.99663, -284.60248, -172.9513, -128.52022, -268.9915, -162.2611, -283.11987, -180.32596]
2025-09-12 21:18:10,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:18:10,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (-209.10) for latency ExtremeSparseL4U32
2025-09-12 21:18:10,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 26 hours, 26 minutes, 33 seconds)
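The `estimated time remaining` values are consistent with a simple rule: (iterations left) × (elapsed time since `Starting training session...` ÷ iterations completed), truncated to whole seconds. The rule reproduces the ETAs logged at the start of iterations 2, 3 and 4 from the log timestamps. It is a reconstruction under that assumption, and `eta` is a hypothetical helper. The real trainer may time iterations with a wall clock that differs from the log stamps by a few milliseconds.

```python
from datetime import datetime

TOTAL_ITERS = 100
FMT = "%Y-%m-%d %H:%M:%S,%f"  # logging's default asctime layout

# Timestamp of "Starting training session..."
start = datetime.strptime("2025-09-12 20:45:47,661", FMT)

def eta(now_stamp: str, completed: int, total: int = TOTAL_ITERS):
    """Mean iteration time multiplied by the number of iterations still to run."""
    elapsed = (datetime.strptime(now_stamp, FMT) - start).total_seconds()
    remaining = int(elapsed / completed * (total - completed))  # truncate to seconds
    hours, rest = divmod(remaining, 3600)
    minutes, seconds = divmod(rest, 60)
    return hours, minutes, seconds

print(eta("2025-09-12 21:02:13,140", completed=1))  # log: 27 h 6 min 2 s
print(eta("2025-09-12 21:18:10,390", completed=2))  # log: 26 h 26 min 33 s
```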
2025-09-12 21:29:02,940 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:29:02,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 21:34:07,274 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: -1.89609 ± 34.199
2025-09-12 21:34:07,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [-12.551253, -11.452208, -44.25008, 0.5628215, 61.43164, -40.135956, 41.707863, 30.375761, -37.16815, -7.4813614]
2025-09-12 21:34:07,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:34:07,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (-1.90) for latency ExtremeSparseL4U32
2025-09-12 21:34:07,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 26 hours, 2 minutes, 34 seconds)
2025-09-12 21:44:54,824 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:44:54,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 21:49:57,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 473.77798 ± 247.621
2025-09-12 21:49:57,587 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [407.54257, 830.50793, 423.1512, 520.1447, 139.68875, 43.903934, 533.54645, 390.6727, 575.3406, 873.28094]
2025-09-12 21:49:57,587 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:49:57,587 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (473.78) for latency ExtremeSparseL4U32
2025-09-12 21:49:57,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 25 hours, 39 minutes, 58 seconds)
2025-09-12 22:00:47,305 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:00:47,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:05:49,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 792.22437 ± 457.331
2025-09-12 22:05:49,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1160.177, 462.15787, 160.50832, 211.48232, 1310.6357, 1124.1469, 1454.2534, 404.06754, 551.21234, 1083.6027]
2025-09-12 22:05:49,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:05:49,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (792.22) for latency ExtremeSparseL4U32
2025-09-12 22:05:49,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 25 hours, 20 minutes, 34 seconds)
2025-09-12 22:16:36,537 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:16:36,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:21:44,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1359.34155 ± 837.243
2025-09-12 22:21:44,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1764.3138, 1694.0549, -8.057166, 2154.007, 2065.758, 38.0275, 1518.8932, 1910.3589, 2129.1406, 326.91983]
2025-09-12 22:21:44,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:21:44,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (1359.34) for latency ExtremeSparseL4U32
2025-09-12 22:21:44,947 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 24 hours, 55 minutes, 9 seconds)
2025-09-12 22:32:32,846 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:32:32,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:37:39,218 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1479.98315 ± 710.153
2025-09-12 22:37:39,219 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1058.688, 2156.7422, 491.91272, 1939.8247, 2185.4688, 670.69336, 2257.394, 1713.4056, 1939.7961, 385.9061]
2025-09-12 22:37:39,219 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:37:39,219 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (1479.98) for latency ExtremeSparseL4U32
2025-09-12 22:37:39,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 24 hours, 38 minutes, 20 seconds)
2025-09-12 22:48:27,870 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:48:27,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:53:30,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1634.64136 ± 734.636
2025-09-12 22:53:30,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [277.5699, 1997.1721, 894.46423, 827.4598, 2659.5154, 1338.6344, 2448.5977, 1914.5454, 2202.7568, 1785.697]
2025-09-12 22:53:30,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:53:30,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (1634.64) for latency ExtremeSparseL4U32
2025-09-12 22:53:30,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 24 hours, 20 minutes, 46 seconds)
2025-09-12 23:04:18,629 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:04:18,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:09:18,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2111.29590 ± 143.156
2025-09-12 23:09:18,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1884.7557, 2132.0383, 1910.5157, 2165.632, 2159.0972, 2411.1794, 2217.3723, 2110.6755, 2086.9395, 2034.7545]
2025-09-12 23:09:18,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:09:18,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (2111.30) for latency ExtremeSparseL4U32
2025-09-12 23:09:18,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 24 hours, 4 minutes, 14 seconds)
2025-09-12 23:20:07,164 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:20:07,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:25:11,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2267.03174 ± 512.638
2025-09-12 23:25:11,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2448.681, 2720.5493, 2370.5737, 2902.4153, 1088.1406, 2338.0369, 2375.8171, 1558.381, 2361.8916, 2505.829]
2025-09-12 23:25:11,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:25:11,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (2267.03) for latency ExtremeSparseL4U32
2025-09-12 23:25:11,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 23 hours, 48 minutes, 40 seconds)
2025-09-12 23:35:58,699 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:35:58,703 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:41:05,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2333.01343 ± 170.146
2025-09-12 23:41:05,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2488.344, 2379.8455, 2169.7927, 2258.2922, 2478.862, 2152.9055, 2216.637, 2264.8877, 2711.938, 2208.6316]
2025-09-12 23:41:05,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:41:05,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (2333.01) for latency ExtremeSparseL4U32
2025-09-12 23:41:05,336 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 23 hours, 32 minutes, 14 seconds)
2025-09-12 23:51:54,942 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:51:54,947 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:57:04,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1745.68982 ± 940.150
2025-09-12 23:57:04,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2693.4036, 307.90393, 1379.0001, 2411.653, 868.6912, 115.17, 2494.766, 2462.7307, 2490.8767, 2232.7024]
2025-09-12 23:57:04,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:57:04,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 23 hours, 17 minutes, 43 seconds)
2025-09-13 00:07:53,658 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:07:53,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:12:54,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2119.67041 ± 721.574
2025-09-13 00:12:54,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [678.48267, 2668.186, 2376.944, 2400.5334, 2335.1838, 704.9853, 2611.768, 2431.6812, 2406.4805, 2582.4585]
2025-09-13 00:12:54,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:12:54,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 23 hours, 1 minute, 34 seconds)
2025-09-13 00:23:43,609 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:23:43,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:28:53,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2050.40820 ± 526.261
2025-09-13 00:28:53,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2492.4304, 1161.5367, 2325.4111, 2627.9402, 1257.6658, 2291.6958, 1908.349, 1542.3756, 2664.2327, 2232.4458]
2025-09-13 00:28:53,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:28:53,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 22 hours, 48 minutes, 50 seconds)
2025-09-13 00:39:45,867 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:39:45,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:44:48,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1880.88342 ± 715.244
2025-09-13 00:44:48,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2158.9634, 2529.7998, 2267.6968, 110.72498, 1488.8156, 1342.7483, 1678.9839, 2336.7832, 2469.3862, 2424.9316]
2025-09-13 00:44:48,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:44:48,580 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 22 hours, 33 minutes, 27 seconds)
2025-09-13 00:55:37,911 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:55:37,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:00:45,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2348.45117 ± 538.292
2025-09-13 01:00:45,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1959.8341, 1021.6556, 2405.9595, 2701.914, 2992.8306, 2781.6345, 2174.646, 2733.593, 2160.7795, 2551.665]
2025-09-13 01:00:45,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:00:45,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (2348.45) for latency ExtremeSparseL4U32
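A `New best` line appears only when an iteration's mean reward exceeds every earlier mean. Iterations 12 to 15 (1745.69, 2119.67, 2050.41, 1880.88) all stay below the 2333.01 reached at iteration 11, so none of them is logged as a new best. Iteration 16 (2348.45) is the next one logged. The sketch below reproduces that bookkeeping from the logged means. `track_best` is a hypothetical name, and the log does not show how ties are handled, so a strict `>` comparison is assumed.

```python
def track_best(mean_rewards):
    """Return the 1-based iterations that set a new best, and the final best value."""
    best = float("-inf")
    new_best_iters = []
    for i, r in enumerate(mean_rewards, start=1):
        if r > best:  # strict improvement assumed; tie handling is not visible in the log
            best = r
            new_best_iters.append(i)
    return new_best_iters, best

# "Total Reward" means for iterations 1-16, copied from the log.
means = [-276.77142, -209.09634, -1.89609, 473.77798, 792.22437, 1359.34155,
         1479.98315, 1634.64136, 2111.29590, 2267.03174, 2333.01343,
         1745.68982, 2119.67041, 2050.40820, 1880.88342, 2348.45117]

iters, best = track_best(means)
print(iters, best)
```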
2025-09-13 01:00:45,462 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 22 hours, 18 minutes, 26 seconds)
2025-09-13 01:11:35,081 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:11:35,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:16:39,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2081.74707 ± 763.243
2025-09-13 01:16:39,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1644.3629, 209.48563, 2328.276, 2270.522, 2495.1013, 2504.7048, 2823.3572, 2064.1018, 2974.627, 1502.9314]
2025-09-13 01:16:39,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:16:39,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 22 hours, 1 minute, 10 seconds)
2025-09-13 01:27:28,326 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:27:28,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:32:36,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1923.77368 ± 1086.214
2025-09-13 01:32:36,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2583.0369, 2547.2966, 2642.786, 141.28583, 2861.8013, 2573.4011, 238.39061, 2360.9568, 463.01627, 2825.7651]
2025-09-13 01:32:36,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:32:36,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 21 hours, 46 minutes, 58 seconds)
2025-09-13 01:43:27,464 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:43:27,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:48:28,023 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1796.79956 ± 860.163
2025-09-13 01:48:28,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [930.91675, 2593.6301, 1978.5504, 2360.8503, 2422.3792, 58.66575, 1431.032, 2672.3396, 939.0314, 2580.6013]
2025-09-13 01:48:28,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:48:28,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 21 hours, 29 minutes, 1 second)
2025-09-13 01:59:18,186 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:59:18,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:04:19,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2091.88037 ± 996.593
2025-09-13 02:04:19,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2435.2207, 2733.8613, 2620.703, 336.40195, 2849.4036, -59.566757, 2199.115, 2443.7693, 2605.5635, 2754.3293]
2025-09-13 02:04:19,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:04:19,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 21 hours, 12 minutes, 12 seconds)
2025-09-13 02:15:08,910 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:15:08,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:20:10,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2210.70386 ± 744.266
2025-09-13 02:20:10,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1280.0769, 2099.9424, 2567.31, 1275.5717, 2626.1995, 2880.3398, 3051.917, 878.61035, 2600.6821, 2846.3882]
2025-09-13 02:20:10,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:20:10,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 20 hours, 54 minutes, 44 seconds)
2025-09-13 02:30:59,542 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:30:59,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:35:59,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2040.40881 ± 847.175
2025-09-13 02:35:59,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2909.0757, 2551.5325, 2531.9731, 1163.9924, 2669.1936, 2751.8232, 1475.8484, 2837.3682, 757.1791, 756.0994]
2025-09-13 02:35:59,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:35:59,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 20 hours, 37 minutes, 43 seconds)
2025-09-13 02:46:49,554 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:46:49,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:51:50,587 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2349.77539 ± 842.289
2025-09-13 02:51:50,599 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2922.033, 2887.485, 2837.75, 2802.9143, 44.159042, 2752.538, 2860.573, 2158.6904, 1842.4478, 2389.1638]
2025-09-13 02:51:50,599 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:51:50,599 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (2349.78) for latency ExtremeSparseL4U32
2025-09-13 02:51:50,606 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 20 hours, 20 minutes, 15 seconds)
2025-09-13 03:02:40,243 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:02:40,248 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:07:49,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2263.72827 ± 626.054
2025-09-13 03:07:49,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2452.3154, 2406.8682, 2852.8464, 2668.4363, 2615.5361, 1031.292, 2590.6147, 1045.0281, 2581.8313, 2392.5134]
2025-09-13 03:07:49,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:07:49,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 20 hours, 6 minutes, 18 seconds)
2025-09-13 03:18:40,693 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:18:40,703 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:23:40,926 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2023.24976 ± 869.444
2025-09-13 03:23:40,929 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2901.1204, 1174.0192, 3075.27, 1328.6471, 2283.777, 135.4766, 2619.867, 2273.5344, 2638.95, 1801.8358]
2025-09-13 03:23:40,929 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:23:40,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 19 hours, 50 minutes, 24 seconds)
2025-09-13 03:34:31,254 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:34:31,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:39:39,068 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1719.44238 ± 797.715
2025-09-13 03:39:39,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [825.4657, 2153.664, 1916.3698, 236.81775, 1914.0142, 1253.0165, 2911.7515, 2512.8728, 2380.1123, 1090.3396]
2025-09-13 03:39:39,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:39:39,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 19 hours, 36 minutes, 18 seconds)
2025-09-13 03:50:30,763 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:50:30,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:55:36,201 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2131.27197 ± 860.453
2025-09-13 03:55:36,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2059.7944, 2335.3694, 548.8871, 2192.7017, 2760.9011, 2398.695, 2881.4287, 2942.7317, 458.70676, 2733.5056]
2025-09-13 03:55:36,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:55:36,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 19 hours, 22 minutes, 12 seconds)
2025-09-13 04:06:25,969 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:06:25,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:11:30,517 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2339.99390 ± 937.906
2025-09-13 04:11:30,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2955.558, 2783.147, 2801.2322, 2682.122, -101.86332, 2515.8494, 1327.451, 2604.5938, 2651.977, 3179.8726]
2025-09-13 04:11:30,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 04:11:30,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 19 hours, 7 minutes, 10 seconds)
2025-09-13 04:22:21,260 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:22:21,267 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:27:23,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1697.50513 ± 901.166
2025-09-13 04:27:23,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3110.8723, 217.23714, 1621.4368, 696.12866, 2681.6714, 1237.4033, 2045.0077, 1635.6996, 2748.773, 980.8206]
2025-09-13 04:27:23,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 04:27:23,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 18 hours, 49 minutes, 52 seconds)
2025-09-13 04:38:16,054 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:38:16,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:43:20,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2241.49854 ± 792.890
2025-09-13 04:43:20,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2415.5159, 2575.5032, 2629.7258, 2375.9976, 2752.0046, 2729.6218, 1563.5808, 86.70592, 2482.165, 2804.1658]
2025-09-13 04:43:20,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 04:43:20,539 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 18 hours, 35 minutes, 14 seconds)
2025-09-13 04:54:11,012 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:54:11,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:59:13,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2552.00269 ± 621.243
2025-09-13 04:59:13,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2834.0542, 2640.3008, 2872.6914, 3097.1658, 2853.0518, 1622.5577, 2835.5642, 2804.6558, 1080.2979, 2879.688]
2025-09-13 04:59:13,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 04:59:13,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (2552.00) for latency ExtremeSparseL4U32
2025-09-13 04:59:13,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 18 hours, 18 minutes, 2 seconds)
2025-09-13 05:10:04,038 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:10:04,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:15:12,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2465.30273 ± 879.988
2025-09-13 05:15:12,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2201.4717, 2527.2275, 3066.4536, 3052.19, 2802.7935, -41.053246, 3086.5571, 2631.1577, 2466.7976, 2859.434]
2025-09-13 05:15:12,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 05:15:12,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 18 hours, 2 minutes, 31 seconds)
2025-09-13 05:26:03,361 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:26:03,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:31:11,491 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1979.45081 ± 892.999
2025-09-13 05:31:11,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2761.4075, 2877.7253, 1159.4581, 2676.993, 1375.8833, 1500.9414, 1864.0343, 80.51832, 2727.2988, 2770.2485]
2025-09-13 05:31:11,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 05:31:11,516 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 17 hours, 47 minutes, 45 seconds)
2025-09-13 05:42:01,434 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:42:01,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:47:09,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2453.13208 ± 848.571
2025-09-13 05:47:09,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2847.6433, 2729.287, 2634.5847, 2773.7832, 820.12067, 738.182, 3030.1575, 3142.2822, 2861.475, 2953.805]
2025-09-13 05:47:09,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 05:47:09,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 17 hours, 32 minutes, 45 seconds)
2025-09-13 05:57:59,646 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:57:59,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:03:01,014 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2158.80786 ± 795.909
2025-09-13 06:03:01,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2597.9749, 1111.5928, 903.5526, 1680.6566, 1198.8737, 2949.7412, 3074.903, 2633.9197, 2753.8894, 2682.9746]
2025-09-13 06:03:01,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 06:03:01,024 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 17 hours, 15 minutes, 46 seconds)
2025-09-13 06:13:52,656 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:13:52,661 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:18:53,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2295.04883 ± 665.326
2025-09-13 06:18:53,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2932.141, 2722.8684, 2576.3025, 2650.206, 1232.3922, 1295.5234, 2422.7007, 1420.7229, 3083.4312, 2614.2]
2025-09-13 06:18:53,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 06:18:53,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 16 hours, 59 minutes, 46 seconds)
2025-09-13 06:29:44,544 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:29:44,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:34:48,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2733.27002 ± 265.042
2025-09-13 06:34:48,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2897.8357, 3051.909, 2929.9966, 2837.2427, 2599.3833, 2050.8496, 2727.149, 2889.5183, 2742.7537, 2606.06]
2025-09-13 06:34:48,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 06:34:48,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (2733.27) for latency ExtremeSparseL4U32
2025-09-13 06:34:48,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 16 hours, 43 minutes, 3 seconds)
2025-09-13 06:45:39,777 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:45:39,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:50:44,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2733.26221 ± 164.253
2025-09-13 06:50:44,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2408.6511, 2654.5054, 2835.2615, 3026.376, 2591.0408, 2606.6687, 2866.7002, 2803.7122, 2779.7861, 2759.921]
2025-09-13 06:50:44,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 06:50:44,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 16 hours, 26 minutes, 28 seconds)
2025-09-13 07:01:36,065 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:01:36,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:06:41,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1800.67847 ± 837.922
2025-09-13 07:06:41,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2605.0369, 2278.2078, 2859.529, 445.7562, 527.35474, 2747.4548, 1177.9583, 2215.3633, 1504.8955, 1645.2289]
2025-09-13 07:06:41,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 07:06:41,661 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 16 hours, 10 minutes, 24 seconds)
2025-09-13 07:17:31,575 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:17:31,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:22:37,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2107.06689 ± 920.529
2025-09-13 07:22:37,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2828.9954, 2050.202, 603.62067, 2827.3735, 1792.8569, 569.94196, 1464.7697, 3037.8533, 2855.3765, 3039.6802]
2025-09-13 07:22:37,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 07:22:37,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 15 hours, 55 minutes, 15 seconds)
2025-09-13 07:33:30,586 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:33:30,591 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:38:37,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2385.47168 ± 746.172
2025-09-13 07:38:37,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2932.6606, 2485.8538, 2647.5337, 2582.9446, 3026.426, 422.07974, 2996.1475, 2698.1306, 2317.4023, 1745.5365]
2025-09-13 07:38:37,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 07:38:37,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 15 hours, 40 minutes, 50 seconds)
2025-09-13 07:49:28,651 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:49:28,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:54:30,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1980.15686 ± 738.805
2025-09-13 07:54:30,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2564.0637, 834.1046, 1332.3276, 2703.7512, 1372.2208, 2667.8264, 1936.245, 996.85254, 2800.5156, 2593.661]
2025-09-13 07:54:30,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 07:54:30,905 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 15 hours, 24 minutes, 35 seconds)
2025-09-13 08:05:21,986 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:05:21,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:10:25,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2134.58447 ± 1180.027
2025-09-13 08:10:25,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2727.965, 773.39264, 165.98848, 147.67914, 2882.0293, 3198.0718, 3023.2422, 2987.2346, 2673.066, 2767.1755]
2025-09-13 08:10:25,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 08:10:25,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 15 hours, 8 minutes, 17 seconds)
2025-09-13 08:21:17,381 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:21:17,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:26:27,810 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1850.79041 ± 1011.897
2025-09-13 08:26:27,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3054.739, 1273.1238, 1018.0504, 257.9682, 233.66675, 2531.161, 2116.8794, 2589.931, 2588.4136, 2843.971]
2025-09-13 08:26:27,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 08:26:27,820 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 14 hours, 53 minutes, 24 seconds)
2025-09-13 08:37:19,037 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:37:19,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:42:19,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1933.87598 ± 999.118
2025-09-13 08:42:19,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [527.93713, 3080.0156, 668.3586, 1915.1355, 3067.889, 2918.4736, 2588.7139, 2580.2734, 1334.2366, 657.7275]
2025-09-13 08:42:19,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 08:42:19,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 14 hours, 36 minutes, 44 seconds)
2025-09-13 08:53:13,072 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:53:13,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:58:19,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2125.59863 ± 958.455
2025-09-13 08:58:19,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2896.4907, 678.4749, 2760.186, 2674.5745, 215.24362, 2922.25, 2913.6055, 1439.9855, 1995.6793, 2759.4956]
2025-09-13 08:58:19,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 08:58:19,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 14 hours, 20 minutes, 43 seconds)
2025-09-13 09:09:11,900 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:09:11,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:14:16,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2489.35693 ± 635.777
2025-09-13 09:14:16,218 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [661.9328, 2688.1824, 2748.9072, 2755.4868, 2330.8425, 2871.5056, 2679.1123, 2626.1145, 3036.1814, 2495.3042]
2025-09-13 09:14:16,218 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 09:14:16,227 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 14 hours, 5 minutes, 24 seconds)
2025-09-13 09:25:08,106 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:25:08,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:30:16,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2587.77417 ± 662.288
2025-09-13 09:30:16,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1554.508, 2808.0442, 2693.1436, 2919.6138, 3025.3386, 2222.9, 3317.017, 1236.3555, 3188.4033, 2912.417]
2025-09-13 09:30:16,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 09:30:16,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 13 hours, 50 minutes, 27 seconds)
2025-09-13 09:41:08,411 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:41:08,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:46:08,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2432.33545 ± 968.871
2025-09-13 09:46:08,479 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2842.8135, 2551.1714, 2734.7202, 167.28474, 3094.4907, 3081.1067, 3032.8245, 976.41943, 2634.149, 3208.3726]
2025-09-13 09:46:08,480 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 09:46:08,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 13 hours, 32 minutes, 42 seconds)
2025-09-13 09:56:59,406 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:56:59,413 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:02:01,072 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2038.64453 ± 973.212
2025-09-13 10:02:01,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2808.0598, 293.8802, 660.08984, 909.40295, 2652.1553, 2815.1252, 2805.489, 2354.9268, 2052.6763, 3034.6416]
2025-09-13 10:02:01,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 10:02:01,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 13 hours, 16 minutes, 55 seconds)
2025-09-13 10:12:52,747 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:12:52,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:17:53,666 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2259.46826 ± 1134.848
2025-09-13 10:17:53,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [111.39469, 3100.0159, 2792.2207, 3128.232, 3216.4785, 2976.1604, 101.11531, 1904.3232, 2515.6758, 2749.0667]
2025-09-13 10:17:53,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 10:17:53,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 12 hours, 59 minutes, 50 seconds)
2025-09-13 10:28:47,202 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:28:47,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:33:48,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2193.73804 ± 812.745
2025-09-13 10:33:48,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2632.5586, 1211.9502, 2551.854, 467.22708, 2876.065, 2938.3057, 2514.5608, 2163.2637, 3064.535, 1517.061]
2025-09-13 10:33:48,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 10:33:48,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 12 hours, 43 minutes, 30 seconds)
2025-09-13 10:44:40,936 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:44:40,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:49:49,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1849.36975 ± 1244.376
2025-09-13 10:49:49,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2904.654, 183.14012, 2795.9802, 2970.6929, 973.0862, 177.61565, 2664.3252, 2908.9128, 81.96223, 2833.3296]
2025-09-13 10:49:49,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 10:49:49,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 12 hours, 27 minutes, 50 seconds)
2025-09-13 11:00:44,227 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:00:44,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:05:51,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2309.25146 ± 910.434
2025-09-13 11:05:51,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1901.2778, 2955.233, 2986.6985, 2850.7312, 3195.3027, 1022.7072, 2768.9778, 3174.507, 1717.3511, 519.7307]
2025-09-13 11:05:51,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 11:05:51,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 12 hours, 13 minutes, 25 seconds)
2025-09-13 11:16:42,698 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:16:42,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:21:51,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2483.13330 ± 992.123
2025-09-13 11:21:51,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2816.403, 3029.4666, 2554.5203, 1479.3257, 3163.2317, 3386.6968, 3033.1003, -83.987, 2982.2478, 2470.3271]
2025-09-13 11:21:51,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 11:21:51,783 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 11 hours, 58 minutes, 36 seconds)
2025-09-13 11:32:46,015 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:32:46,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:37:47,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2272.99805 ± 632.390
2025-09-13 11:37:47,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2830.023, 2238.9712, 2667.659, 1906.3976, 1149.2145, 2933.9048, 1370.3999, 1972.9819, 3123.667, 2536.7607]
2025-09-13 11:37:47,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 11:37:47,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 11 hours, 43 minutes, 7 seconds)
2025-09-13 11:48:38,997 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:48:39,002 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:53:49,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2399.31665 ± 962.325
2025-09-13 11:53:49,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3024.3447, 3218.0752, 525.23175, 3172.848, 1532.6102, 3339.1414, 1270.224, 1830.9786, 3174.4526, 2905.2605]
2025-09-13 11:53:49,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 11:53:49,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 11 hours, 28 minutes, 13 seconds)
2025-09-13 12:04:45,244 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:04:45,254 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:09:45,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2554.68433 ± 902.522
2025-09-13 12:09:45,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2540.965, 1392.1077, 3201.1458, 2847.9272, 2921.4963, 336.13077, 3272.5532, 2980.8657, 3239.4932, 2814.1602]
2025-09-13 12:09:45,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 12:09:45,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 11 hours, 11 minutes, 24 seconds)
2025-09-13 12:20:37,008 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:20:37,015 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:25:43,981 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2398.41602 ± 926.262
2025-09-13 12:25:43,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2394.5552, 2986.7727, 1069.7336, 3073.4243, 2378.6404, 3031.6267, 241.8697, 2789.8003, 2832.9714, 3184.7666]
2025-09-13 12:25:43,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 12:25:44,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 10 hours, 54 minutes, 56 seconds)
2025-09-13 12:36:36,963 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:36:36,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:41:43,185 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2447.31201 ± 998.437
2025-09-13 12:41:43,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2267.1682, 3117.2678, 676.69763, 2988.0452, 487.1659, 3092.1846, 3024.1062, 2262.1592, 3438.917, 3119.4082]
2025-09-13 12:41:43,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 12:41:43,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 10 hours, 38 minutes, 51 seconds)
2025-09-13 12:52:36,946 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:52:36,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:57:42,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1370.10596 ± 1235.283
2025-09-13 12:57:42,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [458.48932, 146.85892, 537.98425, 623.61835, 3204.8855, 1175.1049, 3258.8337, 3200.2278, 484.94077, 610.1158]
2025-09-13 12:57:42,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 12:57:42,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 10 hours, 23 minutes, 20 seconds)
2025-09-13 13:08:34,336 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:08:34,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:13:44,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2012.03552 ± 882.084
2025-09-13 13:13:44,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2376.795, 3127.0876, 2841.2686, 3025.0613, 1033.2803, 2803.7527, 1836.532, 719.4631, 1092.5754, 1264.5378]
2025-09-13 13:13:44,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 13:13:44,948 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 10 hours, 7 minutes, 23 seconds)
2025-09-13 13:24:38,780 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:24:38,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:29:40,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2479.26807 ± 918.497
2025-09-13 13:29:40,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3169.7236, 3126.4167, 2708.7964, 2081.5771, 1618.3352, 2810.63, 3014.3567, 133.72287, 3010.0347, 3119.0886]
2025-09-13 13:29:40,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 13:29:40,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 9 hours, 51 minutes, 20 seconds)
2025-09-13 13:40:32,358 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:40:32,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:45:34,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2375.88184 ± 981.613
2025-09-13 13:45:34,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [627.37854, 3107.4744, 3019.9373, 2984.3813, 3243.62, 2766.1506, 862.29034, 2986.0479, 1210.3263, 2951.2117]
2025-09-13 13:45:34,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 13:45:34,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 9 hours, 34 minutes, 52 seconds)
2025-09-13 13:56:27,578 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:56:27,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:01:34,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2294.36475 ± 1180.731
2025-09-13 14:01:34,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2825.6475, 2906.7107, 3233.3296, 18.06566, 3360.6914, 3269.8762, 2861.7239, 1114.914, 524.11475, 2828.5723]
2025-09-13 14:01:34,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 14:01:34,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 9 hours, 18 minutes, 58 seconds)
2025-09-13 14:12:28,969 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:12:28,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:17:30,359 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2154.92969 ± 1026.742
2025-09-13 14:17:30,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2859.505, 270.34546, 2834.927, 1533.5808, 1767.1953, 792.6016, 1810.2295, 3414.9976, 3034.247, 3231.668]
2025-09-13 14:17:30,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 14:17:30,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 9 hours, 2 minutes, 36 seconds)
2025-09-13 14:28:22,238 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:28:22,243 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:33:27,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2149.57764 ± 1109.223
2025-09-13 14:33:27,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [339.83414, 2777.507, 2959.6636, 2682.5464, 2837.1477, -47.857975, 2963.7136, 2135.076, 3312.5564, 1535.5927]
2025-09-13 14:33:27,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 14:33:27,470 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 8 hours, 46 minutes, 4 seconds)
2025-09-13 14:44:22,800 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:44:22,805 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:49:26,768 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2505.62964 ± 1075.227
2025-09-13 14:49:26,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2884.8208, 3032.386, 2836.6858, 2939.7944, 839.2512, 3162.7803, 3144.3694, 2994.5137, -30.012451, 3251.707]
2025-09-13 14:49:26,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 14:49:26,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 8 hours, 30 minutes, 34 seconds)
2025-09-13 15:00:20,358 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:00:20,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:05:28,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2548.31128 ± 701.882
2025-09-13 15:05:28,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2575.7253, 2894.9238, 3136.03, 3380.9177, 830.96356, 1960.0448, 3039.9238, 2240.5254, 2913.8743, 2510.1838]
2025-09-13 15:05:28,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 15:05:28,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 8 hours, 15 minutes, 22 seconds)
2025-09-13 15:16:23,365 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:16:23,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:21:26,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2555.99536 ± 860.204
2025-09-13 15:21:26,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2990.7188, 3148.1616, 2874.8987, 750.0341, 1113.7412, 2653.515, 3229.2837, 3236.6587, 2340.5872, 3222.3538]
2025-09-13 15:21:26,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 15:21:26,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 7 hours, 59 minutes, 11 seconds)
2025-09-13 15:32:20,079 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:32:20,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:37:24,877 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2000.20923 ± 1226.065
2025-09-13 15:37:24,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3092.0466, 164.911, 351.39087, 380.566, 3273.495, 1333.6599, 3062.375, 2904.855, 2515.083, 2923.7092]
2025-09-13 15:37:24,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 15:37:24,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 7 hours, 43 minutes, 28 seconds)
2025-09-13 15:48:19,293 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:48:19,304 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:53:27,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2045.90955 ± 1206.862
2025-09-13 15:53:27,940 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1756.0847, 2976.9758, 263.9185, 3039.0269, 3174.6628, 2466.2163, 414.5431, 240.7316, 2989.73, 3137.2065]
2025-09-13 15:53:27,940 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 15:53:27,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 7 hours, 28 minutes, 2 seconds)
2025-09-13 16:04:21,762 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:04:21,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:09:30,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2471.15381 ± 923.718
2025-09-13 16:09:30,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3499.222, 3208.2664, 3153.2993, 1602.644, 3089.3474, 1317.918, 3332.5374, 1003.56006, 1542.0897, 2962.6536]
2025-09-13 16:09:30,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 16:09:30,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 7 hours, 12 minutes, 20 seconds)
2025-09-13 16:20:25,709 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:20:25,732 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:25:27,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1712.77795 ± 896.419
2025-09-13 16:25:27,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [664.7549, 795.07935, 1590.5325, 2357.3906, 1110.6084, 1891.828, 722.77295, 1646.6451, 3259.8123, 3088.3562]
2025-09-13 16:25:27,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 16:25:27,095 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 6 hours, 55 minutes, 52 seconds)
2025-09-13 16:36:21,192 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:36:21,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:41:27,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2741.95435 ± 836.169
2025-09-13 16:41:27,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3254.9944, 2982.0793, 1520.827, 2674.144, 3553.1785, 3262.4421, 3299.1533, 3174.31, 811.232, 2887.1824]
2025-09-13 16:41:27,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 16:41:27,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (2741.95) for latency ExtremeSparseL4U32
2025-09-13 16:41:27,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 6 hours, 40 minutes, 4 seconds)
2025-09-13 16:52:19,816 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:52:19,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:57:27,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1989.02759 ± 1225.774
2025-09-13 16:57:27,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [310.8869, 508.99, 3065.155, 2997.2158, 2698.8103, 3055.1926, 300.85175, 928.896, 3273.1206, 2751.1567]
2025-09-13 16:57:27,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 16:57:27,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 6 hours, 24 minutes, 13 seconds)
2025-09-13 17:08:21,050 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:08:21,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:13:22,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2842.45679 ± 457.763
2025-09-13 17:13:22,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3019.3706, 3210.088, 2893.2554, 3221.8865, 2104.9854, 3114.1555, 3301.7693, 1995.1969, 3120.048, 2443.813]
2025-09-13 17:13:22,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 17:13:22,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (2842.46) for latency ExtremeSparseL4U32
2025-09-13 17:13:22,516 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 6 hours, 7 minutes, 35 seconds)
2025-09-13 17:24:16,313 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:24:16,324 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:29:19,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2153.28955 ± 985.388
2025-09-13 17:29:19,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [460.40955, 484.78348, 2988.216, 2357.8816, 2040.1869, 2468.5347, 2242.455, 3430.9958, 1728.1133, 3331.319]
2025-09-13 17:29:19,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 17:29:19,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 5 hours, 51 minutes, 9 seconds)
2025-09-13 17:40:12,950 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:40:12,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:45:16,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2702.47021 ± 942.269
2025-09-13 17:45:16,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [208.86736, 2437.3323, 3263.8113, 1999.0543, 3342.7449, 3097.9631, 2672.301, 3435.1733, 3266.3289, 3301.127]
2025-09-13 17:45:16,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 17:45:16,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 5 hours, 35 minutes, 16 seconds)
2025-09-13 17:56:11,316 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:56:11,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:01:19,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2340.86499 ± 1064.559
2025-09-13 18:01:19,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [254.13474, 1413.0854, 892.3898, 2832.7158, 2974.3843, 3187.6135, 3051.9392, 3473.5164, 2089.3757, 3239.4958]
2025-09-13 18:01:19,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 18:01:19,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 5 hours, 19 minutes, 30 seconds)
2025-09-13 18:12:16,764 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:12:16,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:17:21,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2380.40332 ± 1172.095
2025-09-13 18:17:21,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [920.32166, 3441.3584, 3051.594, 2651.8245, 3645.721, 552.7249, 450.35767, 2920.6038, 3030.6697, 3138.858]
2025-09-13 18:17:21,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 18:17:21,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 5 hours, 3 minutes, 35 seconds)
2025-09-13 18:28:16,386 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:28:16,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:33:17,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2426.42529 ± 1128.202
2025-09-13 18:33:17,512 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3474.8137, 3126.5732, 3268.8691, 632.63934, 3398.739, 954.8581, 2340.0398, 3125.217, 3234.313, 708.1901]
2025-09-13 18:33:17,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 18:33:17,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 47 minutes, 42 seconds)
2025-09-13 18:44:11,832 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:44:11,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:49:14,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2294.41504 ± 1199.114
2025-09-13 18:49:14,198 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2523.9045, 2022.014, 2429.7795, 2856.9941, 3399.523, 3048.5305, 3201.488, 3362.8738, -154.96336, 254.00507]
2025-09-13 18:49:14,198 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 18:49:14,227 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 31 minutes, 43 seconds)
2025-09-13 19:00:08,273 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:00:08,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:05:11,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2930.02808 ± 816.466
2025-09-13 19:05:11,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1770.7599, 3456.4504, 993.7391, 3341.8098, 2776.0913, 3441.5918, 3456.0852, 3380.6204, 3447.5725, 3235.5605]
2025-09-13 19:05:11,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 19:05:11,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (2930.03) for latency ExtremeSparseL4U32
2025-09-13 19:05:11,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 4 hours, 15 minutes, 45 seconds)
2025-09-13 19:16:07,739 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:16:07,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:21:15,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1945.15649 ± 1377.292
2025-09-13 19:21:15,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3426.2551, -125.23457, 3422.2454, 822.4515, 1655.3116, 462.43307, 2501.3206, 3463.1804, 471.57928, 3352.023]
2025-09-13 19:21:15,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 19:21:15,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 59 minutes, 46 seconds)
2025-09-13 19:32:09,877 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:32:09,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:37:17,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2640.19458 ± 725.181
2025-09-13 19:37:17,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1856.1038, 2997.6267, 2924.4805, 3404.8823, 1423.2616, 3515.4004, 2124.0803, 3256.484, 3115.082, 1784.542]
2025-09-13 19:37:17,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 19:37:17,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 43 minutes, 49 seconds)
2025-09-13 19:48:11,489 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:48:11,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:53:15,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2706.16260 ± 861.565
2025-09-13 19:53:15,929 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3343.8708, 3358.632, 3228.1843, 3376.9692, 3002.276, 1296.7565, 3181.9004, 2037.3499, 3219.6792, 1016.00916]
2025-09-13 19:53:15,929 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 19:53:15,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 27 minutes, 55 seconds)
2025-09-13 20:04:11,336 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:04:11,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:09:20,786 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3057.20483 ± 315.037
2025-09-13 20:09:20,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2968.2659, 3305.277, 2244.338, 2829.5764, 3052.464, 3300.024, 3399.4026, 3105.5515, 3190.888, 3176.2622]
2025-09-13 20:09:20,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 20:09:20,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (3057.20) for latency ExtremeSparseL4U32
2025-09-13 20:09:20,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 12 minutes, 15 seconds)
2025-09-13 20:20:18,113 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:20:18,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:25:21,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2618.57153 ± 1147.486
2025-09-13 20:25:21,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3393.7493, 3091.7334, 3087.1282, 1502.3091, 3577.1572, 3608.799, 2978.7874, 24.780952, 1443.2755, 3477.994]
2025-09-13 20:25:21,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 20:25:21,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 56 minutes, 21 seconds)
2025-09-13 20:36:14,990 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:36:15,002 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:41:17,032 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2548.87769 ± 1095.320
2025-09-13 20:41:17,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3037.735, 2803.6162, 3204.8013, 588.40045, 3242.1775, 3156.3472, 3332.8452, 3358.6423, 246.99312, 2517.2212]
2025-09-13 20:41:17,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 20:41:17,046 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 40 minutes, 3 seconds)
2025-09-13 20:52:11,362 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:52:11,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:57:19,805 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2204.20654 ± 1371.039
2025-09-13 20:57:19,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3828.4119, 3257.6938, 367.84778, 2825.363, 228.17986, 799.80023, 967.66, 3782.7844, 2716.6624, 3267.6624]
2025-09-13 20:57:19,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 20:57:19,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 24 minutes, 4 seconds)
2025-09-13 21:08:13,652 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:08:13,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:13:20,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2529.71021 ± 1174.288
2025-09-13 21:13:20,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1557.7993, 3165.3625, 3457.9868, 3224.198, 3425.9688, 782.2771, 3718.3704, 179.65848, 3035.5725, 2749.9087]
2025-09-13 21:13:20,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 21:13:20,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 8 minutes, 7 seconds)
2025-09-13 21:24:15,668 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:24:15,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:29:17,805 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2329.62256 ± 1080.527
2025-09-13 21:29:17,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3195.3362, 460.76413, 720.01697, 2512.2178, 3335.3235, 2933.2668, 1013.774, 3363.6348, 2992.176, 2769.7175]
2025-09-13 21:29:17,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 21:29:17,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 51 minutes, 55 seconds)
2025-09-13 21:40:11,527 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:40:11,539 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:45:17,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2458.07202 ± 1106.830
2025-09-13 21:45:17,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3121.8987, 3200.597, 3096.1675, 147.05518, 1030.1116, 3469.483, 2757.0588, 2917.0105, 3456.5078, 1384.8304]
2025-09-13 21:45:17,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 21:45:17,098 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 35 minutes, 54 seconds)
2025-09-13 21:56:13,051 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:56:13,066 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:01:23,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2438.68994 ± 830.401
2025-09-13 22:01:23,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2622.1274, 1244.2208, 3239.4055, 1180.5664, 1957.36, 3675.755, 1796.8359, 2589.2793, 2660.02, 3421.3298]
2025-09-13 22:01:23,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 22:01:23,865 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 20 minutes, 6 seconds)
2025-09-13 22:12:21,276 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:12:21,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:17:25,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2517.94214 ± 1149.741
2025-09-13 22:17:25,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [97.33782, 3216.567, 3442.1677, 3433.5193, 1757.9558, 792.7483, 2775.591, 3267.5015, 3387.832, 3008.2021]
2025-09-13 22:17:25,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 22:17:25,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 4 minutes, 4 seconds)
2025-09-13 22:28:22,632 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:28:22,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:33:25,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2546.73682 ± 1140.144
2025-09-13 22:33:25,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3143.1558, 2669.577, 3579.3508, 3113.885, 810.3345, 3683.9673, 359.3044, 3358.7815, 3219.6511, 1529.3625]
2025-09-13 22:33:25,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 22:33:25,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 48 minutes, 2 seconds)
2025-09-13 22:44:17,940 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:44:17,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:49:26,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3149.08081 ± 367.976
2025-09-13 22:49:26,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3137.7283, 3435.5664, 3139.9802, 3201.113, 3462.069, 3467.2004, 3229.646, 3252.9365, 3035.1174, 2129.4512]
2025-09-13 22:49:26,722 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 22:49:26,722 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (3149.08) for latency ExtremeSparseL4U32
2025-09-13 22:49:26,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 32 minutes, 3 seconds)
2025-09-13 23:00:20,105 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:00:20,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:05:24,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2351.88818 ± 1213.696
2025-09-13 23:05:24,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3397.2688, 3063.7068, 3320.0737, 3484.8933, 393.70505, 1378.1025, 2694.9377, 276.4124, 1925.0889, 3584.6946]
2025-09-13 23:05:24,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 23:05:24,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 16 minutes, 1 second)
2025-09-13 23:16:21,471 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:16:21,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:21:32,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2907.38770 ± 643.993
2025-09-13 23:21:32,339 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2877.7258, 3007.4539, 3168.585, 3283.5466, 2549.2324, 3441.9373, 1122.6199, 3040.2483, 3377.479, 3205.0486]
2025-09-13 23:21:32,339 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 23:21:32,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1251 [DEBUG]: Training session finished
