2025-09-12 21:30:56,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc20-halfcheetah/ExtremeSparseL4U32-mbpac_memdelay
2025-09-12 21:30:56,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc20-halfcheetah/ExtremeSparseL4U32-mbpac_memdelay
2025-09-12 21:30:56,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x154fdaef8050>}
2025-09-12 21:30:56,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1111 [DEBUG]: using device: cuda
2025-09-12 21:30:56,690 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-12 21:30:56,805 baseline-mbpac-noiseperc20-halfcheetah:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-12 21:30:56,806 baseline-mbpac-noiseperc20-halfcheetah:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-12 21:30:56,813 baseline-mbpac-noiseperc20-halfcheetah:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=6, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-09-12 21:30:58,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-12 21:30:58,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-12 21:42:33,088 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:42:33,097 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 21:47:30,981 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: -228.88359 ± 32.486
2025-09-12 21:47:30,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [-228.43483, -238.94432, -205.62744, -276.82388, -204.71687, -245.28758, -159.00327, -215.03926, -262.4651, -252.49335]
2025-09-12 21:47:30,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:47:30,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (-228.88) for latency ExtremeSparseL4U32
2025-09-12 21:47:30,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 27 hours, 18 minutes, 7 seconds)
2025-09-12 21:58:22,076 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:58:22,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:03:26,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: -197.91760 ± 56.766
2025-09-12 22:03:26,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [-134.18379, -102.53133, -218.20346, -210.08594, -244.89207, -238.4957, -304.41153, -188.89957, -196.34662, -141.12593]
2025-09-12 22:03:26,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:03:26,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (-197.92) for latency ExtremeSparseL4U32
2025-09-12 22:03:26,653 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 26 hours, 31 minutes, 14 seconds)
2025-09-12 22:14:10,573 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:14:10,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:19:12,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1.92449 ± 72.686
2025-09-12 22:19:12,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [-78.83682, -85.95792, -67.2922, 30.743355, 36.38814, 122.16899, -4.0882854, 103.346695, -74.081215, 36.854126]
2025-09-12 22:19:12,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:19:12,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (1.92) for latency ExtremeSparseL4U32
2025-09-12 22:19:12,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 25 hours, 59 minutes, 41 seconds)
2025-09-12 22:29:58,763 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:29:58,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:35:02,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 119.88049 ± 177.446
2025-09-12 22:35:02,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [9.664803, 424.58337, 467.11493, -98.73936, 123.39572, -40.178265, 119.226036, -2.2420995, 95.92408, 100.05567]
2025-09-12 22:35:02,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:35:02,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (119.88) for latency ExtremeSparseL4U32
2025-09-12 22:35:02,867 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 25 hours, 37 minutes, 52 seconds)
2025-09-12 22:45:51,125 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:45:51,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:50:54,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 399.36517 ± 140.104
2025-09-12 22:50:54,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [137.71294, 348.86, 546.46216, 351.74396, 356.85794, 478.32587, 550.7729, 446.65198, 203.85051, 572.4136]
2025-09-12 22:50:54,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:50:54,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (399.37) for latency ExtremeSparseL4U32
2025-09-12 22:50:54,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 25 hours, 18 minutes, 56 seconds)
2025-09-12 23:01:42,072 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:01:42,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:06:42,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 491.91193 ± 68.184
2025-09-12 23:06:42,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [593.18225, 562.74133, 494.24884, 545.5692, 560.50146, 494.74866, 425.62247, 431.3527, 423.62482, 387.52728]
2025-09-12 23:06:42,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:06:42,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (491.91) for latency ExtremeSparseL4U32
2025-09-12 23:06:42,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 24 hours, 48 minutes, 57 seconds)
2025-09-12 23:17:31,154 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:17:31,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:22:32,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 539.54095 ± 50.340
2025-09-12 23:22:32,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [548.4686, 570.0603, 445.7247, 596.91254, 591.33374, 537.63696, 593.05853, 542.1862, 465.86752, 504.16043]
2025-09-12 23:22:32,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:22:32,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (539.54) for latency ExtremeSparseL4U32
2025-09-12 23:22:32,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 24 hours, 31 minutes, 15 seconds)
2025-09-12 23:33:19,515 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:33:19,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:38:18,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 562.27527 ± 69.537
2025-09-12 23:38:18,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [619.06036, 593.0482, 589.7597, 611.0751, 405.5341, 635.2879, 532.7801, 527.94415, 487.1783, 621.0843]
2025-09-12 23:38:18,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:38:18,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (562.28) for latency ExtremeSparseL4U32
2025-09-12 23:38:18,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 24 hours, 15 minutes, 19 seconds)
2025-09-12 23:49:06,063 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:49:06,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:54:07,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 614.67456 ± 58.251
2025-09-12 23:54:07,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [655.36865, 661.48566, 592.5295, 621.0401, 611.1009, 544.63586, 711.82324, 661.67896, 581.4087, 505.67383]
2025-09-12 23:54:07,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:54:07,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (614.67) for latency ExtremeSparseL4U32
2025-09-12 23:54:07,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 23 hours, 59 minutes, 7 seconds)
2025-09-13 00:04:55,466 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:04:55,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:09:57,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 574.79468 ± 72.799
2025-09-13 00:09:57,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [578.71344, 592.42725, 570.32806, 537.7709, 685.2135, 473.0601, 461.4968, 607.499, 547.8704, 693.56744]
2025-09-13 00:09:57,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:09:57,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 23 hours, 42 minutes, 40 seconds)
2025-09-13 00:20:44,958 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:20:44,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:25:43,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 598.80811 ± 57.390
2025-09-13 00:25:43,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [623.5604, 561.7094, 656.5391, 575.5315, 690.9496, 520.0183, 500.8005, 588.82874, 639.4876, 630.6555]
2025-09-13 00:25:43,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:25:43,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 23 hours, 26 minutes, 26 seconds)
2025-09-13 00:36:31,453 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:36:31,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:41:37,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 624.96497 ± 60.744
2025-09-13 00:41:37,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [558.319, 576.15924, 762.29144, 542.80585, 642.5976, 648.4855, 596.9767, 608.8171, 673.27625, 639.9216]
2025-09-13 00:41:37,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:41:37,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (624.96) for latency ExtremeSparseL4U32
2025-09-13 00:41:37,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 23 hours, 11 minutes, 55 seconds)
2025-09-13 00:52:26,406 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:52:26,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:57:31,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 618.09735 ± 76.021
2025-09-13 00:57:31,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [593.5208, 583.841, 807.23145, 599.5516, 561.58685, 556.5591, 571.26245, 581.81586, 714.8222, 610.782]
2025-09-13 00:57:31,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:57:31,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 22 hours, 58 minutes, 36 seconds)
2025-09-13 01:08:19,544 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:08:19,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:13:17,384 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 663.29559 ± 61.149
2025-09-13 01:13:17,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [631.8736, 715.6062, 730.2365, 645.9137, 739.274, 710.6249, 641.0863, 645.69904, 650.42816, 522.2136]
2025-09-13 01:13:17,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:13:17,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (663.30) for latency ExtremeSparseL4U32
2025-09-13 01:13:17,398 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 22 hours, 41 minutes, 43 seconds)
2025-09-13 01:24:04,940 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:24:04,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:29:07,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 699.65869 ± 89.675
2025-09-13 01:29:07,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [647.9977, 693.50366, 750.67426, 571.3088, 801.89844, 590.0492, 766.0879, 674.65814, 636.1467, 864.26184]
2025-09-13 01:29:07,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:29:07,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (699.66) for latency ExtremeSparseL4U32
2025-09-13 01:29:07,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 22 hours, 25 minutes, 55 seconds)
2025-09-13 01:39:56,828 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:39:56,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:44:56,666 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 758.17877 ± 129.337
2025-09-13 01:44:56,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [644.8812, 629.0311, 584.59436, 1041.8358, 819.67065, 775.23047, 764.7347, 890.69226, 685.0197, 746.0971]
2025-09-13 01:44:56,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:44:56,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (758.18) for latency ExtremeSparseL4U32
2025-09-13 01:44:56,677 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 22 hours, 10 minutes, 48 seconds)
2025-09-13 01:55:46,405 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:55:46,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:00:46,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 789.66583 ± 142.677
2025-09-13 02:00:46,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [681.31287, 621.2179, 777.4472, 867.7641, 1112.7355, 957.5775, 730.51166, 661.6329, 728.55304, 757.9052]
2025-09-13 02:00:46,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:00:46,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (789.67) for latency ExtremeSparseL4U32
2025-09-13 02:00:46,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 21 hours, 53 minutes, 54 seconds)
2025-09-13 02:11:36,943 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:11:36,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:16:40,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 968.17596 ± 178.063
2025-09-13 02:16:40,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [854.3368, 889.12036, 1110.8146, 786.1167, 1240.798, 879.9848, 1108.7936, 1056.2544, 634.47455, 1121.0669]
2025-09-13 02:16:40,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:16:40,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (968.18) for latency ExtremeSparseL4U32
2025-09-13 02:16:40,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 21 hours, 37 minutes, 50 seconds)
2025-09-13 02:27:29,154 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:27:29,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:32:30,610 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 977.75696 ± 209.445
2025-09-13 02:32:30,611 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [886.69763, 864.5605, 909.0502, 1196.211, 656.94714, 1053.8655, 1026.7517, 694.24225, 1369.5182, 1119.7247]
2025-09-13 02:32:30,611 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:32:30,611 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (977.76) for latency ExtremeSparseL4U32
2025-09-13 02:32:30,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 21 hours, 23 minutes, 22 seconds)
2025-09-13 02:43:19,070 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:43:19,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:48:22,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1257.16187 ± 247.327
2025-09-13 02:48:22,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1566.3582, 1337.6792, 1338.1476, 1594.214, 873.33344, 1218.4276, 838.66284, 1089.3635, 1450.2158, 1265.2167]
2025-09-13 02:48:22,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:48:22,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (1257.16) for latency ExtremeSparseL4U32
2025-09-13 02:48:23,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 21 hours, 8 minutes, 9 seconds)
2025-09-13 02:59:10,974 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:59:10,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:04:10,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1530.95898 ± 345.220
2025-09-13 03:04:10,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [991.7137, 1722.4495, 1754.994, 1587.9082, 1659.27, 1891.7108, 899.40576, 1525.5787, 1290.1757, 1986.3828]
2025-09-13 03:04:10,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:04:10,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (1530.96) for latency ExtremeSparseL4U32
2025-09-13 03:04:10,927 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 20 hours, 51 minutes, 57 seconds)
2025-09-13 03:14:59,035 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:14:59,046 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:20:04,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1441.73169 ± 427.757
2025-09-13 03:20:04,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1967.8914, 1134.354, 1847.5521, 1726.1586, 1745.1406, 1554.8918, 1398.1007, 781.96985, 642.1482, 1619.1089]
2025-09-13 03:20:04,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:20:04,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 20 hours, 37 minutes, 4 seconds)
2025-09-13 03:30:54,368 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:30:54,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:36:01,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1956.24438 ± 101.685
2025-09-13 03:36:01,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1805.2343, 2100.281, 1927.1974, 2101.7246, 2055.5803, 1960.533, 1938.989, 1967.413, 1908.005, 1797.4861]
2025-09-13 03:36:01,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:36:01,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (1956.24) for latency ExtremeSparseL4U32
2025-09-13 03:36:01,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 20 hours, 21 minutes, 59 seconds)
2025-09-13 03:46:50,436 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:46:50,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:51:55,928 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1813.47876 ± 545.659
2025-09-13 03:51:55,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1916.9342, 2203.8948, 2167.939, 1558.381, 2201.1733, 2399.178, 899.89044, 741.7635, 1850.5049, 2195.128]
2025-09-13 03:51:55,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:51:55,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 20 hours, 7 minutes, 13 seconds)
2025-09-13 04:02:45,525 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:02:45,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:07:45,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1866.29688 ± 524.302
2025-09-13 04:07:45,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1980.7355, 2105.4397, 2201.8406, 2331.4065, 1764.3788, 690.486, 1991.0476, 1077.1559, 2323.9707, 2196.5083]
2025-09-13 04:07:45,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 04:07:45,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 19 hours, 50 minutes, 30 seconds)
2025-09-13 04:18:33,300 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:18:33,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:23:37,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1816.06482 ± 595.247
2025-09-13 04:23:37,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1112.0189, 2066.1555, 1639.4904, 2463.8264, 2273.9858, 2536.8755, 2412.7703, 1553.2054, 727.4153, 1374.904]
2025-09-13 04:23:37,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 04:23:37,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 19 hours, 35 minutes, 40 seconds)
2025-09-13 04:34:27,176 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:34:27,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:39:30,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2237.08594 ± 276.005
2025-09-13 04:39:30,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2368.397, 2151.2744, 1939.1276, 1592.1051, 2159.2405, 2506.5999, 2405.8027, 2434.9463, 2526.189, 2287.1758]
2025-09-13 04:39:30,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 04:39:30,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (2237.09) for latency ExtremeSparseL4U32
2025-09-13 04:39:30,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 19 hours, 19 minutes, 34 seconds)
2025-09-13 04:50:22,476 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:50:22,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:55:19,256 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2100.10693 ± 496.894
2025-09-13 04:55:19,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2283.5898, 2770.088, 2454.9297, 2290.6145, 1104.2013, 1819.0939, 2443.0173, 1513.7356, 1783.6107, 2538.1895]
2025-09-13 04:55:19,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 04:55:19,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 19 hours, 1 minute, 57 seconds)
2025-09-13 05:06:09,477 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:06:09,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:11:09,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2079.37939 ± 727.482
2025-09-13 05:11:09,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2595.0552, 2530.8518, 2377.8337, 1835.1703, 507.22464, 890.62103, 2604.0417, 2363.7966, 2543.8455, 2545.352]
2025-09-13 05:11:09,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 05:11:09,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 18 hours, 45 minutes, 2 seconds)
2025-09-13 05:22:01,024 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:22:01,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:27:02,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1843.85291 ± 700.141
2025-09-13 05:27:02,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2461.6982, 2537.559, 1828.9521, 2039.0188, 2331.7021, 1158.1039, 1760.7933, 590.47925, 939.4498, 2790.7732]
2025-09-13 05:27:02,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 05:27:02,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 18 hours, 29 minutes, 58 seconds)
2025-09-13 05:37:52,499 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:37:52,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:42:49,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2003.61682 ± 814.294
2025-09-13 05:42:49,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2598.0479, 2509.765, 1692.1162, 2539.356, 372.3976, 1984.5688, 2477.7039, 2598.7136, 602.07635, 2661.424]
2025-09-13 05:42:49,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 05:42:49,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 18 hours, 13 minutes, 6 seconds)
2025-09-13 05:53:38,165 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:53:38,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:58:37,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1940.74939 ± 840.890
2025-09-13 05:58:37,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1028.916, 2460.777, 595.9377, 2775.4492, 2128.1936, 606.70776, 2756.0833, 2073.539, 2065.9646, 2915.926]
2025-09-13 05:58:37,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 05:58:37,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 17 hours, 56 minutes)
2025-09-13 06:09:26,732 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:09:26,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:14:30,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2114.98486 ± 587.488
2025-09-13 06:14:30,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2599.3108, 2476.586, 2483.9226, 2435.3074, 1616.4945, 2367.1973, 721.56836, 2457.6438, 2469.132, 1522.683]
2025-09-13 06:14:30,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 06:14:30,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 17 hours, 41 minutes, 5 seconds)
2025-09-13 06:25:18,517 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:25:18,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:30:16,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2557.55615 ± 409.041
2025-09-13 06:30:16,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2984.4353, 2652.3599, 2515.4128, 2816.2947, 1406.8285, 2562.722, 2587.0652, 2614.3076, 2847.2722, 2588.8655]
2025-09-13 06:30:16,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 06:30:16,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (2557.56) for latency ExtremeSparseL4U32
2025-09-13 06:30:16,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 17 hours, 24 minutes, 19 seconds)
2025-09-13 06:41:07,338 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:41:07,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:46:09,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2423.82202 ± 459.825
2025-09-13 06:46:09,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2748.0305, 1185.3164, 2569.1985, 2036.1438, 2473.6396, 2684.532, 2750.6929, 2627.6084, 2714.5483, 2448.5095]
2025-09-13 06:46:09,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 06:46:09,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 17 hours, 8 minutes, 36 seconds)
2025-09-13 06:56:59,121 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:56:59,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:01:56,008 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2368.24854 ± 616.835
2025-09-13 07:01:56,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2592.0474, 2849.3296, 2555.1401, 2649.2112, 2154.543, 667.6338, 2132.0085, 2621.588, 2941.757, 2519.2239]
2025-09-13 07:01:56,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 07:01:56,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 16 hours, 52 minutes, 31 seconds)
2025-09-13 07:12:46,614 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:12:46,625 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:17:45,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2285.20337 ± 364.126
2025-09-13 07:17:45,360 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2684.9338, 2209.031, 1881.9213, 2515.0918, 1694.4863, 2535.5513, 1790.4716, 2768.7898, 2516.838, 2254.9175]
2025-09-13 07:17:45,360 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 07:17:45,384 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 16 hours, 37 minutes, 5 seconds)
2025-09-13 07:28:34,473 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:28:34,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:33:33,293 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2564.34277 ± 491.807
2025-09-13 07:33:33,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2318.8726, 2848.1914, 2749.0283, 2912.748, 2889.7246, 2552.023, 2835.6042, 2432.3054, 2889.5269, 1215.4042]
2025-09-13 07:33:33,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 07:33:33,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (2564.34) for latency ExtremeSparseL4U32
2025-09-13 07:33:33,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 16 hours, 20 minutes, 11 seconds)
2025-09-13 07:44:20,501 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:44:20,512 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:49:18,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2109.35547 ± 707.209
2025-09-13 07:49:18,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2618.7524, 2004.3638, 2525.6326, 2438.1807, 2250.8208, 698.72394, 2898.9248, 2237.3792, 2571.0464, 849.7299]
2025-09-13 07:49:18,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 07:49:18,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 16 hours, 4 minutes, 17 seconds)
2025-09-13 08:00:08,607 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:00:08,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:05:13,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2463.60254 ± 531.710
2025-09-13 08:05:13,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2831.7542, 2442.7087, 2464.4277, 2506.8774, 2764.117, 1332.8645, 1609.3295, 3028.51, 2771.2876, 2884.15]
2025-09-13 08:05:13,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 08:05:13,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 15 hours, 48 minutes, 47 seconds)
2025-09-13 08:16:05,734 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:16:05,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:21:10,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2244.72632 ± 643.295
2025-09-13 08:21:10,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2738.9294, 1653.1561, 2503.1902, 2597.118, 2774.166, 795.3212, 2747.9695, 1524.4594, 2609.4395, 2503.5156]
2025-09-13 08:21:10,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 08:21:10,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 15 hours, 35 minutes, 8 seconds)
2025-09-13 08:32:03,822 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:32:03,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:37:06,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2385.90869 ± 476.854
2025-09-13 08:37:06,598 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2640.2988, 1714.5426, 2582.4663, 1502.4113, 2547.0308, 2752.5625, 1833.1145, 2890.1338, 2807.5742, 2588.9526]
2025-09-13 08:37:06,598 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 08:37:06,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 15 hours, 20 minutes, 30 seconds)
2025-09-13 08:47:57,270 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:47:57,276 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:52:59,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2514.16064 ± 707.085
2025-09-13 08:52:59,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2998.5298, 2937.4478, 2303.9087, 604.0344, 2010.4792, 2999.69, 2922.4597, 2708.9531, 2796.588, 2859.5132]
2025-09-13 08:52:59,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 08:52:59,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 15 hours, 5 minutes, 34 seconds)
2025-09-13 09:03:49,739 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:03:49,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:08:48,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2435.25073 ± 311.163
2025-09-13 09:08:48,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2855.466, 2605.3127, 1667.3115, 2312.7913, 2609.0195, 2501.82, 2319.7776, 2600.1619, 2236.9695, 2643.8792]
2025-09-13 09:08:48,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 09:08:48,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 14 hours, 50 minutes, 24 seconds)
2025-09-13 09:19:36,955 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:19:36,966 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:24:39,779 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2369.18848 ± 685.203
2025-09-13 09:24:39,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [530.38837, 2650.2612, 2911.747, 1865.8945, 2555.4136, 2753.9827, 2891.6172, 2178.8186, 2731.2266, 2622.536]
2025-09-13 09:24:39,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 09:24:39,794 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 14 hours, 33 minutes, 49 seconds)
2025-09-13 09:35:31,130 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:35:31,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:40:34,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2284.46338 ± 789.923
2025-09-13 09:40:34,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2722.914, 870.0788, 2652.5125, 2843.9495, 2627.231, 2582.42, 587.5486, 2702.6575, 2853.8647, 2401.454]
2025-09-13 09:40:34,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 09:40:34,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 14 hours, 17 minutes, 24 seconds)
2025-09-13 09:51:24,104 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:51:24,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:56:30,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1918.85974 ± 979.094
2025-09-13 09:56:30,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2209.5554, 2228.9983, 1613.7198, 2754.3262, 368.39944, 2848.3547, 933.2283, 384.2644, 2924.3079, 2923.4436]
2025-09-13 09:56:30,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 09:56:30,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 14 hours, 1 minute, 34 seconds)
2025-09-13 10:07:20,232 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:07:20,243 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:12:22,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2769.16821 ± 406.508
2025-09-13 10:12:22,732 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2718.9434, 2880.7727, 2814.4795, 2909.389, 2899.0974, 2710.2559, 2996.5212, 1608.7914, 3013.0354, 3140.3967]
2025-09-13 10:12:22,732 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 10:12:22,732 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (2769.17) for latency ExtremeSparseL4U32
2025-09-13 10:12:22,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 13 hours, 45 minutes, 37 seconds)
2025-09-13 10:23:12,759 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:23:12,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:28:09,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1946.00024 ± 881.874
2025-09-13 10:28:09,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2613.6426, 2645.6863, 2905.1504, 2148.5164, 642.9321, 1889.6106, 611.84534, 748.9571, 2486.082, 2767.578]
2025-09-13 10:28:09,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 10:28:09,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 13 hours, 29 minutes, 20 seconds)
2025-09-13 10:38:58,743 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:38:58,754 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:43:57,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2384.88159 ± 584.204
2025-09-13 10:43:57,470 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [786.2519, 2404.6997, 2708.6338, 2348.108, 2485.6167, 2445.5598, 2837.5989, 2119.9653, 2762.5374, 2949.845]
2025-09-13 10:43:57,470 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 10:43:57,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 13 hours, 12 minutes, 56 seconds)
2025-09-13 10:54:47,666 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:54:47,677 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:59:46,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2042.38147 ± 997.501
2025-09-13 10:59:46,794 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [604.34064, 2533.8284, 2899.525, 3106.5813, 3049.0098, 2671.2942, 586.8201, 1725.3727, 676.2872, 2570.7556]
2025-09-13 10:59:46,794 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 10:59:46,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 12 hours, 56 minutes, 14 seconds)
2025-09-13 11:10:37,316 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:10:37,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:15:40,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2006.29553 ± 815.048
2025-09-13 11:15:40,239 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2886.2095, 1029.6356, 1517.1147, 928.7993, 1248.5875, 2852.2136, 1303.9233, 2785.1353, 2786.4124, 2724.924]
2025-09-13 11:15:40,239 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 11:15:40,279 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 12 hours, 40 minutes)
2025-09-13 11:26:29,917 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:26:29,929 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:31:28,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2200.56079 ± 807.235
2025-09-13 11:31:28,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2783.3457, 2683.242, 3003.8757, 2372.5186, 732.13434, 743.25494, 2951.443, 2653.2905, 2332.1194, 1750.3835]
2025-09-13 11:31:28,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 11:31:28,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 12 hours, 23 minutes, 33 seconds)
2025-09-13 11:42:19,413 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:42:19,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:47:20,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2048.35889 ± 920.821
2025-09-13 11:47:20,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2887.828, 697.09906, 1678.2478, 1033.208, 2962.2476, 1355.2766, 2863.3809, 2920.3574, 1032.8405, 3053.103]
2025-09-13 11:47:20,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 11:47:20,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 12 hours, 8 minutes, 27 seconds)
2025-09-13 11:58:12,191 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:58:12,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:03:17,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2369.24072 ± 647.145
2025-09-13 12:03:17,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2653.2231, 2691.813, 2626.4478, 695.6137, 2912.3457, 2981.3687, 1752.3556, 2289.4238, 2450.609, 2639.206]
2025-09-13 12:03:17,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 12:03:17,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 11 hours, 53 minutes, 56 seconds)
2025-09-13 12:14:09,127 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:14:09,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:19:06,491 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2502.45923 ± 635.705
2025-09-13 12:19:06,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [3165.2407, 2536.2515, 1089.4607, 2600.607, 1621.3286, 2819.4553, 2409.777, 3055.4575, 3130.5298, 2596.484]
2025-09-13 12:19:06,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 12:19:06,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 11 hours, 38 minutes, 5 seconds)
2025-09-13 12:30:00,167 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:30:00,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:35:00,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2517.33936 ± 503.877
2025-09-13 12:35:00,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2845.099, 2825.1702, 3035.073, 2938.1025, 2748.6843, 2835.7559, 1465.1754, 2122.4478, 1829.143, 2528.7446]
2025-09-13 12:35:00,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 12:35:00,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 11 hours, 22 minutes, 14 seconds)
2025-09-13 12:45:55,464 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:45:55,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:50:56,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1957.17993 ± 934.287
2025-09-13 12:50:56,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2403.6, 2640.422, 711.31036, 867.27124, 3005.871, 593.03094, 1215.5323, 2826.773, 2859.8223, 2448.166]
2025-09-13 12:50:56,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 12:50:56,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 11 hours, 7 minutes, 25 seconds)
2025-09-13 13:01:46,851 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:01:46,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:06:48,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2609.36377 ± 672.992
2025-09-13 13:06:48,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [3187.788, 2932.7744, 2774.8628, 2980.7952, 1802.7142, 3093.3215, 3115.3474, 1082.4312, 2088.6724, 3034.9302]
2025-09-13 13:06:48,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 13:06:48,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 10 hours, 51 minutes, 35 seconds)
2025-09-13 13:17:39,532 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:17:39,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:22:37,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2182.98364 ± 847.890
2025-09-13 13:22:37,940 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2198.2417, 2591.4106, 901.96063, 2710.4822, 739.2166, 2672.5527, 3123.3835, 2899.0542, 2806.7822, 1186.751]
2025-09-13 13:22:37,940 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 13:22:37,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 10 hours, 34 minutes, 46 seconds)
2025-09-13 13:33:28,588 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:33:28,599 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:38:26,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2701.03906 ± 534.719
2025-09-13 13:38:26,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2831.968, 2812.4412, 2031.1558, 2427.199, 2926.4563, 3280.363, 3082.8132, 1492.1786, 3221.874, 2903.941]
2025-09-13 13:38:26,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 13:38:26,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 10 hours, 18 minutes, 50 seconds)
2025-09-13 13:49:17,253 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:49:17,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:54:20,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2487.57837 ± 849.587
2025-09-13 13:54:20,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [3093.9082, 3045.857, 3260.0354, 2910.8132, 2843.418, 595.04474, 2460.1104, 2947.7874, 1133.8723, 2584.9377]
2025-09-13 13:54:20,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 13:54:20,971 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 10 hours, 3 minutes, 2 seconds)
2025-09-13 14:05:13,231 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:05:13,243 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:10:16,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2708.52344 ± 624.944
2025-09-13 14:10:16,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [3118.0852, 2908.905, 2962.638, 2023.9155, 2689.9717, 1097.3005, 2919.403, 3115.9778, 3200.0657, 3048.9717]
2025-09-13 14:10:16,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 14:10:16,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 9 hours, 47 minutes, 5 seconds)
2025-09-13 14:21:07,552 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:21:07,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:26:11,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2546.44849 ± 663.234
2025-09-13 14:26:11,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1897.7129, 2657.5002, 858.3989, 2808.4102, 3268.074, 2637.8, 3140.8608, 2664.084, 2600.0518, 2931.5945]
2025-09-13 14:26:11,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 14:26:11,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 9 hours, 31 minutes, 37 seconds)
2025-09-13 14:37:03,113 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:37:03,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:42:03,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2105.29688 ± 1090.847
2025-09-13 14:42:03,360 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [3107.0383, 2941.527, 3150.3987, 376.5943, 3236.4014, 2550.2935, 2838.834, 1037.5709, 695.69696, 1118.614]
2025-09-13 14:42:03,360 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 14:42:03,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 9 hours, 15 minutes, 58 seconds)
2025-09-13 14:52:52,578 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:52:52,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:57:50,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2663.17603 ± 517.585
2025-09-13 14:57:50,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2180.794, 2437.0703, 3015.6458, 2501.1663, 1466.4631, 2942.6726, 2688.4885, 3308.5479, 2933.7058, 3157.2075]
2025-09-13 14:57:50,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 14:57:50,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 8 hours, 59 minutes, 54 seconds)
2025-09-13 15:08:40,935 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:08:40,945 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:13:41,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2242.79492 ± 896.891
2025-09-13 15:13:41,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [750.3975, 2555.287, 3486.0232, 2745.3997, 1493.7292, 2831.2258, 1540.3231, 2954.4087, 3017.3933, 1053.7622]
2025-09-13 15:13:41,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 15:13:41,516 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 8 hours, 43 minutes, 39 seconds)
2025-09-13 15:24:35,522 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:24:35,532 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:29:35,728 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2677.63989 ± 733.814
2025-09-13 15:29:35,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [3264.7844, 2854.7473, 655.2521, 2548.4294, 2893.1953, 3473.8804, 2568.6125, 2980.584, 2993.5696, 2543.344]
2025-09-13 15:29:35,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 15:29:35,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 8 hours, 27 minutes, 39 seconds)
2025-09-13 15:40:28,737 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:40:28,768 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:45:32,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2399.90381 ± 957.769
2025-09-13 15:45:32,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [559.87866, 2863.504, 3042.6558, 2995.5835, 3260.888, 532.1748, 2658.4575, 2752.9846, 3013.8096, 2319.099]
2025-09-13 15:45:32,978 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 15:45:32,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 8 hours, 11 minutes, 58 seconds)
2025-09-13 15:56:24,018 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:56:24,028 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:01:24,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2195.51807 ± 916.549
2025-09-13 16:01:24,877 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [3037.938, 3119.3943, 2886.4248, 2064.3252, 1274.0763, 1053.7216, 1293.672, 2997.1033, 947.8545, 3280.669]
2025-09-13 16:01:24,877 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 16:01:24,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 7 hours, 56 minutes, 8 seconds)
2025-09-13 16:12:16,272 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:12:16,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:17:16,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2081.16602 ± 1065.627
2025-09-13 16:17:16,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1393.3661, 3092.8716, 675.2376, 2820.1643, 3279.49, 688.11566, 2986.1416, 551.7551, 2411.0068, 2913.5122]
2025-09-13 16:17:16,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 16:17:16,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 7 hours, 40 minutes, 43 seconds)
2025-09-13 16:28:10,075 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:28:10,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:33:10,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2353.37500 ± 1016.573
2025-09-13 16:33:10,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1888.4656, 2894.4187, 3089.5535, 2518.8052, 3275.0244, 3257.8347, 702.65, 2927.304, 256.38217, 2723.313]
2025-09-13 16:33:10,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 16:33:10,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 7 hours, 25 minutes, 5 seconds)
2025-09-13 16:44:00,771 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:44:00,783 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:49:04,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2322.25000 ± 985.710
2025-09-13 16:49:04,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [3171.5598, 2964.5574, 2804.3896, 1591.4379, 665.2647, 546.70605, 3275.9138, 3214.667, 2770.3835, 2217.6218]
2025-09-13 16:49:04,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 16:49:04,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 7 hours, 9 minutes, 10 seconds)
2025-09-13 16:59:54,941 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:59:54,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:04:57,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2524.23486 ± 747.935
2025-09-13 17:04:57,293 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2328.5417, 2953.1975, 2795.804, 1067.7147, 2376.9976, 3284.2512, 1231.816, 3031.8015, 3180.6528, 2991.5728]
2025-09-13 17:04:57,293 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 17:04:57,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 6 hours, 52 minutes, 54 seconds)
2025-09-13 17:15:48,344 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:15:48,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:20:52,183 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2641.63086 ± 679.283
2025-09-13 17:20:52,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [3097.1333, 2748.996, 3020.996, 1776.644, 3043.7866, 3326.7417, 3042.676, 2882.8477, 2452.6597, 1023.82855]
2025-09-13 17:20:52,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 17:20:52,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 6 hours, 37 minutes, 16 seconds)
2025-09-13 17:31:43,131 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:31:43,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:36:47,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2227.25513 ± 1037.566
2025-09-13 17:36:47,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [220.32489, 3009.2363, 3323.8696, 2077.2676, 2699.9734, 3144.3677, 558.13635, 2638.1455, 2960.54, 1640.6907]
2025-09-13 17:36:47,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 17:36:47,227 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 6 hours, 21 minutes, 37 seconds)
2025-09-13 17:47:37,938 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:47:37,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:52:37,539 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2546.31714 ± 907.996
2025-09-13 17:52:37,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [480.78677, 3084.5393, 2839.7542, 2804.9268, 2892.618, 3439.823, 1115.7789, 2710.5398, 3192.1729, 2902.2336]
2025-09-13 17:52:37,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 17:52:37,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 6 hours, 5 minutes, 29 seconds)
2025-09-13 18:03:29,238 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:03:29,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:08:35,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2974.25415 ± 344.194
2025-09-13 18:08:35,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2839.218, 3028.2898, 3298.2559, 2541.8755, 2844.7603, 3165.9873, 3194.8035, 3146.4639, 2247.9355, 3434.9517]
2025-09-13 18:08:35,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 18:08:35,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (2974.25) for latency ExtremeSparseL4U32
2025-09-13 18:08:35,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 5 hours, 49 minutes, 54 seconds)
2025-09-13 18:19:30,673 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:19:30,684 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:24:30,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2719.17920 ± 715.595
2025-09-13 18:24:30,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1603.9485, 2741.636, 3617.4111, 2565.4873, 1246.7489, 3137.6318, 2684.191, 3185.488, 3124.1917, 3285.0571]
2025-09-13 18:24:30,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 18:24:30,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 5 hours, 34 minutes, 6 seconds)
2025-09-13 18:35:23,270 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:35:23,283 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:40:26,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2865.52808 ± 613.095
2025-09-13 18:40:26,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2902.5942, 2717.6226, 1185.2278, 2631.5789, 3066.1726, 3520.1313, 3110.264, 3327.0488, 3032.7317, 3161.9092]
2025-09-13 18:40:26,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 18:40:27,005 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 5 hours, 18 minutes, 19 seconds)
2025-09-13 18:51:18,985 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:51:18,997 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:56:20,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2417.21143 ± 1021.035
2025-09-13 18:56:20,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [900.6887, 3369.2893, 1035.0021, 3210.819, 3332.4084, 868.5598, 2272.9556, 2855.8018, 3404.2886, 2922.3018]
2025-09-13 18:56:20,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 18:56:20,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 5 hours, 2 minutes, 18 seconds)
2025-09-13 19:07:10,253 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:07:10,268 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:12:10,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2496.05664 ± 941.193
2025-09-13 19:12:10,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [3003.3098, 3318.0852, 3155.3645, 2545.7021, 3216.9219, 580.5386, 2058.7573, 3242.944, 2890.4075, 948.53503]
2025-09-13 19:12:10,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 19:12:10,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 46 minutes, 23 seconds)
2025-09-13 19:23:03,322 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:23:03,336 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:28:10,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2424.47803 ± 1055.449
2025-09-13 19:28:10,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [3346.665, 340.50433, 581.77167, 3398.375, 3302.4915, 2866.9314, 2978.2258, 2053.3184, 2526.8635, 2849.6335]
2025-09-13 19:28:10,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 19:28:10,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 30 minutes, 32 seconds)
2025-09-13 19:39:01,495 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:39:01,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:43:59,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2986.97656 ± 601.733
2025-09-13 19:43:59,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2884.131, 3267.002, 2991.611, 3096.8772, 1236.1953, 3325.9006, 3218.4966, 3166.082, 3308.2532, 3375.217]
2025-09-13 19:43:59,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 19:43:59,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (2986.98) for latency ExtremeSparseL4U32
2025-09-13 19:43:59,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 4 hours, 14 minutes, 21 seconds)
2025-09-13 19:54:54,942 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:54:54,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:00:02,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2295.71021 ± 1184.201
2025-09-13 20:00:02,110 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [3245.9976, 3126.3777, 3061.2712, 1144.1873, 3003.1936, 2279.4287, 309.0507, 2918.098, 288.14505, 3581.352]
2025-09-13 20:00:02,110 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 20:00:02,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 58 minutes, 45 seconds)
2025-09-13 20:10:54,426 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:10:54,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:15:57,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2142.87012 ± 667.554
2025-09-13 20:15:57,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2148.1619, 2639.6694, 2358.5486, 959.6634, 1886.644, 3079.0933, 1360.3212, 1959.0741, 1850.9187, 3186.6072]
2025-09-13 20:15:57,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 20:15:57,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 42 minutes, 54 seconds)
2025-09-13 20:26:47,873 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:26:47,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:31:50,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2459.50537 ± 958.999
2025-09-13 20:31:50,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [3219.0422, 3100.9788, 3394.0842, 3080.2544, 3113.3809, 810.39575, 2865.9844, 903.0798, 2719.4546, 1388.3994]
2025-09-13 20:31:50,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 20:31:50,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 27 minutes, 6 seconds)
2025-09-13 20:42:41,967 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:42:41,978 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:47:41,488 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 3025.24561 ± 448.032
2025-09-13 20:47:41,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1953.8958, 3451.36, 2526.157, 2992.2932, 3336.7168, 3272.5107, 3414.1382, 2846.693, 3189.3218, 3269.3665]
2025-09-13 20:47:41,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 20:47:41,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (3025.25) for latency ExtremeSparseL4U32
2025-09-13 20:47:41,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 10 minutes, 51 seconds)
2025-09-13 20:58:31,920 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:58:31,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:03:31,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2910.80908 ± 569.822
2025-09-13 21:03:31,732 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [3341.8508, 2948.7378, 1362.3517, 2887.0261, 3321.923, 3452.9998, 2901.3704, 2739.1912, 2837.436, 3315.2026]
2025-09-13 21:03:31,732 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 21:03:31,746 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 54 minutes, 58 seconds)
2025-09-13 21:14:22,674 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:14:22,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:19:25,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2704.44092 ± 854.531
2025-09-13 21:19:25,899 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1030.9863, 3220.3435, 3441.0981, 1173.3165, 2380.697, 2975.4868, 3505.353, 2968.8381, 3079.9595, 3268.33]
2025-09-13 21:19:25,899 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 21:19:25,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 38 minutes, 47 seconds)
2025-09-13 21:30:18,734 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:30:18,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:35:21,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 3108.25098 ± 505.231
2025-09-13 21:35:21,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [3336.3135, 3365.0059, 3076.0745, 2959.3586, 3390.9116, 3491.6223, 3165.326, 3620.826, 1732.8727, 2944.1995]
2025-09-13 21:35:21,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 21:35:21,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (3108.25) for latency ExtremeSparseL4U32
2025-09-13 21:35:21,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 22 minutes, 55 seconds)
2025-09-13 21:46:12,701 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:46:12,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:51:10,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2623.94092 ± 984.833
2025-09-13 21:51:10,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [3007.3862, 3292.6843, 3458.4119, 3412.8267, 3457.3013, 630.7844, 3303.0159, 2146.2107, 2456.5134, 1074.2733]
2025-09-13 21:51:10,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 21:51:10,227 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 6 minutes, 56 seconds)
2025-09-13 22:02:03,085 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:02:03,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:07:08,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2533.37646 ± 984.646
2025-09-13 22:07:08,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [805.2634, 847.7496, 1670.2227, 2819.2056, 3292.983, 2779.9375, 3396.883, 3499.1292, 2942.4385, 3279.9521]
2025-09-13 22:07:08,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 22:07:08,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 51 minutes, 13 seconds)
2025-09-13 22:18:01,946 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:18:01,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:23:05,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2689.32373 ± 779.942
2025-09-13 22:23:05,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [3056.8938, 3102.0295, 598.8125, 2692.7427, 2121.5396, 3410.9053, 2807.6929, 3284.4976, 3132.1477, 2685.9739]
2025-09-13 22:23:05,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 22:23:05,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 35 minutes, 28 seconds)
2025-09-13 22:33:56,844 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:33:56,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:38:55,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2618.00562 ± 789.406
2025-09-13 22:38:55,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [898.5911, 2858.2834, 3269.2239, 1796.1833, 2004.9543, 3486.4148, 3155.4385, 3374.199, 2915.6826, 2421.0842]
2025-09-13 22:38:55,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 22:38:55,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 19 minutes, 30 seconds)
2025-09-13 22:49:49,516 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:49:49,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:54:52,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1981.58789 ± 1315.051
2025-09-13 22:54:52,136 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [3465.4617, 3177.2358, 305.79074, 3300.0745, 533.5111, 992.61194, 1152.3435, 476.21417, 3414.5476, 2998.087]
2025-09-13 22:54:52,136 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 22:54:52,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 3 minutes, 36 seconds)
2025-09-13 23:05:44,291 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:05:44,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:10:48,794 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2910.73633 ± 384.683
2025-09-13 23:10:48,795 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [3094.3403, 3033.7659, 3280.6326, 2985.0762, 2490.568, 2025.7418, 3387.217, 2853.063, 3189.807, 2767.1494]
2025-09-13 23:10:48,795 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 23:10:48,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 47 minutes, 47 seconds)
2025-09-13 23:21:40,873 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:21:40,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:26:41,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 3218.57080 ± 154.023
2025-09-13 23:26:41,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [3192.2683, 3177.99, 3186.9146, 3237.289, 3312.5679, 3473.15, 3305.0015, 3127.886, 3320.007, 2852.633]
2025-09-13 23:26:41,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 23:26:41,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (3218.57) for latency ExtremeSparseL4U32
2025-09-13 23:26:41,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 31 minutes, 49 seconds)
2025-09-13 23:37:35,887 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:37:35,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:42:34,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2638.44824 ± 797.871
2025-09-13 23:42:34,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1882.0356, 3452.2874, 3564.8347, 1445.9432, 3343.7222, 2418.7244, 3317.1206, 1862.4191, 1765.5574, 3331.8372]
2025-09-13 23:42:34,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 23:42:34,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 15 minutes, 53 seconds)
2025-09-13 23:53:28,542 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:53:28,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:58:29,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2717.39844 ± 776.767
2025-09-13 23:58:29,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2460.4429, 3209.7632, 2856.0183, 3136.704, 1011.88715, 3652.0305, 2957.429, 3380.9412, 2895.1726, 1613.5981]
2025-09-13 23:58:29,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 23:58:29,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1251 [DEBUG]: Training session finished
