2025-09-12 23:05:47,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc5-hopper/ExtremeSparseL4U32-mbpac_memdelay
2025-09-12 23:05:47,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc5-hopper/ExtremeSparseL4U32-mbpac_memdelay
2025-09-12 23:05:47,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x14967e140190>}
2025-09-12 23:05:47,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1111 [DEBUG]: using device: cuda
2025-09-12 23:05:47,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1133 [INFO]: Creating new trainer
2025-09-12 23:05:47,349 baseline-mbpac-noiseperc5-hopper:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-09-12 23:05:47,349 baseline-mbpac-noiseperc5-hopper:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-12 23:05:47,356 baseline-mbpac-noiseperc5-hopper:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=11, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=3, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-09-12 23:05:48,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1194 [DEBUG]: Starting training session...
2025-09-12 23:05:48,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 1/100
2025-09-12 23:16:45,286 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:16:45,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:17:00,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 87.91077 ± 60.808
2025-09-12 23:17:00,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [53.988457, 39.52504, 53.574963, 200.62537, 207.66295, 71.73175, 45.55342, 36.88111, 101.811966, 67.75271]
2025-09-12 23:17:00,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [35.0, 28.0, 32.0, 92.0, 91.0, 57.0, 36.0, 26.0, 58.0, 42.0]
2025-09-12 23:17:00,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (87.91) for latency ExtremeSparseL4U32
2025-09-12 23:17:00,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 18 hours, 28 minutes, 35 seconds)
2025-09-12 23:27:28,389 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:27:28,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:28:10,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 156.40359 ± 93.112
2025-09-12 23:28:10,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [157.67903, 134.91096, 114.88832, 87.23113, 201.45842, 20.378012, 88.7762, 348.7043, 124.63928, 285.37036]
2025-09-12 23:28:10,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [134.0, 125.0, 108.0, 93.0, 171.0, 25.0, 77.0, 324.0, 127.0, 256.0]
2025-09-12 23:28:10,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (156.40) for latency ExtremeSparseL4U32
2025-09-12 23:28:10,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 18 hours, 15 minutes, 42 seconds)
2025-09-12 23:38:35,606 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:38:35,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:38:58,692 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 148.06598 ± 51.720
2025-09-12 23:38:58,692 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [154.17928, 123.43121, 209.63672, 48.391926, 228.26506, 106.62279, 114.71374, 144.69518, 146.93507, 203.7888]
2025-09-12 23:38:58,692 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [80.0, 68.0, 120.0, 31.0, 105.0, 60.0, 85.0, 85.0, 77.0, 95.0]
2025-09-12 23:38:58,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 17 hours, 52 minutes, 30 seconds)
2025-09-12 23:49:10,817 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:49:10,824 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:49:37,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 194.73169 ± 64.195
2025-09-12 23:49:37,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [176.90509, 224.65532, 141.55489, 117.09906, 209.8998, 301.6103, 213.89832, 268.01263, 79.14954, 214.53203]
2025-09-12 23:49:37,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [81.0, 96.0, 75.0, 63.0, 95.0, 138.0, 94.0, 111.0, 47.0, 110.0]
2025-09-12 23:49:37,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (194.73) for latency ExtremeSparseL4U32
2025-09-12 23:49:37,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 17 hours, 31 minutes, 45 seconds)
2025-09-13 00:00:01,915 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:00:01,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:00:22,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 144.09689 ± 63.567
2025-09-13 00:00:22,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [56.995373, 151.09615, 165.1456, 150.71143, 217.23804, 179.60175, 28.852146, 120.14753, 250.30539, 120.87541]
2025-09-13 00:00:22,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [36.0, 75.0, 80.0, 75.0, 94.0, 84.0, 32.0, 76.0, 103.0, 64.0]
2025-09-13 00:00:22,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 17 hours, 16 minutes, 53 seconds)
2025-09-13 00:10:41,829 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:10:41,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:11:23,617 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 253.22302 ± 131.361
2025-09-13 00:11:23,618 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [104.02713, 78.51269, 252.65544, 404.87387, 141.855, 471.66428, 198.65544, 405.00613, 308.7171, 166.26324]
2025-09-13 00:11:23,618 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [63.0, 48.0, 154.0, 203.0, 105.0, 276.0, 115.0, 224.0, 145.0, 93.0]
2025-09-13 00:11:23,618 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (253.22) for latency ExtremeSparseL4U32
2025-09-13 00:11:23,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 17 hours, 2 minutes, 30 seconds)
2025-09-13 00:21:46,108 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:21:46,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:22:24,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 223.67131 ± 96.877
2025-09-13 00:22:24,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [325.49173, 309.43106, 303.0504, 212.60312, 74.40956, 336.0221, 90.86888, 97.80847, 259.926, 227.1017]
2025-09-13 00:22:24,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [150.0, 173.0, 149.0, 125.0, 44.0, 171.0, 68.0, 70.0, 189.0, 185.0]
2025-09-13 00:22:24,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 16 hours, 48 minutes, 51 seconds)
2025-09-13 00:32:44,874 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:32:44,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:33:10,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 169.06293 ± 91.810
2025-09-13 00:33:10,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [128.65147, 67.13519, 113.32711, 342.6399, 96.672005, 318.8343, 215.48918, 148.72679, 76.31272, 182.84058]
2025-09-13 00:33:10,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [73.0, 42.0, 67.0, 158.0, 56.0, 150.0, 113.0, 78.0, 55.0, 94.0]
2025-09-13 00:33:10,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 16 hours, 37 minutes, 11 seconds)
2025-09-13 00:43:31,303 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:43:31,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:44:04,425 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 241.96521 ± 103.217
2025-09-13 00:44:04,425 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [375.2758, 219.86923, 101.24158, 53.656513, 328.4235, 368.73743, 173.72644, 244.26794, 311.2744, 243.17938]
2025-09-13 00:44:04,425 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [148.0, 100.0, 57.0, 33.0, 171.0, 143.0, 84.0, 151.0, 132.0, 107.0]
2025-09-13 00:44:04,436 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 16 hours, 30 minutes, 51 seconds)
2025-09-13 00:54:25,958 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:54:25,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:55:10,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 315.76251 ± 199.428
2025-09-13 00:55:10,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [157.4745, 332.24234, 162.88445, 757.84326, 153.02383, 381.23346, 27.713572, 467.29465, 440.03287, 277.8824]
2025-09-13 00:55:10,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [105.0, 164.0, 115.0, 282.0, 101.0, 184.0, 33.0, 207.0, 188.0, 140.0]
2025-09-13 00:55:10,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (315.76) for latency ExtremeSparseL4U32
2025-09-13 00:55:10,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 16 hours, 26 minutes, 15 seconds)
2025-09-13 01:05:30,490 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:05:30,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:06:04,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 238.30331 ± 116.566
2025-09-13 01:06:04,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [76.778854, 381.94772, 70.53407, 164.67264, 401.92758, 252.02223, 227.46164, 139.49332, 348.0534, 320.14154]
2025-09-13 01:06:04,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [46.0, 167.0, 44.0, 84.0, 188.0, 118.0, 109.0, 74.0, 155.0, 150.0]
2025-09-13 01:06:04,107 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 16 hours, 13 minutes, 12 seconds)
2025-09-13 01:16:41,104 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:16:41,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:17:24,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 346.52280 ± 206.917
2025-09-13 01:17:24,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [626.99133, 67.17299, 502.27563, 107.56204, 215.14381, 635.6212, 452.63202, 225.64775, 491.48248, 140.69875]
2025-09-13 01:17:24,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [216.0, 43.0, 182.0, 58.0, 136.0, 259.0, 173.0, 109.0, 201.0, 74.0]
2025-09-13 01:17:24,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (346.52) for latency ExtremeSparseL4U32
2025-09-13 01:17:24,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 16 hours, 7 minutes, 52 seconds)
2025-09-13 01:27:29,337 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:27:29,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:28:14,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 370.53830 ± 221.089
2025-09-13 01:28:14,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [260.4143, 640.95874, 218.577, 717.51324, 537.60864, 132.42778, 613.64655, 203.32832, 298.93045, 81.978004]
2025-09-13 01:28:14,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [117.0, 229.0, 98.0, 279.0, 222.0, 67.0, 216.0, 92.0, 134.0, 58.0]
2025-09-13 01:28:14,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (370.54) for latency ExtremeSparseL4U32
2025-09-13 01:28:14,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 15 hours, 58 minutes, 3 seconds)
2025-09-13 01:38:47,478 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:38:47,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:39:27,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 290.61383 ± 82.676
2025-09-13 01:39:27,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [303.6468, 331.86856, 201.07645, 408.72952, 307.3744, 148.4225, 253.87578, 295.32272, 427.56046, 228.26094]
2025-09-13 01:39:27,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [138.0, 145.0, 96.0, 203.0, 139.0, 101.0, 133.0, 127.0, 195.0, 108.0]
2025-09-13 01:39:27,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 15 hours, 52 minutes, 40 seconds)
2025-09-13 01:49:38,855 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:49:38,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:50:50,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 586.43396 ± 226.402
2025-09-13 01:50:50,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [964.2563, 765.86255, 927.569, 426.7198, 251.67343, 593.63074, 558.3349, 602.8498, 424.186, 349.2575]
2025-09-13 01:50:50,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [368.0, 334.0, 382.0, 169.0, 120.0, 258.0, 226.0, 264.0, 199.0, 158.0]
2025-09-13 01:50:50,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (586.43) for latency ExtremeSparseL4U32
2025-09-13 01:50:50,201 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 15 hours, 46 minutes, 16 seconds)
2025-09-13 02:01:15,374 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:01:15,382 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:02:10,032 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 448.88177 ± 324.346
2025-09-13 02:02:10,032 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [196.19923, 881.5909, 398.08942, 422.7643, 27.871214, 752.81323, 137.81662, 207.31053, 1058.0608, 406.30164]
2025-09-13 02:02:10,032 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [95.0, 322.0, 158.0, 194.0, 33.0, 338.0, 71.0, 99.0, 428.0, 176.0]
2025-09-13 02:02:10,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 15 hours, 42 minutes, 27 seconds)
2025-09-13 02:12:03,432 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:12:03,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:12:30,873 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 190.15341 ± 34.890
2025-09-13 02:12:30,873 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [178.80943, 177.74039, 146.6755, 247.19565, 182.10492, 169.71866, 164.55684, 264.69177, 185.43066, 184.61018]
2025-09-13 02:12:30,873 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [89.0, 90.0, 74.0, 111.0, 90.0, 86.0, 84.0, 129.0, 93.0, 94.0]
2025-09-13 02:12:30,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 15 hours, 14 minutes, 52 seconds)
2025-09-13 02:22:35,856 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:22:35,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:23:59,448 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 745.62103 ± 411.625
2025-09-13 02:23:59,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1054.6338, 594.3422, 131.74438, 135.99307, 1498.1638, 987.5756, 1099.9764, 509.27936, 621.99, 822.51105]
2025-09-13 02:23:59,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [426.0, 237.0, 69.0, 69.0, 546.0, 392.0, 407.0, 206.0, 250.0, 306.0]
2025-09-13 02:23:59,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (745.62) for latency ExtremeSparseL4U32
2025-09-13 02:23:59,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 15 hours, 14 minutes, 24 seconds)
2025-09-13 02:34:31,047 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:34:31,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:35:34,304 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 550.56433 ± 299.346
2025-09-13 02:35:34,304 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [57.10828, 891.1784, 390.92892, 668.73724, 898.98737, 764.9865, 277.52478, 598.5762, 827.07794, 130.5383]
2025-09-13 02:35:34,304 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [35.0, 322.0, 162.0, 251.0, 363.0, 308.0, 119.0, 255.0, 296.0, 66.0]
2025-09-13 02:35:34,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 15 hours, 8 minutes, 58 seconds)
2025-09-13 02:45:37,913 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:45:37,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:46:27,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 395.50885 ± 256.931
2025-09-13 02:46:27,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [389.35736, 875.7173, 103.28365, 354.5636, 74.90789, 766.8859, 435.5894, 85.032104, 417.26794, 452.4832]
2025-09-13 02:46:27,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [164.0, 354.0, 76.0, 153.0, 45.0, 321.0, 186.0, 48.0, 176.0, 192.0]
2025-09-13 02:46:27,913 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 14 hours, 50 minutes, 3 seconds)
2025-09-13 02:56:40,476 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:56:40,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:57:34,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 439.96201 ± 297.034
2025-09-13 02:57:34,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [406.86096, 292.7384, 899.3526, 235.97751, 112.98345, 1087.9485, 170.81305, 432.99402, 366.11908, 393.8327]
2025-09-13 02:57:34,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [171.0, 141.0, 327.0, 108.0, 69.0, 438.0, 88.0, 203.0, 157.0, 170.0]
2025-09-13 02:57:34,375 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 14 hours, 35 minutes, 24 seconds)
2025-09-13 03:07:51,609 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:07:51,617 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:08:31,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 332.51221 ± 220.201
2025-09-13 03:08:31,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [244.49042, 725.5454, 161.38342, 421.52887, 91.49846, 86.21353, 672.7964, 395.13504, 400.00922, 126.52123]
2025-09-13 03:08:31,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [110.0, 260.0, 79.0, 180.0, 66.0, 51.0, 255.0, 161.0, 180.0, 67.0]
2025-09-13 03:08:31,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 14 hours, 33 minutes, 52 seconds)
2025-09-13 03:18:34,948 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:18:34,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:19:24,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 420.89771 ± 192.418
2025-09-13 03:19:24,660 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [588.3692, 439.6185, 206.01637, 738.7562, 380.2365, 196.78456, 584.79816, 541.3086, 430.4781, 102.61071]
2025-09-13 03:19:24,660 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [225.0, 185.0, 97.0, 279.0, 164.0, 95.0, 236.0, 215.0, 189.0, 56.0]
2025-09-13 03:19:24,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 14 hours, 13 minutes, 28 seconds)
2025-09-13 03:29:36,774 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:29:36,783 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:30:33,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 474.76181 ± 396.745
2025-09-13 03:30:33,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1420.3683, 857.9151, 254.6669, 816.5793, 257.57565, 143.6209, 234.04985, 318.62155, 199.78973, 244.43109]
2025-09-13 03:30:33,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [504.0, 346.0, 109.0, 304.0, 142.0, 102.0, 108.0, 128.0, 97.0, 107.0]
2025-09-13 03:30:33,267 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 13 hours, 55 minutes, 44 seconds)
2025-09-13 03:40:46,900 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:40:46,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:41:48,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 553.43884 ± 348.428
2025-09-13 03:41:48,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [25.760773, 406.26303, 879.763, 883.7602, 852.4559, 228.4598, 1036.75, 409.11536, 85.55161, 726.5081]
2025-09-13 03:41:48,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [28.0, 161.0, 351.0, 307.0, 302.0, 100.0, 344.0, 169.0, 67.0, 288.0]
2025-09-13 03:41:48,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 13 hours, 50 minutes, 2 seconds)
2025-09-13 03:52:00,178 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:52:00,185 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:53:21,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 709.79041 ± 475.421
2025-09-13 03:53:21,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [734.92395, 1366.0927, 1552.531, 426.25983, 161.79083, 410.2998, 448.02774, 71.479195, 1111.3033, 815.19604]
2025-09-13 03:53:21,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [256.0, 500.0, 588.0, 209.0, 81.0, 203.0, 195.0, 45.0, 410.0, 319.0]
2025-09-13 03:53:21,587 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 13 hours, 45 minutes, 38 seconds)
2025-09-13 04:03:37,466 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:03:37,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:04:51,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 630.76306 ± 633.270
2025-09-13 04:04:51,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [533.30554, 224.24695, 2412.6714, 22.89419, 791.4472, 529.782, 521.85345, 192.98949, 670.7028, 407.73746]
2025-09-13 04:04:51,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [242.0, 107.0, 897.0, 29.0, 313.0, 204.0, 205.0, 92.0, 257.0, 177.0]
2025-09-13 04:04:51,098 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 13 hours, 42 minutes, 15 seconds)
2025-09-13 04:15:01,537 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:15:01,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:15:38,660 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 301.76913 ± 229.937
2025-09-13 04:15:38,660 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [85.050964, 243.08379, 91.07469, 775.13025, 252.4316, 168.63217, 142.46793, 702.5246, 242.39789, 314.89746]
2025-09-13 04:15:38,660 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [49.0, 108.0, 52.0, 280.0, 110.0, 79.0, 71.0, 280.0, 111.0, 134.0]
2025-09-13 04:15:38,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 13 hours, 29 minutes, 45 seconds)
2025-09-13 04:26:01,649 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:26:01,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:26:42,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 319.90833 ± 238.620
2025-09-13 04:26:42,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [224.89706, 86.66219, 101.87284, 273.96252, 374.42773, 113.25222, 742.40247, 745.3122, 116.80962, 419.48453]
2025-09-13 04:26:42,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [101.0, 62.0, 74.0, 141.0, 157.0, 73.0, 266.0, 277.0, 61.0, 173.0]
2025-09-13 04:26:42,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 13 hours, 17 minutes, 19 seconds)
2025-09-13 04:36:50,277 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:36:50,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:38:09,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 702.44562 ± 572.737
2025-09-13 04:38:09,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1130.6539, 507.373, 110.93826, 83.40179, 598.1576, 169.44394, 2097.5532, 571.8187, 932.69366, 822.4216]
2025-09-13 04:38:09,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [440.0, 206.0, 78.0, 61.0, 240.0, 83.0, 784.0, 213.0, 333.0, 282.0]
2025-09-13 04:38:09,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 13 hours, 9 minutes, 2 seconds)
2025-09-13 04:48:27,826 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:48:27,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:49:52,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 774.54657 ± 229.971
2025-09-13 04:49:52,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [560.3141, 707.9312, 648.2235, 750.97955, 515.3443, 1144.8502, 717.66864, 1213.1757, 916.9112, 570.0673]
2025-09-13 04:49:52,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [214.0, 263.0, 242.0, 279.0, 201.0, 413.0, 285.0, 433.0, 335.0, 219.0]
2025-09-13 04:49:52,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (774.55) for latency ExtremeSparseL4U32
2025-09-13 04:49:52,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 12 hours, 59 minutes, 55 seconds)
2025-09-13 05:00:33,787 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:00:33,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:01:07,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 258.86716 ± 160.410
2025-09-13 05:01:07,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [285.6224, 187.84499, 662.6523, 212.29045, 246.52757, 103.29504, 124.31192, 424.37625, 148.47742, 193.27313]
2025-09-13 05:01:07,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [130.0, 90.0, 245.0, 101.0, 127.0, 56.0, 64.0, 168.0, 85.0, 92.0]
2025-09-13 05:01:07,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 12 hours, 45 minutes, 16 seconds)
2025-09-13 05:11:09,601 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:11:09,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:11:59,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 443.02945 ± 205.032
2025-09-13 05:11:59,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [198.9293, 768.2565, 658.22394, 613.4153, 436.87564, 273.01862, 381.4029, 656.6014, 263.21915, 180.35168]
2025-09-13 05:11:59,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [93.0, 268.0, 235.0, 229.0, 177.0, 118.0, 169.0, 234.0, 114.0, 85.0]
2025-09-13 05:11:59,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 12 hours, 35 minutes, 3 seconds)
2025-09-13 05:21:56,356 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:21:56,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:22:53,768 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 511.44135 ± 543.986
2025-09-13 05:22:53,768 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1103.083, 351.0657, 60.400414, 95.42976, 1432.3678, 1432.6254, 136.03398, 161.70412, 84.21095, 257.49246]
2025-09-13 05:22:53,769 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [387.0, 145.0, 39.0, 54.0, 517.0, 517.0, 69.0, 77.0, 47.0, 112.0]
2025-09-13 05:22:53,786 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 12 hours, 21 minutes, 44 seconds)
2025-09-13 05:33:22,955 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:33:22,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:34:07,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 370.44235 ± 149.322
2025-09-13 05:34:07,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [320.1969, 383.46445, 541.9507, 250.71985, 396.37012, 291.18533, 224.528, 672.17084, 470.65363, 153.18385]
2025-09-13 05:34:07,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [132.0, 155.0, 205.0, 125.0, 161.0, 120.0, 103.0, 248.0, 218.0, 94.0]
2025-09-13 05:34:07,956 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 12 hours, 7 minutes, 36 seconds)
2025-09-13 05:44:10,417 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:44:10,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:46:10,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1169.19604 ± 618.507
2025-09-13 05:46:10,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [389.06314, 1531.9244, 1670.2012, 631.385, 98.28086, 756.83636, 1940.9602, 1890.31, 1474.3137, 1308.6858]
2025-09-13 05:46:10,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [175.0, 504.0, 587.0, 246.0, 69.0, 271.0, 653.0, 643.0, 502.0, 449.0]
2025-09-13 05:46:10,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (1169.20) for latency ExtremeSparseL4U32
2025-09-13 05:46:10,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 12 hours, 42 seconds)
2025-09-13 05:56:27,856 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:56:27,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:57:12,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 369.56845 ± 188.235
2025-09-13 05:57:12,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [614.57623, 69.61073, 277.772, 124.767296, 473.8408, 134.67874, 517.0575, 474.94675, 484.17133, 524.2633]
2025-09-13 05:57:12,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [237.0, 42.0, 141.0, 66.0, 187.0, 69.0, 203.0, 190.0, 197.0, 207.0]
2025-09-13 05:57:12,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 11 hours, 46 minutes, 39 seconds)
2025-09-13 06:07:31,359 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:07:31,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:08:27,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 458.19385 ± 324.759
2025-09-13 06:08:27,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [86.24261, 767.96686, 510.52567, 991.8281, 204.09422, 793.17725, 91.844215, 740.35095, 139.57365, 256.33505]
2025-09-13 06:08:27,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [67.0, 307.0, 201.0, 352.0, 95.0, 323.0, 72.0, 309.0, 82.0, 134.0]
2025-09-13 06:08:27,068 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 11 hours, 40 minutes, 5 seconds)
2025-09-13 06:18:58,390 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:18:58,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:20:07,015 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 685.04749 ± 404.191
2025-09-13 06:20:07,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [71.02722, 504.89075, 216.2426, 905.7611, 909.7701, 1132.1954, 1169.1409, 773.69366, 1054.3103, 113.4425]
2025-09-13 06:20:07,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [42.0, 197.0, 104.0, 303.0, 300.0, 403.0, 383.0, 263.0, 341.0, 60.0]
2025-09-13 06:20:07,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 11 hours, 38 minutes, 5 seconds)
2025-09-13 06:30:03,739 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:30:03,747 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:31:18,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 676.27917 ± 442.349
2025-09-13 06:31:18,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1612.2102, 443.77618, 344.5884, 893.1084, 141.45946, 927.6751, 434.48605, 182.96942, 1142.4386, 640.0808]
2025-09-13 06:31:18,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [535.0, 195.0, 161.0, 309.0, 70.0, 313.0, 165.0, 87.0, 423.0, 258.0]
2025-09-13 06:31:18,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 11 hours, 26 minutes, 2 seconds)
2025-09-13 06:41:36,502 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:41:36,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:43:11,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 938.89539 ± 684.684
2025-09-13 06:43:11,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1203.0394, 22.990103, 1021.24805, 2159.2568, 812.44885, 278.91495, 1065.6265, 774.2286, 83.94537, 1967.2555]
2025-09-13 06:43:11,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [425.0, 28.0, 364.0, 700.0, 275.0, 122.0, 361.0, 261.0, 48.0, 647.0]
2025-09-13 06:43:11,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 11 hours, 12 minutes, 44 seconds)
2025-09-13 06:53:16,136 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:53:16,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:54:41,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 852.36853 ± 410.653
2025-09-13 06:54:41,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1032.8851, 809.58136, 998.15155, 368.2997, 539.81476, 793.029, 230.8798, 1764.7205, 1128.9546, 857.36914]
2025-09-13 06:54:41,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [327.0, 274.0, 335.0, 149.0, 207.0, 268.0, 103.0, 540.0, 368.0, 320.0]
2025-09-13 06:54:41,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 11 hours, 6 minutes, 46 seconds)
2025-09-13 07:05:25,785 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:05:25,794 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:06:30,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 648.19733 ± 293.352
2025-09-13 07:06:30,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [342.6773, 504.17194, 742.3846, 600.62585, 1301.4629, 849.84814, 229.97594, 866.3659, 439.549, 604.9116]
2025-09-13 07:06:30,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [157.0, 184.0, 239.0, 205.0, 399.0, 267.0, 101.0, 276.0, 162.0, 220.0]
2025-09-13 07:06:30,324 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 11 hours, 1 minute, 49 seconds)
2025-09-13 07:16:22,634 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:16:22,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:17:39,002 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 731.82318 ± 526.192
2025-09-13 07:17:39,002 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [958.69556, 195.72365, 755.30334, 501.11923, 285.6156, 234.94005, 191.52098, 1746.045, 975.7987, 1473.4697]
2025-09-13 07:17:39,002 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [327.0, 92.0, 276.0, 196.0, 138.0, 116.0, 89.0, 583.0, 365.0, 482.0]
2025-09-13 07:17:39,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 10 hours, 44 minutes, 22 seconds)
2025-09-13 07:27:53,028 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:27:53,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:29:06,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 672.21460 ± 566.853
2025-09-13 07:29:06,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [271.85065, 734.0495, 844.2751, 124.81685, 1708.6632, 154.75858, 209.95264, 987.1182, 133.94453, 1552.7169]
2025-09-13 07:29:06,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [126.0, 292.0, 319.0, 68.0, 543.0, 77.0, 119.0, 361.0, 67.0, 521.0]
2025-09-13 07:29:06,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 10 hours, 35 minutes, 50 seconds)
2025-09-13 07:39:31,015 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:39:31,023 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:40:22,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 453.53384 ± 261.792
2025-09-13 07:40:22,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [874.2226, 787.2691, 249.07372, 478.04483, 159.57861, 355.28064, 450.0555, 249.92873, 136.74135, 795.14325]
2025-09-13 07:40:22,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [329.0, 276.0, 107.0, 176.0, 79.0, 162.0, 174.0, 107.0, 68.0, 298.0]
2025-09-13 07:40:22,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 10 hours, 17 minutes, 35 seconds)
2025-09-13 07:50:30,400 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:50:30,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:51:53,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 817.91687 ± 749.097
2025-09-13 07:51:53,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [473.52042, 622.3591, 380.79138, 94.39215, 93.25832, 1957.947, 2318.7734, 129.98679, 1251.41, 856.7296]
2025-09-13 07:51:53,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [181.0, 252.0, 151.0, 69.0, 69.0, 622.0, 749.0, 68.0, 394.0, 301.0]
2025-09-13 07:51:53,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 10 hours, 6 minutes, 20 seconds)
2025-09-13 08:02:27,322 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:02:27,329 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:03:24,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 522.53607 ± 346.024
2025-09-13 08:03:24,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [259.0309, 661.87085, 908.61, 1086.8348, 737.3643, 822.1535, 23.873089, 199.0836, 356.40332, 170.13614]
2025-09-13 08:03:24,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [115.0, 252.0, 308.0, 367.0, 288.0, 285.0, 30.0, 116.0, 140.0, 80.0]
2025-09-13 08:03:24,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 9 hours, 51 minutes, 42 seconds)
2025-09-13 08:13:43,213 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:13:43,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:14:43,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 608.83960 ± 558.216
2025-09-13 08:14:43,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [149.15819, 134.37555, 935.09924, 155.02162, 108.99063, 1488.5011, 578.186, 91.9441, 1623.3663, 823.7535]
2025-09-13 08:14:43,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [73.0, 88.0, 295.0, 77.0, 72.0, 458.0, 214.0, 65.0, 495.0, 277.0]
2025-09-13 08:14:44,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 9 hours, 42 minutes, 14 seconds)
2025-09-13 08:24:42,709 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:24:42,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:25:51,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 623.21637 ± 694.580
2025-09-13 08:25:51,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [26.870083, 876.73956, 165.56674, 1045.9429, 116.89361, 2361.0544, 309.49408, 172.06389, 1044.6115, 112.92708]
2025-09-13 08:25:51,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [30.0, 318.0, 95.0, 370.0, 83.0, 813.0, 149.0, 87.0, 363.0, 60.0]
2025-09-13 08:25:51,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 9 hours, 27 minutes, 33 seconds)
2025-09-13 08:36:27,834 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:36:27,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:36:53,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 196.14107 ± 151.177
2025-09-13 08:36:53,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [98.325775, 237.13731, 124.13014, 73.50514, 128.06114, 480.61383, 492.52942, 103.09051, 91.38357, 132.63379]
2025-09-13 08:36:53,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [67.0, 103.0, 63.0, 44.0, 65.0, 180.0, 185.0, 56.0, 52.0, 67.0]
2025-09-13 08:36:53,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 9 hours, 13 minutes, 46 seconds)
2025-09-13 08:47:02,594 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:47:02,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:48:39,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 972.01874 ± 603.013
2025-09-13 08:48:39,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1175.9812, 1563.7607, 1927.3678, 1575.1952, 57.476852, 326.8867, 1195.033, 767.49414, 948.45056, 182.54233]
2025-09-13 08:48:39,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [415.0, 488.0, 592.0, 504.0, 55.0, 148.0, 400.0, 271.0, 318.0, 84.0]
2025-09-13 08:48:39,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 9 hours, 4 minutes, 55 seconds)
2025-09-13 08:58:51,766 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:58:51,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:00:11,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 825.33643 ± 807.260
2025-09-13 09:00:11,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [196.62097, 969.17255, 1561.0138, 689.2741, 97.83811, 2219.104, 216.66266, 2106.2134, 66.343636, 131.12155]
2025-09-13 09:00:11,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [91.0, 319.0, 502.0, 254.0, 55.0, 657.0, 96.0, 659.0, 42.0, 71.0]
2025-09-13 09:00:11,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 8 hours, 53 minutes, 47 seconds)
2025-09-13 09:10:08,076 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:10:08,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:11:30,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 780.48792 ± 477.135
2025-09-13 09:11:30,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [495.6446, 1822.0879, 737.76355, 597.5263, 747.62555, 256.2598, 1400.7748, 204.37047, 568.32916, 974.497]
2025-09-13 09:11:30,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [212.0, 587.0, 268.0, 220.0, 279.0, 111.0, 495.0, 94.0, 214.0, 338.0]
2025-09-13 09:11:30,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 8 hours, 42 minutes, 16 seconds)
2025-09-13 09:22:43,861 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:22:43,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:24:10,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 833.40479 ± 735.804
2025-09-13 09:24:10,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2074.6462, 426.37796, 1073.3333, 112.0384, 295.2262, 137.1965, 516.72974, 2033.022, 1471.5173, 193.96]
2025-09-13 09:24:10,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [646.0, 189.0, 362.0, 76.0, 141.0, 68.0, 216.0, 654.0, 477.0, 113.0]
2025-09-13 09:24:10,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 8 hours, 44 minutes, 44 seconds)
2025-09-13 09:33:38,111 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:33:38,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:34:24,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 417.88290 ± 302.078
2025-09-13 09:34:24,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [118.475296, 394.8076, 351.1616, 730.9666, 177.37474, 827.7228, 954.88654, 91.28409, 103.41365, 428.73657]
2025-09-13 09:34:24,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [83.0, 156.0, 139.0, 251.0, 83.0, 282.0, 319.0, 51.0, 56.0, 186.0]
2025-09-13 09:34:24,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 8 hours, 26 minutes, 9 seconds)
2025-09-13 09:44:32,066 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:44:32,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:45:52,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 789.76630 ± 669.236
2025-09-13 09:45:52,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [332.5267, 1166.3749, 127.58914, 2054.68, 118.7722, 1358.3336, 142.34093, 111.19391, 1129.901, 1355.9506]
2025-09-13 09:45:52,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [168.0, 349.0, 65.0, 681.0, 62.0, 418.0, 70.0, 79.0, 373.0, 448.0]
2025-09-13 09:45:52,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 8 hours, 12 minutes, 4 seconds)
2025-09-13 09:56:41,937 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:56:41,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:58:09,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 918.01624 ± 604.112
2025-09-13 09:58:09,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [95.956726, 924.7629, 1416.9224, 73.337296, 1610.1423, 1908.4283, 213.81535, 807.5429, 1053.5532, 1075.7002]
2025-09-13 09:58:09,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [53.0, 308.0, 452.0, 59.0, 488.0, 594.0, 96.0, 276.0, 379.0, 315.0]
2025-09-13 09:58:09,618 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 8 hours, 6 minutes, 58 seconds)
2025-09-13 10:08:16,415 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:08:16,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:10:13,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1193.50330 ± 1160.394
2025-09-13 10:10:13,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2956.8833, 697.48096, 1813.8827, 154.341, 336.0636, 3133.6194, 2256.9631, 167.2292, 318.9687, 99.60176]
2025-09-13 10:10:13,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [900.0, 243.0, 594.0, 92.0, 135.0, 1000.0, 697.0, 81.0, 152.0, 67.0]
2025-09-13 10:10:13,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (1193.50) for latency ExtremeSparseL4U32
2025-09-13 10:10:13,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 8 hours, 1 minute, 29 seconds)
2025-09-13 10:20:32,089 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:20:32,098 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:22:02,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 921.59747 ± 1001.003
2025-09-13 10:22:02,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [553.1592, 322.3455, 2176.8572, 228.97958, 483.7731, 170.09831, 510.38272, 27.90011, 3249.494, 1492.9855]
2025-09-13 10:22:02,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [195.0, 148.0, 643.0, 98.0, 182.0, 84.0, 180.0, 31.0, 1000.0, 483.0]
2025-09-13 10:22:02,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 7 hours, 42 minutes, 57 seconds)
2025-09-13 10:31:57,917 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:31:57,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:34:35,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1780.69897 ± 1061.964
2025-09-13 10:34:35,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [3383.8413, 746.61694, 1398.9034, 3000.904, 3326.2114, 1259.4519, 1708.8972, 1879.0391, 159.8166, 943.30865]
2025-09-13 10:34:35,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 255.0, 428.0, 873.0, 1000.0, 397.0, 525.0, 580.0, 77.0, 305.0]
2025-09-13 10:34:35,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (1780.70) for latency ExtremeSparseL4U32
2025-09-13 10:34:35,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 7 hours, 49 minutes, 32 seconds)
2025-09-13 10:45:01,973 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:45:01,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:46:27,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 884.45782 ± 747.837
2025-09-13 10:46:27,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [541.23315, 967.17883, 396.51834, 1064.1375, 81.514145, 1864.0779, 1322.3525, 78.02735, 155.29681, 2374.2407]
2025-09-13 10:46:27,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [193.0, 322.0, 155.0, 380.0, 76.0, 603.0, 395.0, 47.0, 76.0, 736.0]
2025-09-13 10:46:27,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 7 hours, 40 minutes, 28 seconds)
2025-09-13 10:56:45,846 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:56:45,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:58:10,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 884.79736 ± 890.584
2025-09-13 10:58:10,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [122.935745, 104.413605, 86.55417, 2655.9648, 156.47025, 706.92096, 2068.4094, 726.1328, 447.5084, 1772.6633]
2025-09-13 10:58:10,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [63.0, 56.0, 65.0, 799.0, 75.0, 242.0, 630.0, 248.0, 179.0, 542.0]
2025-09-13 10:58:10,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 7 hours, 24 minutes, 7 seconds)
2025-09-13 11:08:19,178 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:08:19,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:09:07,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 452.65497 ± 572.976
2025-09-13 11:09:07,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [102.97664, 93.72911, 766.9128, 124.932755, 265.10397, 391.50308, 2051.1702, 501.64658, 104.24256, 124.33217]
2025-09-13 11:09:07,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [56.0, 52.0, 261.0, 64.0, 116.0, 154.0, 642.0, 206.0, 56.0, 64.0]
2025-09-13 11:09:07,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 7 hours, 4 minutes, 8 seconds)
2025-09-13 11:19:17,122 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:19:17,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:21:16,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1229.17847 ± 1169.940
2025-09-13 11:21:16,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1738.5938, 3253.7239, 271.79385, 2881.9778, 182.74599, 1431.4692, 2162.31, 150.43831, 145.88678, 72.84608]
2025-09-13 11:21:16,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [542.0, 1000.0, 113.0, 911.0, 86.0, 470.0, 670.0, 88.0, 92.0, 61.0]
2025-09-13 11:21:16,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 6 hours, 54 minutes, 40 seconds)
2025-09-13 11:31:51,164 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:31:51,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:32:34,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 345.84637 ± 314.763
2025-09-13 11:32:34,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [112.65189, 467.8959, 29.303621, 73.49486, 286.2242, 1161.8922, 333.45312, 430.6321, 457.5073, 105.40835]
2025-09-13 11:32:34,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [75.0, 191.0, 32.0, 56.0, 142.0, 410.0, 171.0, 164.0, 180.0, 56.0]
2025-09-13 11:32:34,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 6 hours, 34 minutes, 13 seconds)
2025-09-13 11:42:44,965 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:42:44,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:44:47,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1275.81042 ± 1296.552
2025-09-13 11:44:47,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [3016.0557, 381.70197, 3198.0425, 260.82, 316.2939, 60.95621, 554.8668, 1668.3772, 84.25317, 3216.7375]
2025-09-13 11:44:47,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [928.0, 182.0, 1000.0, 111.0, 155.0, 62.0, 203.0, 531.0, 69.0, 1000.0]
2025-09-13 11:44:47,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 6 hours, 25 minutes, 1 second)
2025-09-13 11:54:55,251 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:54:55,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:57:09,786 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1417.08447 ± 1168.403
2025-09-13 11:57:09,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [3266.0422, 886.2176, 833.11285, 1331.0891, 106.38786, 2879.9763, 729.20374, 820.9513, 109.85717, 3208.0059]
2025-09-13 11:57:09,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 297.0, 284.0, 454.0, 79.0, 940.0, 270.0, 290.0, 59.0, 1000.0]
2025-09-13 11:57:09,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 6 hours, 17 minutes, 30 seconds)
2025-09-13 12:07:42,417 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:07:42,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:08:57,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 752.96454 ± 611.329
2025-09-13 12:08:57,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [469.24768, 307.43976, 1111.7173, 1888.3217, 148.31046, 1570.0173, 25.64981, 124.00439, 1016.3567, 868.58026]
2025-09-13 12:08:57,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [190.0, 149.0, 357.0, 592.0, 85.0, 477.0, 30.0, 81.0, 328.0, 289.0]
2025-09-13 12:08:57,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 6 hours, 10 minutes, 55 seconds)
2025-09-13 12:18:47,839 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:18:47,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:21:06,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1516.11108 ± 1128.883
2025-09-13 12:21:06,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [187.26659, 2900.0747, 1508.2958, 87.59474, 1835.6298, 102.75586, 825.27527, 2642.7742, 1795.3323, 3276.1113]
2025-09-13 12:21:06,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [85.0, 895.0, 518.0, 73.0, 570.0, 57.0, 287.0, 767.0, 532.0, 1000.0]
2025-09-13 12:21:06,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 5 hours, 58 minutes, 58 seconds)
2025-09-13 12:31:36,785 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:31:36,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:33:32,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1236.28284 ± 820.730
2025-09-13 12:33:32,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [532.59174, 1014.1436, 1592.0542, 962.662, 1937.027, 1067.4431, 3334.7798, 652.87854, 669.4613, 599.78754]
2025-09-13 12:33:32,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [210.0, 344.0, 513.0, 339.0, 598.0, 367.0, 1000.0, 234.0, 249.0, 218.0]
2025-09-13 12:33:32,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 5 hours, 53 minutes, 38 seconds)
2025-09-13 12:43:21,065 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:43:21,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:44:37,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 754.56091 ± 747.936
2025-09-13 12:44:37,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1498.592, 543.2855, 242.4459, 860.29126, 270.90942, 584.98254, 103.26105, 237.77252, 2681.8635, 522.2058]
2025-09-13 12:44:37,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [499.0, 200.0, 105.0, 297.0, 114.0, 209.0, 56.0, 113.0, 840.0, 211.0]
2025-09-13 12:44:37,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 5 hours, 35 minutes, 2 seconds)
2025-09-13 12:55:00,135 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:55:00,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:55:58,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 544.46210 ± 317.814
2025-09-13 12:55:58,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [900.2106, 262.4491, 987.11725, 506.67706, 160.37506, 557.31134, 209.5356, 737.0658, 955.02203, 168.857]
2025-09-13 12:55:58,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [304.0, 111.0, 322.0, 207.0, 78.0, 217.0, 113.0, 253.0, 316.0, 106.0]
2025-09-13 12:55:58,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 5 hours, 17 minutes, 35 seconds)
2025-09-13 13:05:58,959 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:05:58,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:07:52,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1251.14746 ± 997.391
2025-09-13 13:07:52,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [564.11206, 2189.6306, 1382.0538, 81.74595, 105.4552, 164.0582, 1934.9362, 3263.8835, 1682.216, 1143.3843]
2025-09-13 13:07:52,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [216.0, 672.0, 440.0, 49.0, 58.0, 80.0, 606.0, 1000.0, 484.0, 346.0]
2025-09-13 13:07:52,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 5 hours, 6 minutes, 24 seconds)
2025-09-13 13:18:04,141 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:18:04,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:19:48,268 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1063.77319 ± 895.440
2025-09-13 13:19:48,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [702.4634, 197.98015, 110.032394, 367.63623, 3236.6426, 950.2553, 1983.0952, 705.5522, 1231.586, 1152.4886]
2025-09-13 13:19:48,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [268.0, 110.0, 59.0, 146.0, 1000.0, 319.0, 636.0, 241.0, 402.0, 395.0]
2025-09-13 13:19:48,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 4 hours, 53 minutes, 29 seconds)
2025-09-13 13:30:02,135 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:30:02,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:32:22,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1546.77209 ± 894.092
2025-09-13 13:32:22,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [266.99252, 2064.4292, 274.9039, 1001.6252, 1288.9021, 2805.903, 1885.7266, 1226.7236, 1588.3751, 3064.1401]
2025-09-13 13:32:22,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [114.0, 645.0, 119.0, 331.0, 406.0, 861.0, 606.0, 395.0, 494.0, 944.0]
2025-09-13 13:32:22,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 4 hours, 42 minutes, 23 seconds)
2025-09-13 13:42:19,522 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:42:19,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:43:37,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 803.40637 ± 613.917
2025-09-13 13:43:37,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1713.8798, 1122.3123, 190.4036, 254.14902, 673.40924, 151.4089, 82.26049, 1862.7216, 1044.9951, 938.5235]
2025-09-13 13:43:37,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [524.0, 362.0, 89.0, 110.0, 230.0, 75.0, 57.0, 600.0, 339.0, 304.0]
2025-09-13 13:43:37,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 4 hours, 31 minutes, 23 seconds)
2025-09-13 13:54:13,482 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:54:13,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:56:14,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1275.38721 ± 818.880
2025-09-13 13:56:14,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [449.03864, 1020.7949, 920.70355, 736.42523, 2105.555, 2924.5388, 1116.1869, 1087.4885, 2211.397, 181.74348]
2025-09-13 13:56:14,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [168.0, 354.0, 304.0, 251.0, 641.0, 914.0, 375.0, 376.0, 682.0, 86.0]
2025-09-13 13:56:14,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 4 hours, 25 minutes, 10 seconds)
2025-09-13 14:06:29,874 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:06:29,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:08:06,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1017.22595 ± 875.164
2025-09-13 14:08:06,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [229.5142, 3307.113, 1523.4456, 1206.3085, 711.30054, 705.30133, 315.39548, 449.01495, 388.72107, 1336.1462]
2025-09-13 14:08:06,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [116.0, 1000.0, 475.0, 411.0, 245.0, 247.0, 129.0, 164.0, 164.0, 451.0]
2025-09-13 14:08:06,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 4 hours, 12 minutes, 58 seconds)
2025-09-13 14:17:52,532 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:17:52,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:19:18,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 943.67688 ± 999.822
2025-09-13 14:19:18,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [81.43298, 1174.857, 3383.547, 1221.6775, 24.525763, 1665.8582, 1289.2695, 287.01456, 209.0717, 99.51497]
2025-09-13 14:19:18,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [47.0, 352.0, 1000.0, 365.0, 27.0, 472.0, 404.0, 121.0, 97.0, 54.0]
2025-09-13 14:19:18,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 3 hours, 58 minutes, 1 second)
2025-09-13 14:29:22,211 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:29:22,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:30:48,480 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 893.33679 ± 615.032
2025-09-13 14:30:48,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [544.08405, 1225.348, 1037.8474, 24.300215, 601.6688, 126.56399, 1320.5957, 1879.6318, 1746.6799, 426.6481]
2025-09-13 14:30:48,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [191.0, 385.0, 355.0, 28.0, 208.0, 90.0, 432.0, 597.0, 535.0, 160.0]
2025-09-13 14:30:48,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 3 hours, 42 minutes, 2 seconds)
2025-09-13 14:41:28,766 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:41:28,792 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:43:02,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 992.17657 ± 856.322
2025-09-13 14:43:02,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1873.0789, 1488.976, 1265.9082, 2849.6665, 607.1858, 400.87823, 1036.8337, 350.44662, 22.241346, 26.550732]
2025-09-13 14:43:02,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [574.0, 458.0, 417.0, 868.0, 212.0, 156.0, 354.0, 153.0, 25.0, 32.0]
2025-09-13 14:43:02,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 3 hours, 33 minutes, 54 seconds)
2025-09-13 14:53:04,679 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:53:04,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:54:50,713 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1156.04675 ± 937.806
2025-09-13 14:54:50,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2647.7217, 105.93175, 978.21985, 1125.5536, 306.91995, 1009.1412, 2737.6626, 2000.4795, 107.587395, 541.2509]
2025-09-13 14:54:50,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [794.0, 74.0, 322.0, 376.0, 136.0, 328.0, 823.0, 603.0, 59.0, 208.0]
2025-09-13 14:54:50,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 3 hours, 19 minutes, 15 seconds)
2025-09-13 15:04:45,567 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:04:45,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:06:24,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1034.26599 ± 952.251
2025-09-13 15:06:24,470 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [128.01595, 1025.9712, 860.73047, 1204.3899, 1904.2736, 581.6317, 100.17125, 982.4484, 158.01483, 3397.0137]
2025-09-13 15:06:24,470 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [89.0, 342.0, 288.0, 415.0, 586.0, 217.0, 68.0, 316.0, 77.0, 1000.0]
2025-09-13 15:06:24,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 6 minutes, 33 seconds)
2025-09-13 15:16:37,061 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:16:37,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:18:48,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1427.37439 ± 1148.021
2025-09-13 15:18:48,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [84.51006, 3017.545, 709.4144, 335.67084, 2958.1934, 3305.3237, 992.1886, 438.0263, 1135.736, 1297.1361]
2025-09-13 15:18:48,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [64.0, 907.0, 256.0, 136.0, 876.0, 1000.0, 324.0, 170.0, 376.0, 418.0]
2025-09-13 15:18:48,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 2 hours, 58 minutes, 28 seconds)
2025-09-13 15:29:08,045 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:29:08,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:30:53,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1135.89990 ± 612.939
2025-09-13 15:30:53,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [991.3931, 1518.8179, 1818.8254, 1211.115, 181.08868, 329.26892, 368.05594, 1333.8119, 1670.1832, 1936.4384]
2025-09-13 15:30:53,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [316.0, 476.0, 588.0, 399.0, 85.0, 134.0, 148.0, 428.0, 514.0, 603.0]
2025-09-13 15:30:53,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 2 hours, 48 minutes, 13 seconds)
2025-09-13 15:41:20,837 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:41:20,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:43:56,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1755.44922 ± 1119.425
2025-09-13 15:43:56,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1019.29266, 3273.8564, 2250.4211, 3202.4387, 90.88032, 2600.436, 1522.0115, 2552.0513, 270.56818, 772.5365]
2025-09-13 15:43:56,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [323.0, 1000.0, 656.0, 940.0, 53.0, 744.0, 455.0, 728.0, 111.0, 256.0]
2025-09-13 15:43:56,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 2 hours, 38 minutes, 19 seconds)
2025-09-13 15:54:28,938 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:54:28,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:56:48,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1537.42712 ± 1089.154
2025-09-13 15:56:48,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [550.28906, 2693.5576, 1861.5277, 1440.8761, 300.9299, 1761.4998, 552.10205, 3392.8306, 104.13422, 2716.525]
2025-09-13 15:56:48,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [197.0, 805.0, 557.0, 456.0, 126.0, 548.0, 216.0, 1000.0, 58.0, 806.0]
2025-09-13 15:56:48,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 28 minutes, 42 seconds)
2025-09-13 16:07:12,789 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:07:12,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:09:09,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1286.61902 ± 913.414
2025-09-13 16:09:09,900 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2154.2607, 264.03992, 1404.4456, 1680.2704, 494.63077, 1006.53394, 1070.96, 260.21762, 3417.404, 1113.4265]
2025-09-13 16:09:09,900 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [648.0, 112.0, 445.0, 520.0, 181.0, 301.0, 362.0, 110.0, 1000.0, 349.0]
2025-09-13 16:09:09,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 18 minutes, 3 seconds)
2025-09-13 16:19:16,139 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:19:16,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:20:37,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 816.56812 ± 762.712
2025-09-13 16:20:37,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1280.8828, 328.60886, 430.6707, 1520.5948, 183.91908, 2442.8047, 76.860115, 223.49171, 216.73657, 1461.1117]
2025-09-13 16:20:37,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [433.0, 155.0, 159.0, 483.0, 85.0, 737.0, 66.0, 116.0, 115.0, 460.0]
2025-09-13 16:20:37,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 3 minutes, 38 seconds)
2025-09-13 16:30:42,584 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:30:42,593 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:32:12,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 897.75818 ± 470.294
2025-09-13 16:32:12,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [727.3084, 287.56546, 681.4642, 450.04755, 404.8525, 680.26056, 1298.1263, 1687.738, 1272.3992, 1487.8198]
2025-09-13 16:32:12,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [247.0, 119.0, 257.0, 182.0, 169.0, 243.0, 411.0, 535.0, 402.0, 497.0]
2025-09-13 16:32:13,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 1 hour, 50 minutes, 23 seconds)
2025-09-13 16:42:20,031 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:42:20,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:44:41,354 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1514.30835 ± 1183.921
2025-09-13 16:44:41,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [176.6545, 1394.2893, 139.545, 3272.4211, 455.23917, 3287.7778, 236.57204, 1748.4542, 1833.4133, 2598.7173]
2025-09-13 16:44:41,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [95.0, 443.0, 71.0, 1000.0, 170.0, 1000.0, 104.0, 571.0, 575.0, 806.0]
2025-09-13 16:44:41,367 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 37 minutes, 12 seconds)
2025-09-13 16:54:47,586 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:54:47,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:57:31,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1827.08081 ± 1097.588
2025-09-13 16:57:31,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1769.5226, 3363.053, 342.99, 731.15234, 556.3776, 1822.5007, 1832.9722, 3357.3762, 1254.7195, 3240.1438]
2025-09-13 16:57:31,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [547.0, 1000.0, 180.0, 256.0, 200.0, 568.0, 580.0, 1000.0, 399.0, 1000.0]
2025-09-13 16:57:31,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (1827.08) for latency ExtremeSparseL4U32
2025-09-13 16:57:31,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 25 minutes)
2025-09-13 17:07:36,191 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:07:36,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:09:37,398 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1302.31396 ± 859.028
2025-09-13 17:09:37,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2828.23, 869.34344, 258.57565, 1574.506, 286.97717, 1598.0374, 1432.688, 2128.6763, 1951.9557, 94.14956]
2025-09-13 17:09:37,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [861.0, 311.0, 127.0, 478.0, 117.0, 485.0, 473.0, 628.0, 590.0, 65.0]
2025-09-13 17:09:37,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 12 minutes, 32 seconds)
2025-09-13 17:19:56,862 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:19:56,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:22:09,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1417.28101 ± 1317.619
2025-09-13 17:22:09,516 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2318.8643, 3338.0928, 750.4137, 317.40933, 144.15396, 288.2238, 765.1959, 2814.4812, 22.326872, 3413.6484]
2025-09-13 17:22:09,516 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [706.0, 1000.0, 259.0, 131.0, 94.0, 142.0, 275.0, 862.0, 25.0, 1000.0]
2025-09-13 17:22:09,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 1 minute, 32 seconds)
2025-09-13 17:32:56,193 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:32:56,201 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:34:55,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1331.26685 ± 1130.096
2025-09-13 17:34:55,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2610.7656, 1234.9534, 72.71977, 3395.2551, 1711.6809, 26.821398, 2365.7793, 1397.3347, 337.9093, 159.4494]
2025-09-13 17:34:55,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [771.0, 392.0, 67.0, 1000.0, 524.0, 31.0, 703.0, 462.0, 134.0, 76.0]
2025-09-13 17:34:55,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 50 minutes, 10 seconds)
2025-09-13 17:44:37,202 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:44:37,209 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:47:13,661 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1709.12830 ± 1246.917
2025-09-13 17:47:13,663 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2898.5112, 298.25604, 3180.6428, 3331.58, 342.57574, 3378.042, 972.7789, 1112.2013, 1009.8918, 566.80255]
2025-09-13 17:47:13,663 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [883.0, 119.0, 1000.0, 1000.0, 138.0, 1000.0, 323.0, 361.0, 324.0, 186.0]
2025-09-13 17:47:13,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 37 minutes, 31 seconds)
2025-09-13 17:57:56,985 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:57:56,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:00:13,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1535.65906 ± 1134.960
2025-09-13 18:00:13,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [636.60925, 27.794603, 2702.2852, 2996.133, 842.99023, 1061.5424, 2110.1843, 902.9386, 3515.4854, 560.62787]
2025-09-13 18:00:13,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [222.0, 32.0, 792.0, 878.0, 289.0, 343.0, 632.0, 311.0, 1000.0, 201.0]
2025-09-13 18:00:13,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 25 minutes, 4 seconds)
2025-09-13 18:09:55,208 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:09:55,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:11:14,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 794.56329 ± 569.076
2025-09-13 18:11:14,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [320.80164, 1123.3075, 594.92426, 206.64143, 1579.0734, 1018.0098, 1942.1428, 469.80853, 466.435, 224.4884]
2025-09-13 18:11:14,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [133.0, 353.0, 205.0, 104.0, 485.0, 334.0, 608.0, 186.0, 174.0, 115.0]
2025-09-13 18:11:14,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 12 minutes, 19 seconds)
2025-09-13 18:21:43,942 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:21:43,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:24:04,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1576.49500 ± 960.015
2025-09-13 18:24:04,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [364.04727, 2347.0173, 3447.4822, 1228.6711, 178.38304, 1663.4055, 1251.3933, 913.4743, 1810.341, 2560.7354]
2025-09-13 18:24:04,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [147.0, 704.0, 1000.0, 383.0, 102.0, 515.0, 410.0, 303.0, 571.0, 787.0]
2025-09-13 18:24:04,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1251 [DEBUG]: Training session finished
