2025-09-13 02:47:54,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc20-hopper/ExtremeSparseL4U32-mbpac_memdelay
2025-09-13 02:47:54,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc20-hopper/ExtremeSparseL4U32-mbpac_memdelay
2025-09-13 02:47:54,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x1459813b4c50>}
2025-09-13 02:47:54,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1111 [DEBUG]: using device: cuda
2025-09-13 02:47:54,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1133 [INFO]: Creating new trainer
2025-09-13 02:47:54,564 baseline-mbpac-noiseperc20-hopper:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-09-13 02:47:54,564 baseline-mbpac-noiseperc20-hopper:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-13 02:47:54,571 baseline-mbpac-noiseperc20-hopper:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=11, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=3, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-09-13 02:47:55,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1194 [DEBUG]: Starting training session...
2025-09-13 02:47:55,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 1/100
2025-09-13 02:58:31,587 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:58:31,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:58:50,046 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 115.55509 ± 60.962
2025-09-13 02:58:50,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [186.47781, 58.132244, 36.93851, 84.6219, 145.75064, 177.80057, 149.6186, 182.15904, 124.495056, 9.556577]
2025-09-13 02:58:50,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [93.0, 36.0, 26.0, 49.0, 75.0, 88.0, 77.0, 90.0, 67.0, 14.0]
2025-09-13 02:58:50,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1226 [INFO]: New best (115.56) for latency ExtremeSparseL4U32
2025-09-13 02:58:50,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 17 hours, 59 minutes, 23 seconds)
2025-09-13 03:09:02,143 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:09:02,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:09:25,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 128.22919 ± 70.670
2025-09-13 03:09:25,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [87.54025, 96.705, 48.00248, 160.22327, 36.678993, 87.837265, 151.78311, 290.2395, 141.83142, 181.45052]
2025-09-13 03:09:25,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [75.0, 55.0, 32.0, 107.0, 27.0, 62.0, 123.0, 146.0, 84.0, 100.0]
2025-09-13 03:09:25,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1226 [INFO]: New best (128.23) for latency ExtremeSparseL4U32
2025-09-13 03:09:25,926 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 17 hours, 33 minutes, 32 seconds)
2025-09-13 03:19:42,956 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:19:42,964 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:20:09,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 154.20250 ± 75.339
2025-09-13 03:20:09,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [19.27883, 115.42384, 182.9608, 107.291176, 176.07613, 109.444664, 236.09683, 300.5985, 188.118, 106.736374]
2025-09-13 03:20:09,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 80.0, 102.0, 60.0, 105.0, 69.0, 126.0, 138.0, 126.0, 62.0]
2025-09-13 03:20:09,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1226 [INFO]: New best (154.20) for latency ExtremeSparseL4U32
2025-09-13 03:20:09,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 17 hours, 21 minutes, 57 seconds)
2025-09-13 03:30:27,740 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:30:27,746 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:30:58,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 199.02232 ± 74.039
2025-09-13 03:30:58,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [268.6363, 116.92678, 107.112114, 332.147, 122.572, 215.37355, 278.82132, 191.40973, 135.65459, 221.5699]
2025-09-13 03:30:58,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [118.0, 68.0, 62.0, 179.0, 68.0, 111.0, 131.0, 100.0, 83.0, 111.0]
2025-09-13 03:30:58,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1226 [INFO]: New best (199.02) for latency ExtremeSparseL4U32
2025-09-13 03:30:58,354 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 17 hours, 12 minutes, 59 seconds)
2025-09-13 03:41:08,832 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:41:08,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:41:28,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 108.19362 ± 82.029
2025-09-13 03:41:28,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [97.3682, 55.600212, 147.80104, 203.68597, 122.70628, 56.103687, 293.54153, 24.969994, 50.87485, 29.284468]
2025-09-13 03:41:28,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [82.0, 37.0, 120.0, 105.0, 69.0, 41.0, 143.0, 21.0, 32.0, 25.0]
2025-09-13 03:41:28,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 16 hours, 57 minutes, 23 seconds)
2025-09-13 03:51:49,478 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:51:49,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:52:29,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 247.49524 ± 117.845
2025-09-13 03:52:29,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [328.19952, 119.80109, 76.126274, 342.4458, 204.97598, 139.00917, 375.96252, 443.3679, 285.6975, 159.36655]
2025-09-13 03:52:29,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [156.0, 67.0, 43.0, 166.0, 110.0, 78.0, 241.0, 274.0, 162.0, 90.0]
2025-09-13 03:52:29,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1226 [INFO]: New best (247.50) for latency ExtremeSparseL4U32
2025-09-13 03:52:29,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 16 hours, 48 minutes, 45 seconds)
2025-09-13 04:02:35,070 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:02:35,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:03:12,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 234.03552 ± 100.806
2025-09-13 04:03:12,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [477.64114, 99.250435, 283.64706, 227.38432, 174.21272, 313.7828, 191.05128, 218.7677, 137.63412, 216.98367]
2025-09-13 04:03:12,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [297.0, 57.0, 130.0, 133.0, 101.0, 161.0, 98.0, 120.0, 78.0, 110.0]
2025-09-13 04:03:12,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 16 hours, 40 minutes, 10 seconds)
2025-09-13 04:13:18,300 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:13:18,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:13:45,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 155.94327 ± 114.172
2025-09-13 04:13:45,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [121.63301, 148.33095, 165.8061, 98.244896, 67.87247, 75.340126, 121.70086, 339.81754, 398.1407, 22.546165]
2025-09-13 04:13:45,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [108.0, 80.0, 97.0, 59.0, 40.0, 43.0, 73.0, 235.0, 205.0, 23.0]
2025-09-13 04:13:45,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 16 hours, 26 minutes, 22 seconds)
2025-09-13 04:23:54,932 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:23:55,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:24:23,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 164.59915 ± 90.853
2025-09-13 04:24:23,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [201.38336, 205.30576, 166.515, 83.38606, 308.605, 226.9061, 161.08754, 249.88597, 16.53799, 26.37874]
2025-09-13 04:24:23,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [126.0, 112.0, 141.0, 51.0, 159.0, 124.0, 97.0, 125.0, 18.0, 29.0]
2025-09-13 04:24:23,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 16 hours, 12 minutes, 19 seconds)
2025-09-13 04:34:30,742 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:34:30,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:34:51,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 113.70995 ± 69.355
2025-09-13 04:34:51,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [132.61156, 14.489316, 8.09764, 33.689133, 96.080154, 211.93895, 170.46323, 125.682846, 182.43138, 161.61531]
2025-09-13 04:34:51,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [77.0, 19.0, 11.0, 27.0, 59.0, 135.0, 103.0, 79.0, 101.0, 108.0]
2025-09-13 04:34:51,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 16 hours, 45 seconds)
2025-09-13 04:45:02,503 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:45:02,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:45:19,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 88.83208 ± 108.238
2025-09-13 04:45:19,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [14.030641, 28.305655, 13.882929, 14.797221, 363.1002, 177.60762, 134.3625, 113.10484, 14.789843, 14.3393545]
2025-09-13 04:45:19,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [15.0, 29.0, 21.0, 18.0, 184.0, 124.0, 90.0, 66.0, 19.0, 15.0]
2025-09-13 04:45:19,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 15 hours, 40 minutes, 24 seconds)
2025-09-13 04:55:31,126 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:55:31,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:56:04,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 227.28992 ± 127.377
2025-09-13 04:56:04,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [345.79398, 76.88685, 431.8847, 400.47308, 169.1014, 13.052132, 250.32967, 172.4211, 211.03941, 201.91693]
2025-09-13 04:56:04,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [168.0, 44.0, 185.0, 175.0, 94.0, 29.0, 168.0, 88.0, 113.0, 99.0]
2025-09-13 04:56:04,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 15 hours, 30 minutes, 30 seconds)
2025-09-13 05:06:11,233 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:06:11,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:06:45,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 256.35358 ± 150.786
2025-09-13 05:06:45,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [275.7978, 478.6314, 283.7208, 14.16074, 219.82426, 543.8706, 200.53395, 84.339874, 220.9883, 241.66814]
2025-09-13 05:06:45,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [126.0, 201.0, 130.0, 16.0, 124.0, 212.0, 105.0, 49.0, 110.0, 108.0]
2025-09-13 05:06:45,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1226 [INFO]: New best (256.35) for latency ExtremeSparseL4U32
2025-09-13 05:06:45,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 15 hours, 21 minutes, 58 seconds)
2025-09-13 05:16:54,123 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:16:54,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:17:29,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 287.92712 ± 83.682
2025-09-13 05:17:29,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [393.71832, 326.76096, 327.97397, 258.74496, 117.356125, 236.79927, 326.09805, 198.23685, 288.77853, 404.80426]
2025-09-13 05:17:29,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [149.0, 145.0, 131.0, 115.0, 65.0, 108.0, 132.0, 89.0, 132.0, 150.0]
2025-09-13 05:17:29,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1226 [INFO]: New best (287.93) for latency ExtremeSparseL4U32
2025-09-13 05:17:29,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 15 hours, 13 minutes, 10 seconds)
2025-09-13 05:27:42,670 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:27:42,678 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:28:28,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 300.28818 ± 257.710
2025-09-13 05:28:28,805 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [922.67346, 129.50652, 144.6596, 507.9094, 186.49591, 115.45865, 332.9914, 116.51795, 493.42926, 53.239693]
2025-09-13 05:28:28,805 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [454.0, 82.0, 96.0, 237.0, 102.0, 79.0, 187.0, 74.0, 247.0, 31.0]
2025-09-13 05:28:28,805 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1226 [INFO]: New best (300.29) for latency ExtremeSparseL4U32
2025-09-13 05:28:28,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 15 hours, 11 minutes, 38 seconds)
2025-09-13 05:38:36,951 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:38:36,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:39:06,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 217.08699 ± 117.052
2025-09-13 05:39:06,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [265.75662, 250.79608, 338.58325, 239.62694, 136.13287, 111.66717, 342.63474, 12.200189, 381.74966, 91.72206]
2025-09-13 05:39:06,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [113.0, 116.0, 145.0, 107.0, 72.0, 63.0, 164.0, 20.0, 182.0, 54.0]
2025-09-13 05:39:06,637 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 15 hours, 3 minutes, 37 seconds)
2025-09-13 05:49:23,512 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:49:23,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:49:55,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 223.24751 ± 147.794
2025-09-13 05:49:55,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [494.94058, 218.247, 22.907593, 99.99608, 309.03934, 113.62051, 110.09825, 117.81438, 348.72034, 397.09097]
2025-09-13 05:49:55,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [229.0, 100.0, 55.0, 64.0, 126.0, 59.0, 63.0, 63.0, 156.0, 170.0]
2025-09-13 05:49:55,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 14 hours, 53 minutes, 52 seconds)
2025-09-13 05:59:58,929 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:59:58,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:00:27,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 225.15129 ± 87.448
2025-09-13 06:00:27,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [222.62465, 122.22808, 330.73105, 134.03462, 237.37938, 202.44662, 101.62771, 254.56941, 250.44453, 395.42697]
2025-09-13 06:00:27,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [101.0, 63.0, 142.0, 70.0, 106.0, 92.0, 55.0, 108.0, 109.0, 148.0]
2025-09-13 06:00:27,436 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 14 hours, 40 minutes, 46 seconds)
2025-09-13 06:10:35,189 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:10:35,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:11:01,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 168.02583 ± 219.679
2025-09-13 06:11:01,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [147.15735, 21.821527, 10.88799, 29.185389, 720.7926, 195.57431, 412.02, 118.12658, 10.77259, 13.920057]
2025-09-13 06:11:01,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [81.0, 21.0, 18.0, 28.0, 323.0, 102.0, 184.0, 104.0, 19.0, 21.0]
2025-09-13 06:11:01,375 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 14 hours, 27 minutes, 15 seconds)
2025-09-13 06:21:19,703 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:21:19,715 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:21:44,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 170.31979 ± 172.148
2025-09-13 06:21:44,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [12.728664, 215.25833, 509.9833, 46.45835, 78.968796, 121.48667, 117.087395, 10.828515, 110.007484, 480.3905]
2025-09-13 06:21:44,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 98.0, 180.0, 82.0, 46.0, 64.0, 64.0, 21.0, 67.0, 197.0]
2025-09-13 06:21:44,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 14 hours, 12 minutes, 13 seconds)
2025-09-13 06:31:55,703 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:31:55,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:32:25,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 206.29573 ± 138.487
2025-09-13 06:32:25,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [264.67172, 163.44672, 20.832275, 321.64102, 137.66582, 81.45196, 132.55598, 362.09796, 483.01837, 95.57542]
2025-09-13 06:32:25,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [123.0, 82.0, 22.0, 150.0, 72.0, 49.0, 78.0, 156.0, 227.0, 56.0]
2025-09-13 06:32:25,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 14 hours, 2 minutes, 25 seconds)
2025-09-13 06:42:59,130 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:42:59,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:43:40,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 296.53470 ± 209.487
2025-09-13 06:43:40,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [622.8454, 344.2276, 196.61906, 202.83675, 111.524704, 247.29727, 83.30312, 81.538025, 720.3111, 354.84406]
2025-09-13 06:43:40,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [259.0, 171.0, 93.0, 97.0, 59.0, 114.0, 49.0, 66.0, 359.0, 176.0]
2025-09-13 06:43:40,611 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 13 hours, 58 minutes, 33 seconds)
2025-09-13 06:53:26,287 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:53:26,293 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:54:09,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 283.92120 ± 177.280
2025-09-13 06:54:09,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [356.37283, 240.17874, 354.03506, 75.21916, 579.5409, 69.422264, 584.56726, 240.40799, 111.05909, 228.40857]
2025-09-13 06:54:09,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [211.0, 118.0, 164.0, 45.0, 297.0, 41.0, 278.0, 115.0, 90.0, 112.0]
2025-09-13 06:54:09,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 13 hours, 46 minutes, 54 seconds)
2025-09-13 07:04:23,942 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:04:23,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:05:09,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 359.73257 ± 298.580
2025-09-13 07:05:09,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [300.59796, 64.21209, 332.4387, 165.94234, 430.67944, 456.11206, 150.72691, 394.81976, 1167.4154, 134.38098]
2025-09-13 07:05:09,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [130.0, 37.0, 146.0, 79.0, 204.0, 205.0, 73.0, 147.0, 504.0, 69.0]
2025-09-13 07:05:09,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1226 [INFO]: New best (359.73) for latency ExtremeSparseL4U32
2025-09-13 07:05:09,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 13 hours, 42 minutes, 53 seconds)
2025-09-13 07:15:37,017 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:15:37,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:16:02,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 172.72243 ± 142.213
2025-09-13 07:16:02,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [116.71158, 14.682314, 153.11023, 398.99017, 336.10645, 402.2407, 80.08271, 98.05908, 12.557683, 114.683395]
2025-09-13 07:16:02,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [61.0, 19.0, 75.0, 186.0, 168.0, 182.0, 46.0, 56.0, 15.0, 90.0]
2025-09-13 07:16:02,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 13 hours, 34 minutes, 32 seconds)
2025-09-13 07:25:52,914 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:25:52,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:26:33,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 298.85986 ± 178.354
2025-09-13 07:26:33,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [572.19476, 433.66824, 379.24728, 139.53175, 10.478376, 169.12758, 170.7909, 553.8931, 204.97943, 354.68707]
2025-09-13 07:26:33,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [273.0, 177.0, 172.0, 70.0, 15.0, 81.0, 81.0, 250.0, 101.0, 169.0]
2025-09-13 07:26:33,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 13 hours, 21 minutes, 9 seconds)
2025-09-13 07:36:52,315 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:36:52,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:37:51,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 473.19574 ± 696.447
2025-09-13 07:37:51,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [486.5062, 1335.276, 24.987411, 11.840342, 2192.927, 487.80255, 146.54233, 21.21192, 13.837504, 11.026441]
2025-09-13 07:37:51,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [209.0, 544.0, 25.0, 21.0, 857.0, 221.0, 80.0, 22.0, 28.0, 14.0]
2025-09-13 07:37:51,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1226 [INFO]: New best (473.20) for latency ExtremeSparseL4U32
2025-09-13 07:37:51,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 13 hours, 10 minutes, 58 seconds)
2025-09-13 07:47:56,317 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:47:56,323 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:48:22,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 171.50371 ± 152.273
2025-09-13 07:48:22,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [143.24028, 143.28836, 14.870754, 37.93467, 12.818579, 249.25867, 191.70024, 222.9938, 561.9198, 137.01196]
2025-09-13 07:48:22,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [75.0, 70.0, 16.0, 33.0, 18.0, 167.0, 96.0, 142.0, 225.0, 68.0]
2025-09-13 07:48:22,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 13 hours, 47 seconds)
2025-09-13 07:58:38,469 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:58:38,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:59:11,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 244.38432 ± 143.327
2025-09-13 07:59:11,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [333.03342, 265.28345, 277.62314, 158.94992, 436.13824, 489.09106, 186.9883, 227.24292, 51.96084, 17.531912]
2025-09-13 07:59:11,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [151.0, 117.0, 132.0, 81.0, 190.0, 213.0, 88.0, 108.0, 31.0, 19.0]
2025-09-13 07:59:11,670 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 12 hours, 47 minutes, 16 seconds)
2025-09-13 08:09:23,337 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:09:23,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:10:31,204 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 578.39148 ± 628.853
2025-09-13 08:10:31,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [373.82895, 193.70735, 191.58966, 1245.8417, 401.67017, 13.031928, 706.8365, 170.5051, 307.69232, 2179.2112]
2025-09-13 08:10:31,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [160.0, 90.0, 92.0, 488.0, 177.0, 17.0, 259.0, 85.0, 143.0, 823.0]
2025-09-13 08:10:31,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1226 [INFO]: New best (578.39) for latency ExtremeSparseL4U32
2025-09-13 08:10:31,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 12 hours, 42 minutes, 37 seconds)
2025-09-13 08:20:52,229 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:20:52,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:21:46,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 463.14307 ± 456.235
2025-09-13 08:21:46,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1062.8143, 71.73783, 378.05774, 552.3515, 785.18964, 1379.6053, 350.78128, 16.969635, 21.114233, 12.809072]
2025-09-13 08:21:46,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [405.0, 41.0, 160.0, 226.0, 296.0, 563.0, 152.0, 19.0, 21.0, 15.0]
2025-09-13 08:21:46,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 12 hours, 42 minutes, 3 seconds)
2025-09-13 08:31:57,530 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:31:57,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:32:15,413 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 122.79884 ± 187.587
2025-09-13 08:32:15,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [203.84776, 9.134938, 50.447826, 8.705742, 9.0114155, 13.3990345, 13.567368, 606.6351, 299.65027, 13.589033]
2025-09-13 08:32:15,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [94.0, 12.0, 30.0, 12.0, 19.0, 29.0, 19.0, 236.0, 143.0, 20.0]
2025-09-13 08:32:15,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 12 hours, 19 minutes, 53 seconds)
2025-09-13 08:42:27,937 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:42:27,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:43:31,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 518.39081 ± 449.719
2025-09-13 08:43:31,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [93.527115, 645.86285, 101.124146, 254.90395, 567.93085, 1438.6378, 1190.3279, 202.34036, 590.68384, 98.569214]
2025-09-13 08:43:31,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [52.0, 263.0, 57.0, 118.0, 253.0, 592.0, 483.0, 95.0, 221.0, 54.0]
2025-09-13 08:43:31,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 12 hours, 19 minutes, 5 seconds)
2025-09-13 08:53:42,212 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:53:42,219 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:54:39,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 468.45517 ± 387.541
2025-09-13 08:54:39,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [902.2996, 427.8582, 917.1257, 18.806, 162.89848, 122.44382, 140.78886, 1020.53674, 109.95705, 861.8375]
2025-09-13 08:54:39,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [360.0, 169.0, 363.0, 21.0, 104.0, 64.0, 72.0, 419.0, 61.0, 342.0]
2025-09-13 08:54:39,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 12 hours, 12 minutes, 11 seconds)
2025-09-13 09:04:41,602 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:04:41,608 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:05:17,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 288.51166 ± 337.146
2025-09-13 09:05:17,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [185.0988, 18.374165, 17.505978, 7.277963, 874.9457, 212.5649, 7.695786, 810.3771, 679.29346, 71.98277]
2025-09-13 09:05:17,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [95.0, 22.0, 19.0, 13.0, 334.0, 109.0, 12.0, 312.0, 286.0, 45.0]
2025-09-13 09:05:17,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 11 hours, 52 minutes, 7 seconds)
2025-09-13 09:15:34,447 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:15:34,462 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:16:12,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 323.87375 ± 330.698
2025-09-13 09:16:12,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [730.0793, 742.6554, 13.979764, 18.558899, 10.676621, 652.378, 16.687174, 211.79445, 80.236404, 761.6917]
2025-09-13 09:16:12,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [276.0, 281.0, 19.0, 21.0, 12.0, 255.0, 19.0, 119.0, 47.0, 282.0]
2025-09-13 09:16:12,905 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 11 hours, 36 minutes, 43 seconds)
2025-09-13 09:26:25,300 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:26:25,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:26:51,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 181.66862 ± 177.130
2025-09-13 09:26:51,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [542.4953, 17.688084, 15.215688, 457.65253, 279.21347, 74.55301, 73.048546, 192.81053, 79.410675, 84.59834]
2025-09-13 09:26:51,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [234.0, 31.0, 26.0, 176.0, 132.0, 54.0, 46.0, 94.0, 49.0, 49.0]
2025-09-13 09:26:51,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 11 hours, 27 minutes, 57 seconds)
2025-09-13 09:37:06,138 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:37:06,145 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:38:01,375 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 465.60831 ± 324.756
2025-09-13 09:38:01,375 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [128.46727, 179.53835, 899.79364, 407.27084, 804.88336, 230.15956, 796.50696, 226.39526, 896.8972, 86.17045]
2025-09-13 09:38:01,375 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [68.0, 82.0, 351.0, 171.0, 310.0, 114.0, 311.0, 110.0, 329.0, 52.0]
2025-09-13 09:38:01,408 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 11 hours, 15 minutes, 43 seconds)
2025-09-13 09:48:15,014 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:48:15,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:49:04,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 407.54523 ± 253.351
2025-09-13 09:49:04,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [293.82022, 90.12238, 592.4144, 217.85173, 986.417, 193.91226, 528.61304, 593.0822, 275.6932, 303.52588]
2025-09-13 09:49:04,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [120.0, 60.0, 228.0, 102.0, 374.0, 91.0, 205.0, 258.0, 126.0, 136.0]
2025-09-13 09:49:04,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 11 hours, 3 minutes, 52 seconds)
2025-09-13 09:59:41,122 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:59:41,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:00:31,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 399.85416 ± 503.040
2025-09-13 10:00:31,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1192.3994, 339.83826, 189.98196, 334.0004, 251.70506, 23.940487, 15.941814, 69.92997, 40.127335, 1540.6766]
2025-09-13 10:00:31,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [443.0, 141.0, 91.0, 136.0, 147.0, 28.0, 15.0, 61.0, 42.0, 605.0]
2025-09-13 10:00:31,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 11 hours, 2 minutes, 40 seconds)
2025-09-13 10:11:03,513 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:11:03,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:11:39,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 294.87592 ± 344.403
2025-09-13 10:11:39,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [491.95612, 1244.4999, 286.62305, 237.0747, 209.42195, 112.398254, 14.218928, 245.61583, 10.150969, 96.79985]
2025-09-13 10:11:39,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [179.0, 414.0, 130.0, 112.0, 95.0, 80.0, 30.0, 132.0, 13.0, 64.0]
2025-09-13 10:11:39,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 10 hours, 54 minutes, 15 seconds)
2025-09-13 10:21:47,223 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:21:47,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:22:14,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 198.93709 ± 376.784
2025-09-13 10:22:14,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [178.96632, 102.23032, 141.65543, 22.846327, 13.035107, 1312.3295, 19.321356, 12.815382, 15.281652, 170.88947]
2025-09-13 10:22:14,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [91.0, 58.0, 77.0, 22.0, 17.0, 473.0, 25.0, 19.0, 23.0, 131.0]
2025-09-13 10:22:14,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 10 hours, 42 minutes, 24 seconds)
2025-09-13 10:32:06,922 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:32:06,928 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:32:48,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 357.64288 ± 328.542
2025-09-13 10:32:48,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [293.39517, 324.31686, 921.72974, 464.27283, 84.44289, 562.77606, 10.928324, 20.661562, 881.7835, 12.121908]
2025-09-13 10:32:48,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [138.0, 137.0, 340.0, 178.0, 50.0, 228.0, 15.0, 20.0, 317.0, 20.0]
2025-09-13 10:32:48,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 10 hours, 24 minutes, 37 seconds)
2025-09-13 10:42:47,095 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:42:47,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:43:40,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 453.09854 ± 381.180
2025-09-13 10:43:40,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1097.3666, 140.4281, 174.07443, 557.9301, 1216.3176, 382.18753, 15.095216, 355.1291, 377.1652, 215.29167]
2025-09-13 10:43:40,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [379.0, 70.0, 85.0, 208.0, 453.0, 176.0, 23.0, 150.0, 159.0, 100.0]
2025-09-13 10:43:40,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 10 hours, 11 minutes, 23 seconds)
2025-09-13 10:53:49,984 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:53:50,000 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:54:44,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 480.39240 ± 469.294
2025-09-13 10:54:44,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1791.1704, 399.48465, 16.143993, 314.99634, 386.24286, 417.50076, 240.02771, 590.4641, 96.85715, 551.0361]
2025-09-13 10:54:44,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [627.0, 168.0, 21.0, 138.0, 154.0, 171.0, 109.0, 228.0, 53.0, 210.0]
2025-09-13 10:54:44,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 9 hours, 56 minutes, 26 seconds)
2025-09-13 11:04:59,884 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:04:59,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:05:27,647 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 202.66997 ± 101.791
2025-09-13 11:05:27,647 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [184.77272, 392.94055, 181.90361, 168.4321, 196.27985, 112.45706, 172.02682, 19.552828, 253.2671, 345.06717]
2025-09-13 11:05:27,647 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [89.0, 167.0, 86.0, 85.0, 92.0, 60.0, 84.0, 21.0, 117.0, 146.0]
2025-09-13 11:05:27,666 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 9 hours, 41 minutes, 2 seconds)
2025-09-13 11:15:44,203 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:15:44,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:16:50,113 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 611.00726 ± 596.143
2025-09-13 11:16:50,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [579.27, 208.92523, 2206.0015, 747.94867, 184.99518, 571.48346, 217.27188, 996.44617, 231.00075, 166.72935]
2025-09-13 11:16:50,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [233.0, 97.0, 759.0, 251.0, 91.0, 208.0, 108.0, 351.0, 126.0, 79.0]
2025-09-13 11:16:50,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1226 [INFO]: New best (611.01) for latency ExtremeSparseL4U32
2025-09-13 11:16:50,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 9 hours, 38 minutes, 45 seconds)
2025-09-13 11:27:02,616 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:27:02,624 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:27:51,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 384.67532 ± 537.780
2025-09-13 11:27:51,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [171.47699, 410.30508, 10.858494, 139.6999, 1845.0917, 11.467064, 115.79527, 326.46698, 22.205, 793.3865]
2025-09-13 11:27:51,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [85.0, 167.0, 16.0, 174.0, 682.0, 17.0, 75.0, 134.0, 22.0, 303.0]
2025-09-13 11:27:51,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 9 hours, 32 minutes, 26 seconds)
2025-09-13 11:38:01,673 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:38:01,680 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:38:47,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 416.36957 ± 565.829
2025-09-13 11:38:47,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [94.00894, 8.400966, 12.376833, 1052.7164, 95.48932, 122.72643, 393.31552, 21.410833, 1832.0773, 531.17346]
2025-09-13 11:38:47,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [53.0, 14.0, 14.0, 356.0, 53.0, 66.0, 159.0, 23.0, 623.0, 209.0]
2025-09-13 11:38:47,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 9 hours, 22 minutes, 15 seconds)
2025-09-13 11:49:01,456 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:49:01,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:49:55,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 442.19522 ± 382.076
2025-09-13 11:49:55,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [65.38649, 674.35486, 1204.5913, 469.04578, 786.6457, 82.97369, 187.15862, 774.2054, 156.71098, 20.879232]
2025-09-13 11:49:55,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [39.0, 280.0, 453.0, 218.0, 320.0, 48.0, 101.0, 327.0, 77.0, 26.0]
2025-09-13 11:49:55,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 9 hours, 11 minutes, 50 seconds)
2025-09-13 12:00:15,889 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:00:15,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:01:29,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 690.60052 ± 741.801
2025-09-13 12:01:29,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [76.65753, 2374.8003, 976.54224, 1553.2505, 155.7359, 271.29727, 7.8874063, 272.073, 1025.2914, 192.4695]
2025-09-13 12:01:29,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [47.0, 814.0, 370.0, 520.0, 77.0, 121.0, 12.0, 120.0, 373.0, 91.0]
2025-09-13 12:01:29,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1226 [INFO]: New best (690.60) for latency ExtremeSparseL4U32
2025-09-13 12:01:29,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 9 hours, 9 minutes, 4 seconds)
2025-09-13 12:11:31,489 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:11:31,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:12:08,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 277.33591 ± 457.826
2025-09-13 12:12:08,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [838.18506, 35.867073, 13.699317, 233.22441, 25.413174, 8.302357, 145.95393, 17.551525, 11.443501, 1443.7189]
2025-09-13 12:12:08,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [308.0, 37.0, 20.0, 116.0, 28.0, 15.0, 131.0, 25.0, 14.0, 541.0]
2025-09-13 12:12:08,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 8 hours, 50 minutes, 52 seconds)
2025-09-13 12:22:23,368 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:22:23,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:22:54,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 262.67328 ± 206.312
2025-09-13 12:22:54,940 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [167.18286, 119.42836, 124.75564, 63.10806, 14.702958, 513.82227, 607.6335, 122.212105, 501.52267, 392.3645]
2025-09-13 12:22:54,940 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [80.0, 62.0, 63.0, 40.0, 16.0, 192.0, 213.0, 63.0, 193.0, 161.0]
2025-09-13 12:22:54,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 8 hours, 37 minutes, 33 seconds)
2025-09-13 12:33:09,720 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:33:09,735 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:33:48,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 326.60376 ± 469.627
2025-09-13 12:33:48,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [382.01437, 13.445507, 138.54858, 1663.7139, 123.9646, 316.02863, 446.84885, 153.72435, 15.45305, 12.295587]
2025-09-13 12:33:48,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [176.0, 15.0, 72.0, 566.0, 64.0, 145.0, 168.0, 89.0, 20.0, 20.0]
2025-09-13 12:33:48,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 8 hours, 26 minutes, 6 seconds)
2025-09-13 12:44:06,471 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:44:06,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:44:53,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 390.92801 ± 327.914
2025-09-13 12:44:53,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [800.88654, 22.620346, 574.88043, 434.25162, 106.26193, 300.58612, 762.1849, 23.674688, 871.34436, 12.589177]
2025-09-13 12:44:53,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [302.0, 22.0, 226.0, 184.0, 81.0, 143.0, 270.0, 41.0, 306.0, 20.0]
2025-09-13 12:44:53,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 8 hours, 14 minutes, 37 seconds)
2025-09-13 12:55:09,193 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:55:09,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:55:42,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 260.42572 ± 157.986
2025-09-13 12:55:42,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [521.28796, 90.67388, 458.6214, 10.103379, 257.24582, 377.4544, 360.36868, 204.65926, 209.7679, 114.0746]
2025-09-13 12:55:42,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [192.0, 51.0, 181.0, 14.0, 140.0, 162.0, 166.0, 97.0, 94.0, 60.0]
2025-09-13 12:55:42,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 7 hours, 57 minutes, 7 seconds)
2025-09-13 13:05:46,015 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:05:46,023 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:06:10,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 190.66740 ± 161.812
2025-09-13 13:06:10,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [149.59583, 499.15295, 333.6148, 120.984276, 143.4027, 144.4784, 427.45367, 57.182877, 16.958546, 13.850141]
2025-09-13 13:06:10,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [75.0, 184.0, 144.0, 66.0, 71.0, 71.0, 162.0, 38.0, 19.0, 18.0]
2025-09-13 13:06:10,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 7 hours, 44 minutes, 48 seconds)
2025-09-13 13:16:25,652 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:16:25,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:17:21,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 515.52081 ± 363.741
2025-09-13 13:17:21,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [671.21344, 958.0857, 91.36538, 252.4369, 1192.75, 175.00278, 19.299631, 535.18054, 686.74677, 573.1269]
2025-09-13 13:17:21,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [224.0, 349.0, 74.0, 111.0, 411.0, 87.0, 20.0, 203.0, 250.0, 200.0]
2025-09-13 13:17:21,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 7 hours, 37 minutes, 21 seconds)
2025-09-13 13:27:40,324 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:27:40,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:28:20,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 339.96844 ± 315.966
2025-09-13 13:28:20,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [143.3325, 238.82056, 512.43536, 20.653542, 19.567286, 130.04805, 446.16052, 764.0006, 125.84318, 998.8227]
2025-09-13 13:28:20,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [76.0, 109.0, 202.0, 25.0, 33.0, 66.0, 179.0, 257.0, 68.0, 352.0]
2025-09-13 13:28:20,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 7 hours, 27 minutes, 13 seconds)
2025-09-13 13:38:31,885 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:38:31,928 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:39:21,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 447.73218 ± 358.138
2025-09-13 13:39:21,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [143.83804, 948.77515, 767.46985, 220.106, 827.37366, 106.88304, 143.12848, 933.8233, 11.311208, 374.61313]
2025-09-13 13:39:21,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [74.0, 345.0, 272.0, 103.0, 287.0, 57.0, 71.0, 315.0, 28.0, 182.0]
2025-09-13 13:39:21,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 7 hours, 15 minutes, 43 seconds)
2025-09-13 13:49:23,595 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:49:23,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:50:24,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 559.83435 ± 500.658
2025-09-13 13:50:24,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [633.9324, 629.07935, 1113.5547, 1018.5799, 435.17596, 22.6514, 18.856558, 1532.0914, 164.89697, 29.524904]
2025-09-13 13:50:24,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [243.0, 231.0, 407.0, 352.0, 181.0, 23.0, 19.0, 544.0, 79.0, 30.0]
2025-09-13 13:50:24,926 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 7 hours, 6 minutes, 43 seconds)
2025-09-13 14:00:51,653 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:00:51,660 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:01:58,580 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 632.32391 ± 555.134
2025-09-13 14:01:58,592 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [595.7509, 733.6154, 17.095806, 10.501667, 818.33984, 1800.4387, 263.5557, 199.7974, 1384.1804, 499.96292]
2025-09-13 14:01:58,592 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [221.0, 256.0, 21.0, 18.0, 284.0, 592.0, 117.0, 95.0, 466.0, 231.0]
2025-09-13 14:01:58,617 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 7 hours, 4 minutes, 2 seconds)
2025-09-13 14:11:48,947 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:11:48,956 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:12:28,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 329.11823 ± 314.463
2025-09-13 14:12:28,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [141.52867, 14.311542, 180.2316, 950.4169, 456.93625, 168.9403, 163.66284, 886.2991, 85.898705, 242.9562]
2025-09-13 14:12:28,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [71.0, 22.0, 85.0, 341.0, 187.0, 81.0, 88.0, 332.0, 50.0, 121.0]
2025-09-13 14:12:28,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 6 hours, 47 minutes, 47 seconds)
2025-09-13 14:22:51,434 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:22:51,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:23:24,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 269.24323 ± 275.894
2025-09-13 14:23:24,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [30.432737, 18.39937, 370.15598, 164.84421, 24.56178, 16.620996, 491.69424, 894.96136, 189.84387, 490.91742]
2025-09-13 14:23:24,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [27.0, 21.0, 156.0, 96.0, 24.0, 20.0, 189.0, 344.0, 94.0, 187.0]
2025-09-13 14:23:24,341 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 6 hours, 36 minutes, 27 seconds)
2025-09-13 14:33:27,559 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:33:27,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:34:10,580 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 365.43857 ± 259.227
2025-09-13 14:34:10,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [187.0805, 152.14008, 424.74408, 548.8524, 566.68964, 20.136904, 384.1115, 12.997576, 874.78723, 482.8458]
2025-09-13 14:34:10,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [91.0, 77.0, 165.0, 222.0, 216.0, 21.0, 180.0, 19.0, 281.0, 201.0]
2025-09-13 14:34:10,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 6 hours, 23 minutes, 46 seconds)
2025-09-13 14:44:09,115 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:44:09,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:45:06,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 531.69720 ± 875.995
2025-09-13 14:45:06,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [231.91693, 11.541715, 17.575167, 1803.6886, 2664.5178, 90.71654, 168.56442, 218.7519, 16.06785, 93.63069]
2025-09-13 14:45:06,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [109.0, 17.0, 21.0, 596.0, 869.0, 50.0, 93.0, 104.0, 31.0, 54.0]
2025-09-13 14:45:06,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 6 hours, 11 minutes, 52 seconds)
2025-09-13 14:55:31,250 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:55:31,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:56:34,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 599.22498 ± 734.241
2025-09-13 14:56:34,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [97.89873, 2132.038, 844.67786, 98.01622, 11.124289, 8.909572, 19.273285, 1679.4235, 922.2864, 178.60216]
2025-09-13 14:56:34,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [53.0, 726.0, 305.0, 80.0, 14.0, 16.0, 23.0, 554.0, 310.0, 84.0]
2025-09-13 14:56:34,639 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 6 hours, 21 seconds)
2025-09-13 15:06:31,807 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:06:31,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:07:14,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 357.79068 ± 383.246
2025-09-13 15:07:14,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [18.12351, 414.3006, 472.87207, 107.222206, 199.8081, 108.93155, 988.1581, 114.6336, 13.563398, 1140.2933]
2025-09-13 15:07:14,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [18.0, 167.0, 189.0, 58.0, 93.0, 58.0, 394.0, 60.0, 14.0, 416.0]
2025-09-13 15:07:14,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 5 hours, 50 minutes, 31 seconds)
2025-09-13 15:17:31,797 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:17:31,820 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:18:26,354 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 482.97934 ± 398.794
2025-09-13 15:18:26,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [977.49677, 490.04596, 9.335214, 118.793945, 354.36047, 1006.55347, 206.50749, 531.7547, 1117.4873, 17.458218]
2025-09-13 15:18:26,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [335.0, 188.0, 14.0, 92.0, 149.0, 362.0, 97.0, 195.0, 412.0, 20.0]
2025-09-13 15:18:26,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 5 hours, 41 minutes, 12 seconds)
2025-09-13 15:28:34,953 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:28:34,960 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:29:57,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 797.35455 ± 1119.263
2025-09-13 15:29:57,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1491.665, 16.488474, 12.9265375, 3095.6409, 143.6729, 411.66513, 2621.5872, 89.496925, 75.436066, 14.966883]
2025-09-13 15:29:57,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [516.0, 17.0, 18.0, 990.0, 84.0, 179.0, 926.0, 50.0, 47.0, 19.0]
2025-09-13 15:29:57,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1226 [INFO]: New best (797.35) for latency ExtremeSparseL4U32
2025-09-13 15:29:57,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 5 hours, 34 minutes, 43 seconds)
2025-09-13 15:40:19,478 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:40:19,488 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:41:29,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 645.42108 ± 841.018
2025-09-13 15:41:29,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [131.41057, 2654.274, 16.573706, 626.8723, 30.11328, 11.650204, 244.74016, 1444.8433, 15.310266, 1278.4231]
2025-09-13 15:41:29,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [67.0, 922.0, 18.0, 265.0, 32.0, 15.0, 126.0, 498.0, 25.0, 443.0]
2025-09-13 15:41:29,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 5 hours, 27 minutes, 2 seconds)
2025-09-13 15:51:23,101 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:51:23,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:52:04,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 352.55145 ± 326.543
2025-09-13 15:52:04,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [621.0776, 148.2837, 130.28152, 385.6626, 180.26672, 1174.2559, 505.00516, 13.228956, 212.92545, 154.52696]
2025-09-13 15:52:04,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [240.0, 73.0, 68.0, 161.0, 87.0, 412.0, 189.0, 29.0, 104.0, 75.0]
2025-09-13 15:52:04,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 5 hours, 10 minutes, 49 seconds)
2025-09-13 16:02:18,317 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:02:18,324 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:04:11,747 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 1128.21558 ± 1015.198
2025-09-13 16:04:11,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1171.2614, 2230.3315, 113.48859, 1963.2699, 343.0777, 3007.339, 415.1144, 66.5026, 108.684006, 1863.0867]
2025-09-13 16:04:11,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [401.0, 766.0, 82.0, 671.0, 145.0, 1000.0, 165.0, 44.0, 74.0, 619.0]
2025-09-13 16:04:11,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1226 [INFO]: New best (1128.22) for latency ExtremeSparseL4U32
2025-09-13 16:04:11,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 5 hours, 7 minutes, 33 seconds)
2025-09-13 16:14:25,267 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:14:25,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:15:24,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 556.22217 ± 490.788
2025-09-13 16:15:24,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1353.7153, 1118.0829, 177.22935, 855.4421, 75.21549, 88.29567, 1198.5562, 90.111336, 390.10123, 215.47224]
2025-09-13 16:15:24,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [414.0, 360.0, 84.0, 321.0, 44.0, 50.0, 430.0, 65.0, 157.0, 107.0]
2025-09-13 16:15:24,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 4 hours, 56 minutes, 13 seconds)
2025-09-13 16:26:07,710 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:26:07,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:27:03,110 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 501.16959 ± 650.450
2025-09-13 16:27:03,110 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1008.90753, 2132.525, 363.95413, 95.99686, 17.182568, 987.5608, 101.914375, 208.43954, 21.324951, 73.89]
2025-09-13 16:27:03,110 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [338.0, 711.0, 161.0, 53.0, 17.0, 365.0, 57.0, 98.0, 29.0, 62.0]
2025-09-13 16:27:03,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 4 hours, 45 minutes, 26 seconds)
2025-09-13 16:36:42,900 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:36:42,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:37:34,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 451.64990 ± 431.524
2025-09-13 16:37:34,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [15.701619, 46.94161, 611.6515, 691.8467, 814.71027, 15.680159, 473.5538, 12.565631, 428.6206, 1405.2269]
2025-09-13 16:37:34,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 46.0, 240.0, 243.0, 300.0, 20.0, 205.0, 21.0, 171.0, 497.0]
2025-09-13 16:37:34,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 4 hours, 29 minutes, 12 seconds)
2025-09-13 16:48:10,900 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:48:10,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:49:10,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 526.77399 ± 576.452
2025-09-13 16:49:10,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [138.5678, 616.97906, 290.76917, 155.98073, 1157.5559, 499.19217, 84.19173, 1966.9552, 352.10413, 5.443885]
2025-09-13 16:49:10,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [71.0, 234.0, 139.0, 77.0, 402.0, 198.0, 47.0, 699.0, 152.0, 8.0]
2025-09-13 16:49:10,323 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 4 hours, 22 minutes, 36 seconds)
2025-09-13 16:58:50,315 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:58:50,324 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:59:35,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 423.54068 ± 495.931
2025-09-13 16:59:35,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1110.4967, 13.974187, 26.272928, 1348.0637, 16.740816, 20.364422, 154.30182, 672.2925, 21.406189, 851.4932]
2025-09-13 16:59:35,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [382.0, 20.0, 23.0, 464.0, 32.0, 22.0, 78.0, 246.0, 19.0, 302.0]
2025-09-13 16:59:35,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 4 hours, 3 minutes, 45 seconds)
2025-09-13 17:09:46,518 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:09:46,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:10:51,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 618.12469 ± 430.553
2025-09-13 17:10:51,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [13.519717, 917.5855, 968.9995, 423.5177, 1008.1439, 585.1426, 574.919, 106.3028, 174.9696, 1408.1462]
2025-09-13 17:10:51,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [20.0, 331.0, 326.0, 169.0, 309.0, 222.0, 257.0, 57.0, 81.0, 502.0]
2025-09-13 17:10:51,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 3 hours, 52 minutes, 54 seconds)
2025-09-13 17:21:17,738 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:21:17,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:22:52,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 920.47986 ± 925.358
2025-09-13 17:22:52,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [64.481125, 2761.5444, 180.56923, 1036.1813, 1209.8694, 2297.2234, 19.16696, 1209.3936, 167.23497, 259.1339]
2025-09-13 17:22:52,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [39.0, 1000.0, 85.0, 354.0, 398.0, 771.0, 20.0, 394.0, 90.0, 120.0]
2025-09-13 17:22:53,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 3 hours, 43 minutes, 19 seconds)
2025-09-13 17:32:51,969 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:32:51,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:33:59,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 622.86084 ± 566.411
2025-09-13 17:33:59,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [18.54726, 200.3536, 1066.5952, 1102.8064, 238.11635, 1728.3229, 183.90297, 445.92874, 54.92232, 1189.1129]
2025-09-13 17:33:59,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 104.0, 381.0, 378.0, 113.0, 583.0, 88.0, 174.0, 38.0, 408.0]
2025-09-13 17:33:59,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 3 hours, 34 minutes, 21 seconds)
2025-09-13 17:44:09,708 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:44:09,715 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:45:32,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 787.52307 ± 600.827
2025-09-13 17:45:32,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1572.0386, 231.72005, 843.1855, 83.03669, 78.224915, 1273.3911, 920.15576, 101.13323, 1038.6835, 1733.6615]
2025-09-13 17:45:32,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [548.0, 107.0, 314.0, 50.0, 61.0, 461.0, 306.0, 58.0, 360.0, 581.0]
2025-09-13 17:45:32,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 3 hours, 22 minutes, 56 seconds)
2025-09-13 17:55:57,310 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:55:57,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:56:50,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 487.21152 ± 430.375
2025-09-13 17:56:50,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [658.9803, 23.944681, 201.06738, 781.0683, 992.64484, 794.9757, 103.85255, 1227.7408, 69.75109, 18.089825]
2025-09-13 17:56:50,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [236.0, 28.0, 96.0, 285.0, 334.0, 287.0, 56.0, 442.0, 52.0, 22.0]
2025-09-13 17:56:50,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 3 hours, 14 minutes, 38 seconds)
2025-09-13 18:06:50,653 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:06:50,670 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:08:02,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 689.38947 ± 650.386
2025-09-13 18:08:02,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [425.58044, 735.71606, 1150.6389, 1014.5314, 642.0631, 632.07074, 2257.143, 18.09935, 6.003573, 12.048344]
2025-09-13 18:08:02,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [174.0, 277.0, 387.0, 359.0, 258.0, 248.0, 740.0, 24.0, 8.0, 19.0]
2025-09-13 18:08:02,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 2 minutes, 58 seconds)
2025-09-13 18:18:16,536 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:18:16,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:19:08,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 481.94669 ± 444.949
2025-09-13 18:19:08,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [563.34796, 1180.5275, 505.86032, 1171.5396, 157.2157, 951.3603, 93.867195, 14.364537, 153.31421, 28.069456]
2025-09-13 18:19:08,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [239.0, 373.0, 188.0, 392.0, 79.0, 329.0, 52.0, 16.0, 81.0, 26.0]
2025-09-13 18:19:08,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 2 hours, 48 minutes, 45 seconds)
2025-09-13 18:29:11,818 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:29:11,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:30:10,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 553.97656 ± 513.251
2025-09-13 18:30:10,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [180.80634, 931.45044, 930.3527, 639.1467, 19.845491, 74.86543, 1560.508, 95.50186, 1036.0677, 71.220825]
2025-09-13 18:30:10,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [88.0, 317.0, 328.0, 234.0, 20.0, 44.0, 523.0, 54.0, 355.0, 43.0]
2025-09-13 18:30:10,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 2 hours, 37 minutes, 19 seconds)
2025-09-13 18:40:32,232 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:40:32,239 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:41:46,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 705.52942 ± 860.131
2025-09-13 18:41:46,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1046.6509, 172.29376, 634.6504, 603.5113, 402.0437, 9.5840435, 246.6629, 14.294225, 832.3487, 3093.2544]
2025-09-13 18:41:46,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [376.0, 91.0, 222.0, 229.0, 164.0, 14.0, 114.0, 18.0, 308.0, 1000.0]
2025-09-13 18:41:46,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 2 hours, 26 minutes, 11 seconds)
2025-09-13 18:51:51,079 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:51:51,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:52:52,336 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 572.65637 ± 660.134
2025-09-13 18:52:52,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [9.992666, 15.169847, 313.2178, 2180.7898, 1352.0103, 260.09827, 231.18297, 252.76302, 279.6342, 831.70483]
2025-09-13 18:52:52,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [12.0, 21.0, 131.0, 728.0, 449.0, 121.0, 113.0, 122.0, 128.0, 300.0]
2025-09-13 18:52:52,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 14 minutes, 28 seconds)
2025-09-13 19:03:08,795 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:03:08,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:04:12,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 583.61438 ± 616.401
2025-09-13 19:04:12,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [690.12714, 1711.2137, 83.45425, 1455.6172, 536.04926, 54.542458, 1171.5435, 110.07992, 9.352167, 14.1650505]
2025-09-13 19:04:12,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [260.0, 615.0, 77.0, 500.0, 204.0, 33.0, 391.0, 59.0, 15.0, 29.0]
2025-09-13 19:04:12,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 3 minutes, 33 seconds)
2025-09-13 19:14:43,032 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:14:43,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:15:18,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 322.43982 ± 410.004
2025-09-13 19:15:18,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [76.36876, 195.95587, 181.02504, 140.3612, 800.24927, 419.8957, 18.32888, 27.438782, 20.987902, 1343.7867]
2025-09-13 19:15:18,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [43.0, 93.0, 86.0, 70.0, 259.0, 167.0, 19.0, 30.0, 23.0, 447.0]
2025-09-13 19:15:18,545 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 1 hour, 52 minutes, 20 seconds)
2025-09-13 19:25:14,678 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:25:14,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:26:16,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 612.49579 ± 437.965
2025-09-13 19:26:16,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [111.21712, 31.305464, 1179.6317, 982.3485, 468.9661, 464.38324, 721.34326, 350.14923, 1443.0684, 372.54498]
2025-09-13 19:26:16,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [59.0, 26.0, 384.0, 323.0, 181.0, 170.0, 242.0, 159.0, 484.0, 145.0]
2025-09-13 19:26:16,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 1 hour, 40 minutes, 59 seconds)
2025-09-13 19:36:55,925 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:36:55,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:37:52,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 552.56702 ± 542.554
2025-09-13 19:37:52,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [18.51811, 403.83893, 71.69978, 1453.9254, 1141.9636, 1236.6388, 120.446754, 920.47876, 137.52069, 20.639723]
2025-09-13 19:37:52,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [27.0, 162.0, 51.0, 458.0, 368.0, 432.0, 65.0, 320.0, 72.0, 23.0]
2025-09-13 19:37:52,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 29 minutes, 46 seconds)
2025-09-13 19:47:26,784 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:47:26,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:48:26,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 555.27039 ± 696.724
2025-09-13 19:48:26,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [197.5095, 21.40594, 21.150919, 261.5131, 1926.8896, 385.45908, 23.343016, 308.0743, 1901.7372, 505.62112]
2025-09-13 19:48:26,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [93.0, 26.0, 25.0, 119.0, 627.0, 160.0, 21.0, 135.0, 635.0, 192.0]
2025-09-13 19:48:26,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 17 minutes, 47 seconds)
2025-09-13 19:58:41,036 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:58:41,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:59:55,794 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 717.95740 ± 796.562
2025-09-13 19:59:55,794 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [377.98306, 27.97052, 113.80611, 2806.7214, 910.74207, 1367.7513, 654.3048, 409.43042, 395.10947, 115.75491]
2025-09-13 19:59:55,794 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [159.0, 29.0, 60.0, 927.0, 299.0, 472.0, 248.0, 154.0, 159.0, 85.0]
2025-09-13 19:59:55,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 6 minutes, 52 seconds)
2025-09-13 20:10:12,343 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:10:12,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:10:31,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 139.95207 ± 288.845
2025-09-13 20:10:31,808 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [52.470455, 991.9334, 10.408896, 20.11675, 28.939217, 62.87707, 14.104226, 12.7856865, 194.13326, 11.751839]
2025-09-13 20:10:31,808 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [36.0, 357.0, 12.0, 20.0, 25.0, 40.0, 31.0, 15.0, 125.0, 18.0]
2025-09-13 20:10:31,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 55 minutes, 13 seconds)
2025-09-13 20:20:41,208 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:20:41,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:21:51,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 652.73718 ± 936.933
2025-09-13 20:21:51,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1824.1078, 10.304896, 97.73234, 680.93726, 2995.8347, 422.52924, 146.96922, 226.63718, 14.336195, 107.98229]
2025-09-13 20:21:51,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [620.0, 14.0, 53.0, 253.0, 1000.0, 158.0, 75.0, 131.0, 17.0, 88.0]
2025-09-13 20:21:51,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 44 minutes, 27 seconds)
2025-09-13 20:32:45,905 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:32:45,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:34:15,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 924.63635 ± 901.022
2025-09-13 20:34:15,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1598.4637, 29.076624, 668.92944, 451.6022, 1064.1193, 969.6379, 1301.3356, 3113.408, 22.199741, 27.589745]
2025-09-13 20:34:15,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [541.0, 28.0, 226.0, 168.0, 329.0, 306.0, 416.0, 1000.0, 21.0, 29.0]
2025-09-13 20:34:15,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 33 minutes, 50 seconds)
2025-09-13 20:44:03,071 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:44:03,091 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:44:49,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 438.43488 ± 537.985
2025-09-13 20:44:49,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1289.2867, 23.775469, 588.25415, 461.36282, 1567.8271, 61.861237, 9.080223, 8.86622, 9.844556, 364.1902]
2025-09-13 20:44:49,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [436.0, 23.0, 231.0, 175.0, 496.0, 39.0, 13.0, 14.0, 15.0, 155.0]
2025-09-13 20:44:49,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 22 minutes, 33 seconds)
2025-09-13 20:54:53,944 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:54:53,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:56:14,945 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 776.42468 ± 781.927
2025-09-13 20:56:14,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1726.9835, 198.78334, 60.108055, 173.22539, 706.09674, 2240.4463, 66.19981, 858.53876, 1680.5836, 53.28194]
2025-09-13 20:56:14,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [622.0, 91.0, 40.0, 84.0, 283.0, 721.0, 39.0, 336.0, 569.0, 36.0]
2025-09-13 20:56:14,960 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 11 minutes, 15 seconds)
2025-09-13 21:06:40,113 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:06:40,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:07:46,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 655.27979 ± 931.156
2025-09-13 21:07:46,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1829.9807, 12.137159, 17.393183, 558.459, 110.079094, 643.0385, 324.44272, 77.83014, 18.69686, 2960.7402]
2025-09-13 21:07:46,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [579.0, 13.0, 23.0, 225.0, 61.0, 216.0, 137.0, 44.0, 27.0, 976.0]
2025-09-13 21:07:46,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1251 [DEBUG]: Training session finished
