2025-09-11 18:51:21,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc5-hopper/ExtremeClogL1U23-mbpac_memdelay
2025-09-11 18:51:21,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc5-hopper/ExtremeClogL1U23-mbpac_memdelay
2025-09-11 18:51:21,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x14a294bf0210>}
2025-09-11 18:51:21,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1111 [DEBUG]: using device: cuda
2025-09-11 18:51:21,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1133 [INFO]: Creating new trainer
2025-09-11 18:51:21,552 baseline-mbpac-noiseperc5-hopper:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-09-11 18:51:21,552 baseline-mbpac-noiseperc5-hopper:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-11 18:51:21,559 baseline-mbpac-noiseperc5-hopper:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=11, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=3, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-09-11 18:51:22,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1194 [DEBUG]: Starting training session...
2025-09-11 18:51:22,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 1/100
2025-09-11 19:01:12,422 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:01:12,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:01:29,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 115.16858 ± 13.717
2025-09-11 19:01:29,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [132.6585, 121.11875, 114.553314, 117.14857, 115.994545, 118.49201, 116.62787, 119.52608, 76.58633, 118.97988]
2025-09-11 19:01:29,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [72.0, 68.0, 65.0, 67.0, 66.0, 67.0, 70.0, 68.0, 46.0, 67.0]
2025-09-11 19:01:29,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (115.17) for latency ExtremeClogL1U23
2025-09-11 19:01:29,905 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 16 hours, 42 minutes, 15 seconds)
2025-09-11 19:12:32,937 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:12:32,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:12:50,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 114.34888 ± 46.427
2025-09-11 19:12:50,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [138.52379, 59.003452, 75.206345, 207.32385, 141.37794, 48.870857, 130.24141, 74.27056, 139.5751, 129.09544]
2025-09-11 19:12:50,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [71.0, 37.0, 46.0, 106.0, 73.0, 30.0, 84.0, 53.0, 89.0, 69.0]
2025-09-11 19:12:50,336 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 17 hours, 31 minutes, 45 seconds)
2025-09-11 19:23:56,159 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:23:56,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:24:21,873 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 183.14185 ± 113.251
2025-09-11 19:24:21,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [339.2231, 319.87772, 292.11792, 76.61758, 252.78891, 72.776695, 264.44925, 60.565273, 93.29734, 59.70465]
2025-09-11 19:24:21,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [161.0, 161.0, 138.0, 58.0, 126.0, 59.0, 120.0, 39.0, 50.0, 48.0]
2025-09-11 19:24:21,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (183.14) for latency ExtremeClogL1U23
2025-09-11 19:24:21,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 17 hours, 46 minutes, 41 seconds)
2025-09-11 19:35:31,787 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:35:31,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:36:00,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 221.26762 ± 106.718
2025-09-11 19:36:00,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [72.66316, 321.23325, 334.8521, 279.4263, 305.75137, 80.56715, 107.06513, 312.3255, 109.11448, 289.67767]
2025-09-11 19:36:00,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [44.0, 156.0, 136.0, 122.0, 133.0, 51.0, 82.0, 145.0, 64.0, 141.0]
2025-09-11 19:36:00,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (221.27) for latency ExtremeClogL1U23
2025-09-11 19:36:00,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 17 hours, 51 minutes, 17 seconds)
2025-09-11 19:47:08,484 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:47:08,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:48:00,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 379.10129 ± 52.043
2025-09-11 19:48:00,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [441.21817, 378.7767, 324.83145, 496.08072, 381.90887, 347.8869, 386.7577, 379.36346, 335.64313, 318.54587]
2025-09-11 19:48:00,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [241.0, 174.0, 155.0, 275.0, 170.0, 169.0, 185.0, 221.0, 169.0, 162.0]
2025-09-11 19:48:00,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (379.10) for latency ExtremeClogL1U23
2025-09-11 19:48:00,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 17 hours, 55 minutes, 55 seconds)
2025-09-11 19:59:20,951 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:59:20,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:59:50,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 243.85147 ± 167.490
2025-09-11 19:59:50,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [17.581514, 167.29483, 239.70804, 370.25488, 22.465515, 407.70615, 400.91656, 477.0069, 24.589512, 310.99063]
2025-09-11 19:59:50,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [18.0, 82.0, 109.0, 150.0, 21.0, 147.0, 166.0, 181.0, 22.0, 189.0]
2025-09-11 19:59:50,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 18 hours, 16 minutes, 47 seconds)
2025-09-11 20:10:48,113 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:10:48,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:11:34,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 331.01309 ± 130.935
2025-09-11 20:11:34,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [544.55756, 349.2436, 340.23782, 185.5476, 541.5283, 210.38098, 333.51474, 283.668, 131.76326, 389.68893]
2025-09-11 20:11:34,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [334.0, 169.0, 170.0, 96.0, 280.0, 121.0, 137.0, 151.0, 74.0, 210.0]
2025-09-11 20:11:34,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 18 hours, 12 minutes, 24 seconds)
2025-09-11 20:22:32,056 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:22:32,059 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:23:20,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 489.45856 ± 267.158
2025-09-11 20:23:20,254 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [126.65297, 558.3368, 359.58307, 657.6093, 252.25737, 674.8061, 297.05484, 202.51501, 962.49445, 803.27545]
2025-09-11 20:23:20,254 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [67.0, 215.0, 157.0, 241.0, 120.0, 234.0, 134.0, 102.0, 332.0, 254.0]
2025-09-11 20:23:20,254 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (489.46) for latency ExtremeClogL1U23
2025-09-11 20:23:20,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 18 hours, 5 minutes, 5 seconds)
2025-09-11 20:34:22,976 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:34:22,978 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:35:10,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 542.67603 ± 239.611
2025-09-11 20:35:10,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [606.59045, 615.62256, 580.31793, 615.81494, 661.0879, 552.5686, 161.69414, 31.957275, 751.2563, 849.85077]
2025-09-11 20:35:10,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [200.0, 198.0, 184.0, 204.0, 212.0, 175.0, 76.0, 39.0, 252.0, 277.0]
2025-09-11 20:35:10,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (542.68) for latency ExtremeClogL1U23
2025-09-11 20:35:10,703 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 17 hours, 56 minutes, 49 seconds)
2025-09-11 20:46:07,623 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:46:07,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:46:48,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 399.41968 ± 282.472
2025-09-11 20:46:48,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [882.2191, 205.7988, 233.99571, 337.5645, 814.0053, 361.10577, 92.50453, 162.35669, 163.54608, 741.10034]
2025-09-11 20:46:48,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [305.0, 108.0, 120.0, 151.0, 243.0, 143.0, 57.0, 89.0, 95.0, 244.0]
2025-09-11 20:46:48,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 17 hours, 38 minutes, 32 seconds)
2025-09-11 20:57:50,853 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:57:50,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:58:58,608 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 745.09698 ± 233.167
2025-09-11 20:58:58,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [990.7728, 472.33493, 739.46515, 694.306, 853.3983, 884.6117, 230.14836, 981.1348, 648.03625, 956.7616]
2025-09-11 20:58:58,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [312.0, 191.0, 255.0, 244.0, 276.0, 290.0, 111.0, 323.0, 235.0, 337.0]
2025-09-11 20:58:58,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (745.10) for latency ExtremeClogL1U23
2025-09-11 20:58:58,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 17 hours, 32 minutes, 40 seconds)
2025-09-11 21:09:43,342 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:09:43,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:10:29,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 493.54785 ± 376.707
2025-09-11 21:10:29,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [38.65513, 414.30606, 1060.7512, 177.2568, 43.21465, 69.36254, 678.7061, 796.8387, 668.2913, 988.09595]
2025-09-11 21:10:29,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [41.0, 161.0, 343.0, 84.0, 39.0, 92.0, 219.0, 261.0, 212.0, 315.0]
2025-09-11 21:10:29,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 17 hours, 17 minutes, 2 seconds)
2025-09-11 21:21:32,054 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:21:32,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:22:44,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 827.96417 ± 422.881
2025-09-11 21:22:44,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1113.7682, 1041.4368, 844.2597, 141.82315, 1221.4318, 473.0961, 922.53577, 66.30994, 1249.0062, 1205.9739]
2025-09-11 21:22:44,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [377.0, 331.0, 290.0, 73.0, 395.0, 190.0, 322.0, 47.0, 391.0, 392.0]
2025-09-11 21:22:44,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (827.96) for latency ExtremeClogL1U23
2025-09-11 21:22:44,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 17 hours, 13 minutes, 43 seconds)
2025-09-11 21:33:36,631 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:33:36,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:34:42,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 728.86023 ± 467.867
2025-09-11 21:34:42,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [750.92566, 749.0379, 918.816, 235.75256, 1001.39343, 166.98845, 82.393105, 1758.6088, 696.9152, 927.7713]
2025-09-11 21:34:42,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [246.0, 260.0, 315.0, 105.0, 315.0, 80.0, 60.0, 569.0, 238.0, 313.0]
2025-09-11 21:34:42,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 17 hours, 4 minutes, 1 second)
2025-09-11 21:45:48,233 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:45:48,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:46:53,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 737.82416 ± 326.866
2025-09-11 21:46:53,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [95.41538, 860.29785, 1060.1138, 188.2577, 865.3422, 862.16254, 1099.4878, 710.7928, 646.91864, 989.453]
2025-09-11 21:46:53,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [58.0, 291.0, 328.0, 87.0, 287.0, 289.0, 353.0, 240.0, 226.0, 330.0]
2025-09-11 21:46:53,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 17 hours, 1 minute, 27 seconds)
2025-09-11 21:57:47,230 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:57:47,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:58:46,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 683.86017 ± 338.723
2025-09-11 21:58:46,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [110.99669, 268.59753, 955.0618, 820.7833, 1187.8943, 910.63574, 705.9488, 934.0673, 252.82881, 691.7877]
2025-09-11 21:58:46,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [61.0, 118.0, 319.0, 270.0, 365.0, 284.0, 245.0, 306.0, 129.0, 215.0]
2025-09-11 21:58:46,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 16 hours, 44 minutes, 42 seconds)
2025-09-11 22:09:51,764 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:09:51,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:10:53,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 683.01007 ± 288.305
2025-09-11 22:10:53,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [662.196, 530.0887, 933.6441, 1023.717, 195.9236, 292.48578, 746.5925, 1115.7821, 817.6794, 511.9915]
2025-09-11 22:10:53,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [232.0, 224.0, 307.0, 313.0, 90.0, 135.0, 243.0, 351.0, 251.0, 188.0]
2025-09-11 22:10:53,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 16 hours, 42 minutes, 40 seconds)
2025-09-11 22:21:53,030 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:21:53,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:23:24,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1060.61499 ± 513.125
2025-09-11 22:23:24,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1901.1882, 880.1674, 1908.861, 863.1548, 1161.6765, 675.1429, 96.00241, 951.598, 1212.7596, 955.59924]
2025-09-11 22:23:24,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [643.0, 275.0, 614.0, 270.0, 383.0, 217.0, 65.0, 318.0, 399.0, 299.0]
2025-09-11 22:23:24,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (1060.61) for latency ExtremeClogL1U23
2025-09-11 22:23:24,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 16 hours, 34 minutes, 49 seconds)
2025-09-11 22:34:35,965 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:34:35,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:36:06,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1023.91711 ± 825.047
2025-09-11 22:36:06,254 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [202.95767, 3017.2507, 1214.0903, 1737.1687, 705.7094, 753.18494, 1393.6011, 117.8493, 424.39136, 672.9664]
2025-09-11 22:36:06,254 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [87.0, 1000.0, 418.0, 536.0, 215.0, 263.0, 448.0, 61.0, 170.0, 230.0]
2025-09-11 22:36:06,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 16 hours, 34 minutes, 30 seconds)
2025-09-11 22:46:43,287 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:46:43,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:47:44,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 639.32892 ± 455.159
2025-09-11 22:47:44,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1051.7429, 213.8508, 298.16766, 1215.2239, 1100.1656, 182.68303, 222.04306, 1199.123, 70.24834, 840.0405]
2025-09-11 22:47:44,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [351.0, 94.0, 137.0, 392.0, 388.0, 97.0, 103.0, 438.0, 60.0, 307.0]
2025-09-11 22:47:44,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 16 hours, 13 minutes, 32 seconds)
2025-09-11 22:58:47,089 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:58:47,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:59:55,593 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 742.36316 ± 392.093
2025-09-11 22:59:55,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [706.73157, 1088.6442, 694.07007, 940.54926, 1079.1707, 240.78227, 865.7731, 19.261286, 433.2034, 1355.4452]
2025-09-11 22:59:55,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [239.0, 328.0, 249.0, 315.0, 350.0, 103.0, 315.0, 22.0, 187.0, 481.0]
2025-09-11 22:59:55,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 16 hours, 6 minutes, 5 seconds)
2025-09-11 23:11:02,312 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:11:02,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:13:16,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1543.31641 ± 1038.678
2025-09-11 23:13:16,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [44.86401, 493.09656, 1384.0317, 2359.0713, 1369.9907, 103.75365, 1633.2882, 3063.7473, 1974.7799, 3006.5405]
2025-09-11 23:13:16,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [42.0, 185.0, 456.0, 740.0, 469.0, 60.0, 540.0, 1000.0, 660.0, 1000.0]
2025-09-11 23:13:16,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (1543.32) for latency ExtremeClogL1U23
2025-09-11 23:13:16,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 16 hours, 13 minutes, 11 seconds)
2025-09-11 23:24:22,806 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:24:22,808 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:25:45,732 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 934.70099 ± 755.880
2025-09-11 23:25:45,753 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [899.6091, 847.28406, 462.28696, 2025.6025, 769.33624, 2669.3755, 165.19199, 295.65573, 745.6446, 467.02295]
2025-09-11 23:25:45,753 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [284.0, 294.0, 168.0, 687.0, 229.0, 833.0, 78.0, 138.0, 264.0, 177.0]
2025-09-11 23:25:45,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 16 hours, 16 seconds)
2025-09-11 23:36:39,782 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:36:39,786 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:37:59,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 914.05322 ± 744.654
2025-09-11 23:37:59,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [99.55396, 2045.0558, 1932.5522, 354.80804, 757.8785, 1838.6049, 52.59047, 227.24284, 633.8032, 1198.4423]
2025-09-11 23:37:59,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [63.0, 666.0, 623.0, 142.0, 265.0, 573.0, 38.0, 103.0, 234.0, 367.0]
2025-09-11 23:37:59,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 15 hours, 40 minutes, 43 seconds)
2025-09-11 23:48:56,360 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:48:56,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:50:40,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1202.84766 ± 1045.282
2025-09-11 23:50:40,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1074.2043, 3218.2654, 564.1039, 1398.4323, 84.38097, 263.32565, 114.51729, 713.96515, 1828.1167, 2769.1646]
2025-09-11 23:50:40,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [342.0, 1000.0, 214.0, 459.0, 53.0, 128.0, 61.0, 258.0, 575.0, 895.0]
2025-09-11 23:50:40,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 15 hours, 43 minutes, 56 seconds)
2025-09-12 00:01:45,027 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:01:45,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:03:32,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1215.97998 ± 932.253
2025-09-12 00:03:32,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [398.6894, 63.98253, 683.6016, 855.0876, 3112.3508, 1284.5679, 694.4471, 2520.608, 1865.1361, 681.3288]
2025-09-12 00:03:32,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [172.0, 52.0, 251.0, 295.0, 1000.0, 429.0, 254.0, 780.0, 593.0, 239.0]
2025-09-12 00:03:32,392 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 15 hours, 41 minutes, 28 seconds)
2025-09-12 00:14:25,919 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:14:25,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:15:42,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 857.65997 ± 640.140
2025-09-12 00:15:42,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1404.7871, 81.419945, 137.10718, 199.6789, 950.87787, 2030.6919, 1383.4027, 183.7757, 1036.4252, 1168.433]
2025-09-12 00:15:42,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [476.0, 58.0, 86.0, 93.0, 350.0, 670.0, 453.0, 85.0, 325.0, 366.0]
2025-09-12 00:15:42,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 15 hours, 11 minutes, 25 seconds)
2025-09-12 00:27:16,877 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:27:16,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:28:52,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1107.22375 ± 821.162
2025-09-12 00:28:52,699 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [49.101532, 584.4593, 16.460546, 964.67145, 2662.7405, 2376.0396, 904.3983, 1011.92303, 1348.3206, 1154.1217]
2025-09-12 00:28:52,699 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [38.0, 208.0, 18.0, 337.0, 854.0, 756.0, 304.0, 309.0, 444.0, 378.0]
2025-09-12 00:28:52,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 15 hours, 8 minutes, 52 seconds)
2025-09-12 00:39:46,344 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:39:46,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:41:49,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1394.67981 ± 1102.490
2025-09-12 00:41:49,608 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1286.205, 1979.3954, 225.73744, 2196.4712, 69.748, 45.472027, 663.7304, 1321.5732, 3056.7322, 3101.7322]
2025-09-12 00:41:49,608 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [447.0, 646.0, 113.0, 726.0, 57.0, 41.0, 232.0, 415.0, 1000.0, 1000.0]
2025-09-12 00:41:49,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 15 hours, 6 minutes, 25 seconds)
2025-09-12 00:52:24,471 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:52:24,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:53:29,895 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 722.36157 ± 501.870
2025-09-12 00:53:29,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [552.9751, 35.01392, 1047.2305, 288.85938, 1412.4744, 1500.401, 1055.3728, 824.0411, 51.307953, 455.9393]
2025-09-12 00:53:29,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [213.0, 36.0, 333.0, 120.0, 483.0, 509.0, 323.0, 284.0, 36.0, 182.0]
2025-09-12 00:53:29,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 14 hours, 39 minutes, 34 seconds)
2025-09-12 01:04:29,797 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:04:29,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:06:34,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1395.34753 ± 1130.613
2025-09-12 01:06:34,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1701.6732, 701.3511, 233.08206, 1753.6516, 2657.2385, 602.66534, 2986.3735, 3055.6094, 235.82623, 26.004776]
2025-09-12 01:06:34,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [589.0, 253.0, 105.0, 590.0, 881.0, 238.0, 1000.0, 1000.0, 106.0, 26.0]
2025-09-12 01:06:34,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 14 hours, 29 minutes, 52 seconds)
2025-09-12 01:17:25,231 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:17:25,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:19:22,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1388.58801 ± 1238.346
2025-09-12 01:19:22,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [219.34442, 2374.3547, 1276.1132, 2888.2078, 318.41864, 155.55537, 2940.8108, 404.61176, 145.87988, 3162.584]
2025-09-12 01:19:22,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [113.0, 726.0, 402.0, 871.0, 129.0, 73.0, 892.0, 148.0, 70.0, 1000.0]
2025-09-12 01:19:22,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 14 hours, 25 minutes, 57 seconds)
2025-09-12 01:30:32,695 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:30:32,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:32:15,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1216.71179 ± 1213.917
2025-09-12 01:32:15,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1069.8356, 619.9457, 36.79491, 69.35229, 316.20456, 32.458332, 3005.211, 1071.4352, 2818.4705, 3127.4104]
2025-09-12 01:32:15,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [311.0, 212.0, 38.0, 56.0, 129.0, 33.0, 930.0, 323.0, 846.0, 1000.0]
2025-09-12 01:32:15,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 14 hours, 9 minutes, 12 seconds)
2025-09-12 01:43:09,896 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:43:09,899 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:44:24,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 861.43408 ± 870.842
2025-09-12 01:44:24,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [40.2543, 854.4588, 765.4157, 3040.9087, 28.350319, 32.191536, 1203.6818, 1320.1666, 1090.5492, 238.36446]
2025-09-12 01:44:24,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [38.0, 261.0, 246.0, 1000.0, 25.0, 27.0, 414.0, 437.0, 333.0, 100.0]
2025-09-12 01:44:24,680 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 13 hours, 46 minutes, 6 seconds)
2025-09-12 01:55:28,770 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:55:28,774 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:57:54,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1647.74475 ± 1133.434
2025-09-12 01:57:54,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1555.7708, 1657.4263, 1630.024, 3109.557, 1927.7078, 28.142563, 432.09027, 37.80556, 3057.5132, 3041.4104]
2025-09-12 01:57:54,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [542.0, 549.0, 573.0, 1000.0, 629.0, 33.0, 161.0, 30.0, 1000.0, 1000.0]
2025-09-12 01:57:54,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (1647.74) for latency ExtremeClogL1U23
2025-09-12 01:57:54,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 13 hours, 57 minutes, 19 seconds)
2025-09-12 02:09:11,922 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:09:11,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:11:00,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1349.57959 ± 781.539
2025-09-12 02:11:00,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2104.0984, 1060.0746, 3255.6533, 1205.8867, 1376.6702, 1058.7678, 253.21353, 741.0644, 1454.7565, 985.6106]
2025-09-12 02:11:00,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [629.0, 312.0, 1000.0, 362.0, 422.0, 319.0, 103.0, 253.0, 474.0, 292.0]
2025-09-12 02:11:00,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 13 hours, 44 minutes, 45 seconds)
2025-09-12 02:21:31,696 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:21:31,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:22:43,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 839.13556 ± 485.995
2025-09-12 02:22:43,647 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1128.4873, 55.310898, 1087.2552, 712.2832, 113.17157, 1099.834, 290.82693, 1233.7714, 1481.1692, 1189.2465]
2025-09-12 02:22:43,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [334.0, 39.0, 341.0, 242.0, 78.0, 344.0, 117.0, 369.0, 488.0, 383.0]
2025-09-12 02:22:43,661 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 13 hours, 18 minutes, 10 seconds)
2025-09-12 02:33:52,797 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:33:52,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:35:10,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 863.60193 ± 766.430
2025-09-12 02:35:10,722 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [199.83426, 457.33752, 2673.2793, 959.25824, 736.5245, 1686.8849, 81.54123, 221.20409, 449.19928, 1170.956]
2025-09-12 02:35:10,722 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [88.0, 168.0, 886.0, 331.0, 257.0, 548.0, 66.0, 95.0, 164.0, 359.0]
2025-09-12 02:35:10,728 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 13 hours, 17 seconds)
2025-09-12 02:46:35,775 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:46:35,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:48:38,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1483.72791 ± 942.600
2025-09-12 02:48:38,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [975.97284, 2974.3857, 2404.6357, 770.7329, 2124.7021, 958.34296, 84.56924, 1878.6831, 283.23022, 2382.0237]
2025-09-12 02:48:38,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [332.0, 943.0, 762.0, 282.0, 634.0, 294.0, 49.0, 558.0, 115.0, 706.0]
2025-09-12 02:48:38,304 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 13 hours, 3 minutes, 34 seconds)
2025-09-12 02:59:24,061 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:59:24,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:01:25,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1375.36255 ± 887.646
2025-09-12 03:01:25,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [636.2166, 140.19415, 1046.554, 1644.116, 1227.5961, 2995.656, 801.4172, 2967.6824, 1074.5049, 1219.6885]
2025-09-12 03:01:25,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [222.0, 77.0, 347.0, 515.0, 439.0, 1000.0, 272.0, 1000.0, 369.0, 419.0]
2025-09-12 03:01:25,247 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 12 hours, 42 minutes, 8 seconds)
2025-09-12 03:12:20,832 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:12:20,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:13:20,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 647.10559 ± 542.487
2025-09-12 03:13:20,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1531.4188, 1377.0638, 288.61783, 940.1015, 50.668827, 67.97025, 61.825493, 748.2133, 276.181, 1128.9945]
2025-09-12 03:13:20,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [465.0, 466.0, 138.0, 304.0, 43.0, 53.0, 52.0, 264.0, 126.0, 348.0]
2025-09-12 03:13:20,416 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 12 hours, 15 minutes, 30 seconds)
2025-09-12 03:24:16,691 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:24:16,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:26:18,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1438.22607 ± 1001.171
2025-09-12 03:26:18,399 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [579.1628, 1259.824, 1180.9857, 1817.8079, 1957.209, 62.25544, 3064.1416, 236.29062, 3082.791, 1141.7926]
2025-09-12 03:26:18,399 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [206.0, 425.0, 349.0, 556.0, 640.0, 46.0, 1000.0, 102.0, 1000.0, 334.0]
2025-09-12 03:26:18,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 12 hours, 17 minutes, 31 seconds)
2025-09-12 03:37:33,978 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:37:33,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:38:41,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 706.61584 ± 924.647
2025-09-12 03:38:41,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [335.11603, 2986.1895, 167.7895, 85.04561, 85.14506, 31.936537, 1656.0741, 39.073284, 443.32028, 1236.4681]
2025-09-12 03:38:41,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [137.0, 1000.0, 78.0, 70.0, 48.0, 36.0, 580.0, 39.0, 176.0, 419.0]
2025-09-12 03:38:41,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 12 hours, 3 minutes, 58 seconds)
2025-09-12 03:49:28,752 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:49:28,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:51:23,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1364.37476 ± 1036.620
2025-09-12 03:51:23,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1688.9701, 3201.4006, 903.5163, 1361.4011, 344.7171, 1919.4817, 34.14043, 2873.63, 83.38443, 1233.1063]
2025-09-12 03:51:23,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [560.0, 1000.0, 279.0, 453.0, 163.0, 641.0, 35.0, 883.0, 60.0, 371.0]
2025-09-12 03:51:23,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 11 hours, 42 minutes, 53 seconds)
2025-09-12 04:02:47,889 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:02:47,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:04:24,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1090.43945 ± 993.603
2025-09-12 04:04:24,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1880.6038, 797.29785, 255.33344, 3135.63, 604.4941, 656.6121, 346.43094, 2556.2466, 244.37746, 427.36777]
2025-09-12 04:04:24,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [628.0, 283.0, 125.0, 1000.0, 217.0, 223.0, 141.0, 838.0, 109.0, 158.0]
2025-09-12 04:04:24,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 11 hours, 32 minutes, 53 seconds)
2025-09-12 04:14:45,842 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:14:45,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:16:34,610 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1246.93738 ± 1145.044
2025-09-12 04:16:34,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [911.1186, 37.417915, 380.2041, 3083.0537, 84.734604, 3133.1816, 806.57495, 1131.6582, 2502.8816, 398.549]
2025-09-12 04:16:34,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [316.0, 39.0, 149.0, 1000.0, 57.0, 1000.0, 283.0, 345.0, 797.0, 154.0]
2025-09-12 04:16:34,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 11 hours, 22 minutes, 57 seconds)
2025-09-12 04:27:43,041 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:27:43,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:29:19,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1141.81982 ± 943.174
2025-09-12 04:29:19,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [619.45856, 2955.7725, 497.07306, 257.46545, 934.5318, 175.5001, 580.80365, 2702.1428, 945.34375, 1750.1062]
2025-09-12 04:29:19,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [242.0, 885.0, 175.0, 124.0, 286.0, 79.0, 210.0, 830.0, 313.0, 521.0]
2025-09-12 04:29:19,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 11 hours, 7 minutes, 57 seconds)
2025-09-12 04:40:17,375 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:40:17,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:42:44,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1692.79456 ± 1081.145
2025-09-12 04:42:44,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1870.1742, 3021.9011, 422.2739, 794.6725, 805.00195, 1531.144, 3085.84, 2207.895, 106.16969, 3082.8726]
2025-09-12 04:42:44,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [599.0, 1000.0, 154.0, 272.0, 285.0, 507.0, 1000.0, 715.0, 71.0, 1000.0]
2025-09-12 04:42:44,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (1692.79) for latency ExtremeClogL1U23
2025-09-12 04:42:44,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 11 hours, 6 minutes, 14 seconds)
2025-09-12 04:53:48,244 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:53:48,247 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:56:03,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1520.82190 ± 1018.396
2025-09-12 04:56:03,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [940.0788, 426.50958, 1981.754, 3062.789, 3040.3792, 2462.2048, 1161.3333, 1424.7692, 45.437325, 662.96265]
2025-09-12 04:56:03,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [331.0, 179.0, 647.0, 1000.0, 1000.0, 797.0, 401.0, 499.0, 44.0, 235.0]
2025-09-12 04:56:03,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 10 hours, 59 minutes, 32 seconds)
2025-09-12 05:07:14,352 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:07:14,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:08:43,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1062.34045 ± 862.657
2025-09-12 05:08:43,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1165.074, 2597.674, 1688.628, 358.06497, 60.335026, 1581.9222, 2158.0142, 101.414894, 618.7734, 293.50403]
2025-09-12 05:08:43,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [382.0, 831.0, 513.0, 140.0, 55.0, 466.0, 645.0, 57.0, 219.0, 123.0]
2025-09-12 05:08:43,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 10 hours, 43 minutes, 11 seconds)
2025-09-12 05:19:42,182 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:19:42,185 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:21:14,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1025.99573 ± 1031.952
2025-09-12 05:21:14,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [246.87988, 776.9923, 33.435688, 96.176445, 931.16656, 2545.648, 87.741264, 695.14996, 1746.2657, 3100.502]
2025-09-12 05:21:14,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [104.0, 278.0, 50.0, 64.0, 295.0, 835.0, 52.0, 238.0, 567.0, 1000.0]
2025-09-12 05:21:14,867 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 10 hours, 33 minutes, 46 seconds)
2025-09-12 05:32:13,262 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:32:13,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:33:42,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1094.19336 ± 540.143
2025-09-12 05:33:42,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [843.73914, 489.72906, 1325.6865, 702.46985, 978.85583, 1031.1044, 1164.848, 875.6516, 954.8005, 2575.0488]
2025-09-12 05:33:42,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [267.0, 173.0, 411.0, 241.0, 307.0, 314.0, 352.0, 271.0, 295.0, 779.0]
2025-09-12 05:33:42,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 10 hours, 18 minutes, 9 seconds)
2025-09-12 05:44:25,497 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:44:25,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:45:51,293 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1047.06763 ± 578.501
2025-09-12 05:45:51,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2047.8555, 1628.5615, 1118.9037, 197.93549, 961.10944, 1259.0984, 29.824926, 1254.4414, 1254.1124, 718.83307]
2025-09-12 05:45:51,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [645.0, 513.0, 335.0, 87.0, 289.0, 410.0, 32.0, 388.0, 367.0, 231.0]
2025-09-12 05:45:51,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 9 hours, 53 minutes, 12 seconds)
2025-09-12 05:57:49,430 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:57:49,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:59:32,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1248.11792 ± 652.642
2025-09-12 05:59:32,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1618.9359, 498.78903, 1421.0488, 1094.1886, 900.09564, 399.2234, 1761.1182, 2758.6455, 1089.1278, 940.0066]
2025-09-12 05:59:32,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [499.0, 177.0, 421.0, 325.0, 296.0, 163.0, 564.0, 858.0, 324.0, 284.0]
2025-09-12 05:59:32,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 9 hours, 43 minutes, 59 seconds)
2025-09-12 06:09:57,382 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:09:57,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:11:49,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1322.05237 ± 742.934
2025-09-12 06:11:49,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1292.2385, 1129.8652, 750.177, 947.43, 760.77234, 3129.5437, 1131.3016, 505.40884, 1401.6676, 2172.118]
2025-09-12 06:11:49,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [440.0, 377.0, 259.0, 292.0, 270.0, 1000.0, 375.0, 198.0, 428.0, 673.0]
2025-09-12 06:11:49,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 9 hours, 27 minutes, 54 seconds)
2025-09-12 06:22:30,240 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:22:30,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:24:37,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1496.48279 ± 794.241
2025-09-12 06:24:37,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [496.2065, 2022.6155, 2604.8875, 1683.2213, 2418.5037, 706.0116, 1644.544, 649.20935, 497.88214, 2241.7476]
2025-09-12 06:24:37,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [178.0, 663.0, 805.0, 514.0, 743.0, 244.0, 534.0, 219.0, 198.0, 732.0]
2025-09-12 06:24:37,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 9 hours, 17 minutes, 40 seconds)
2025-09-12 06:35:29,991 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:35:29,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:37:04,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1067.84058 ± 836.656
2025-09-12 06:37:04,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [3128.2295, 969.14856, 1237.8138, 1825.3214, 78.415504, 1002.79956, 1007.4526, 211.2468, 591.15173, 626.8273]
2025-09-12 06:37:04,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 329.0, 408.0, 606.0, 56.0, 348.0, 339.0, 92.0, 220.0, 227.0]
2025-09-12 06:37:04,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 9 hours, 4 minutes, 53 seconds)
2025-09-12 06:49:01,404 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:49:01,408 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:50:58,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1320.52991 ± 1260.167
2025-09-12 06:50:58,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [130.90584, 1997.0703, 258.0508, 3040.9858, 944.57117, 184.36769, 3059.192, 42.58374, 3081.3188, 466.25415]
2025-09-12 06:50:58,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [66.0, 652.0, 110.0, 985.0, 324.0, 86.0, 1000.0, 33.0, 1000.0, 176.0]
2025-09-12 06:50:58,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 9 hours, 6 minutes, 57 seconds)
2025-09-12 07:01:48,688 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:01:48,692 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:03:06,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 898.85126 ± 711.599
2025-09-12 07:03:06,293 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [898.7202, 1028.5936, 520.6606, 1032.4165, 245.28807, 1681.1294, 69.42329, 87.3373, 945.50824, 2479.4348]
2025-09-12 07:03:06,293 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [278.0, 334.0, 189.0, 306.0, 103.0, 542.0, 56.0, 71.0, 316.0, 788.0]
2025-09-12 07:03:06,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 8 hours, 41 minutes, 16 seconds)
2025-09-12 07:13:49,071 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:13:49,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:15:23,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1095.40344 ± 513.661
2025-09-12 07:15:23,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [971.1003, 1068.716, 665.844, 1042.5844, 1832.3605, 2015.0997, 1004.18225, 1231.2837, 87.60512, 1035.2584]
2025-09-12 07:15:23,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [327.0, 364.0, 230.0, 322.0, 550.0, 642.0, 348.0, 369.0, 53.0, 361.0]
2025-09-12 07:15:23,091 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 8 hours, 28 minutes, 25 seconds)
2025-09-12 07:26:12,229 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:26:12,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:27:47,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1076.89807 ± 904.025
2025-09-12 07:27:47,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [19.17825, 263.01648, 767.05066, 1782.4619, 905.1749, 921.9017, 1991.2991, 921.22925, 125.928894, 3071.7397]
2025-09-12 07:27:47,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 125.0, 272.0, 555.0, 312.0, 314.0, 652.0, 300.0, 64.0, 1000.0]
2025-09-12 07:27:47,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 8 hours, 12 minutes, 41 seconds)
2025-09-12 07:38:31,722 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:38:31,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:39:58,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1040.69043 ± 613.155
2025-09-12 07:39:58,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1076.1094, 196.44821, 1550.8192, 96.61664, 859.1369, 1144.363, 388.5845, 1766.6067, 1909.9552, 1418.2646]
2025-09-12 07:39:58,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [327.0, 87.0, 483.0, 55.0, 296.0, 342.0, 144.0, 571.0, 578.0, 429.0]
2025-09-12 07:39:58,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 7 hours, 58 minutes, 6 seconds)
2025-09-12 07:51:01,952 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:51:01,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:52:13,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 866.05878 ± 504.736
2025-09-12 07:52:13,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [971.6291, 1379.3485, 61.51322, 703.0515, 1139.7766, 957.0814, 263.10275, 1296.8966, 253.24414, 1634.9438]
2025-09-12 07:52:13,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [292.0, 437.0, 44.0, 248.0, 340.0, 286.0, 111.0, 416.0, 102.0, 495.0]
2025-09-12 07:52:13,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 7 hours, 33 minutes, 14 seconds)
2025-09-12 08:04:00,362 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:04:00,367 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:05:46,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1229.96313 ± 993.000
2025-09-12 08:05:46,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2075.2207, 69.23527, 42.661793, 951.3256, 3120.0684, 626.92993, 1222.03, 2615.1948, 606.0842, 970.8809]
2025-09-12 08:05:46,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [661.0, 55.0, 45.0, 303.0, 1000.0, 213.0, 409.0, 841.0, 238.0, 309.0]
2025-09-12 08:05:46,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 7 hours, 31 minutes, 12 seconds)
2025-09-12 08:15:56,188 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:15:56,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:17:24,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1056.93848 ± 687.115
2025-09-12 08:17:24,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [586.16095, 2032.9257, 1206.749, 187.28433, 52.538418, 1803.4385, 1326.9683, 697.5703, 688.8937, 1986.855]
2025-09-12 08:17:24,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [204.0, 622.0, 361.0, 82.0, 49.0, 547.0, 422.0, 234.0, 232.0, 624.0]
2025-09-12 08:17:24,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 7 hours, 14 minutes, 11 seconds)
2025-09-12 08:28:17,073 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:28:17,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:30:00,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1206.69702 ± 854.804
2025-09-12 08:30:00,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [3188.104, 1211.9557, 128.77313, 953.1833, 612.68005, 1036.628, 1304.4774, 2143.2896, 265.46286, 1222.4164]
2025-09-12 08:30:00,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [997.0, 384.0, 65.0, 333.0, 221.0, 346.0, 432.0, 688.0, 107.0, 370.0]
2025-09-12 08:30:00,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 7 hours, 3 minutes, 7 seconds)
2025-09-12 08:41:12,649 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:41:12,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:43:15,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1447.35034 ± 841.778
2025-09-12 08:43:15,768 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [3104.3875, 2130.8037, 1164.4865, 702.68774, 1762.7333, 174.26819, 652.09717, 997.46466, 1513.1293, 2271.4468]
2025-09-12 08:43:15,768 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 687.0, 363.0, 244.0, 574.0, 81.0, 246.0, 313.0, 487.0, 733.0]
2025-09-12 08:43:15,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 6 hours, 57 minutes, 39 seconds)
2025-09-12 08:54:13,493 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:54:13,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:56:05,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1361.20227 ± 717.072
2025-09-12 08:56:05,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1156.102, 302.33173, 1120.5087, 2733.7805, 500.219, 1247.4503, 1881.9696, 1768.189, 2094.8044, 806.66693]
2025-09-12 08:56:05,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [343.0, 123.0, 331.0, 817.0, 178.0, 392.0, 602.0, 530.0, 661.0, 285.0]
2025-09-12 08:56:05,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 6 hours, 48 minutes, 47 seconds)
2025-09-12 09:06:48,713 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:06:48,715 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:08:12,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1003.09045 ± 592.506
2025-09-12 09:08:12,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [846.7553, 1125.5795, 946.56714, 117.78726, 1352.9237, 2362.6245, 166.83513, 979.78296, 1012.4773, 1119.5713]
2025-09-12 09:08:12,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [279.0, 333.0, 306.0, 67.0, 439.0, 740.0, 78.0, 296.0, 301.0, 326.0]
2025-09-12 09:08:12,466 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 6 hours, 27 minutes, 5 seconds)
2025-09-12 09:19:16,680 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:19:16,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:21:44,997 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1749.83044 ± 814.281
2025-09-12 09:21:44,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1081.0459, 2156.955, 3119.0955, 449.1468, 3035.972, 1648.2056, 2102.3997, 1492.7878, 1155.1947, 1257.5028]
2025-09-12 09:21:44,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [363.0, 667.0, 1000.0, 167.0, 1000.0, 485.0, 671.0, 489.0, 347.0, 415.0]
2025-09-12 09:21:44,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (1749.83) for latency ExtremeClogL1U23
2025-09-12 09:21:45,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 6 hours, 26 minutes, 1 second)
2025-09-12 09:33:08,596 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:33:08,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:35:18,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1445.28418 ± 1289.281
2025-09-12 09:35:18,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [58.22093, 876.13556, 2972.3948, 376.7779, 267.63416, 3004.5442, 210.65666, 2982.8872, 3048.379, 655.211]
2025-09-12 09:35:18,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [54.0, 309.0, 1000.0, 148.0, 135.0, 1000.0, 93.0, 1000.0, 1000.0, 231.0]
2025-09-12 09:35:18,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 6 hours, 18 minutes, 43 seconds)
2025-09-12 09:46:03,838 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:46:03,842 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:48:23,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1563.10840 ± 1139.126
2025-09-12 09:48:23,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2083.7031, 191.05861, 524.06683, 3086.5242, 2027.7375, 2927.0132, 3042.82, 975.7956, 529.7649, 242.59947]
2025-09-12 09:48:23,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [717.0, 97.0, 211.0, 1000.0, 662.0, 961.0, 1000.0, 334.0, 205.0, 127.0]
2025-09-12 09:48:23,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 6 hours, 4 minutes, 45 seconds)
2025-09-12 09:59:10,369 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:59:10,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:01:13,768 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1482.02808 ± 785.737
2025-09-12 10:01:13,769 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1123.2793, 1051.8484, 920.5941, 1720.4968, 1699.1497, 1077.8551, 1712.0295, 2607.7012, 47.275967, 2860.0505]
2025-09-12 10:01:13,769 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [389.0, 327.0, 285.0, 518.0, 528.0, 359.0, 541.0, 825.0, 36.0, 885.0]
2025-09-12 10:01:13,779 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 5 hours, 51 minutes, 44 seconds)
2025-09-12 10:12:34,061 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:12:34,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:14:25,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1297.11450 ± 1293.110
2025-09-12 10:14:25,580 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [45.29319, 262.732, 3221.4956, 578.40607, 183.30356, 35.52403, 1186.9698, 3162.8638, 3170.4268, 1124.1312]
2025-09-12 10:14:25,580 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [35.0, 121.0, 1000.0, 210.0, 81.0, 36.0, 387.0, 1000.0, 1000.0, 325.0]
2025-09-12 10:14:25,587 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 5 hours, 44 minutes, 20 seconds)
2025-09-12 10:25:56,253 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:25:56,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:28:00,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1409.49683 ± 1021.870
2025-09-12 10:28:00,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2732.903, 1317.0239, 832.04724, 2116.2397, 414.75226, 212.88416, 141.59586, 3123.1812, 2238.3164, 966.0249]
2025-09-12 10:28:00,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [895.0, 407.0, 272.0, 691.0, 160.0, 95.0, 94.0, 1000.0, 751.0, 306.0]
2025-09-12 10:28:00,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 5 hours, 31 minutes, 15 seconds)
2025-09-12 10:38:40,396 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:38:40,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:39:50,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 785.02588 ± 979.477
2025-09-12 10:39:50,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2456.3767, 16.975647, 172.45206, 791.05365, 50.96993, 90.48326, 210.55692, 2880.0981, 688.14374, 493.14954]
2025-09-12 10:39:50,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [800.0, 17.0, 79.0, 283.0, 42.0, 64.0, 90.0, 918.0, 245.0, 176.0]
2025-09-12 10:39:50,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 5 hours, 9 minutes, 47 seconds)
2025-09-12 10:50:31,312 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:50:31,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:52:38,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1456.23206 ± 937.788
2025-09-12 10:52:38,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2692.0999, 579.6627, 1310.1239, 3069.7092, 492.54913, 2013.4438, 877.06085, 1098.8098, 200.82964, 2228.0308]
2025-09-12 10:52:38,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [880.0, 218.0, 445.0, 1000.0, 177.0, 665.0, 297.0, 352.0, 88.0, 675.0]
2025-09-12 10:52:38,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 4 hours, 55 minutes, 32 seconds)
2025-09-12 11:03:56,197 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:03:56,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:06:31,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1842.11682 ± 914.156
2025-09-12 11:06:31,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1863.4968, 1232.8202, 3031.1316, 1921.3043, 1392.426, 3095.8333, 858.76135, 437.19696, 1445.0735, 3143.1248]
2025-09-12 11:06:31,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [587.0, 407.0, 946.0, 618.0, 480.0, 1000.0, 290.0, 176.0, 467.0, 1000.0]
2025-09-12 11:06:31,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (1842.12) for latency ExtremeClogL1U23
2025-09-12 11:06:31,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 4 hours, 47 minutes, 19 seconds)
2025-09-12 11:17:31,298 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:17:31,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:20:01,539 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1756.33459 ± 1147.678
2025-09-12 11:20:01,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [3094.1301, 3158.9414, 169.1651, 3084.8608, 1064.5801, 2373.0767, 1726.4746, 278.07538, 2224.0142, 390.02814]
2025-09-12 11:20:01,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 78.0, 1000.0, 359.0, 776.0, 562.0, 118.0, 660.0, 150.0]
2025-09-12 11:20:01,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 4 hours, 35 minutes, 31 seconds)
2025-09-12 11:31:40,101 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:31:40,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:32:52,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 840.70245 ± 812.326
2025-09-12 11:32:52,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1190.0887, 357.41937, 1136.1179, 1114.4708, 163.3364, 2985.735, 472.77216, 552.3883, 329.7326, 104.9632]
2025-09-12 11:32:52,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [346.0, 141.0, 344.0, 337.0, 75.0, 926.0, 188.0, 206.0, 135.0, 74.0]
2025-09-12 11:32:52,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 4 hours, 19 minutes, 30 seconds)
2025-09-12 11:43:16,223 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:43:16,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:45:27,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1505.56262 ± 995.252
2025-09-12 11:45:27,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2051.4639, 520.95685, 1827.6224, 422.2294, 774.65497, 247.14568, 3117.844, 1501.7242, 1461.2433, 3130.7415]
2025-09-12 11:45:27,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [674.0, 191.0, 613.0, 180.0, 279.0, 106.0, 1000.0, 496.0, 453.0, 1000.0]
2025-09-12 11:45:27,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 4 hours, 9 minutes, 19 seconds)
2025-09-12 11:56:13,432 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:56:13,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:58:50,183 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1900.67578 ± 975.853
2025-09-12 11:58:50,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1271.4457, 1856.5015, 3193.9648, 1101.5856, 459.9791, 550.7787, 3216.4004, 2316.616, 2870.031, 2169.4548]
2025-09-12 11:58:50,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [404.0, 590.0, 1000.0, 358.0, 165.0, 212.0, 1000.0, 685.0, 849.0, 671.0]
2025-09-12 11:58:50,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (1900.68) for latency ExtremeClogL1U23
2025-09-12 11:58:50,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 3 hours, 58 minutes, 17 seconds)
2025-09-12 12:10:07,225 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:10:07,230 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:11:16,008 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 751.82422 ± 721.166
2025-09-12 12:11:16,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [142.36765, 427.11075, 1267.2594, 35.493107, 1680.4243, 1287.6365, 2065.4187, 22.502686, 540.9203, 49.108307]
2025-09-12 12:11:16,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [68.0, 163.0, 424.0, 37.0, 557.0, 419.0, 664.0, 25.0, 207.0, 37.0]
2025-09-12 12:11:16,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 3 hours, 40 minutes, 6 seconds)
2025-09-12 12:21:55,583 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:21:55,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:23:57,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1461.65015 ± 815.419
2025-09-12 12:23:57,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1304.1198, 3153.1272, 846.4168, 1444.5807, 721.73926, 975.1181, 2367.766, 1181.2845, 2220.9155, 401.43347]
2025-09-12 12:23:57,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [408.0, 1000.0, 288.0, 469.0, 266.0, 325.0, 763.0, 370.0, 699.0, 152.0]
2025-09-12 12:23:57,703 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 24 minutes, 35 seconds)
2025-09-12 12:34:53,617 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:34:53,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:36:11,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 888.98114 ± 950.693
2025-09-12 12:36:11,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2941.2842, 439.77625, 1978.898, 25.92072, 553.30554, 134.16064, 74.69443, 459.71582, 1860.3447, 421.71066]
2025-09-12 12:36:11,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [879.0, 163.0, 628.0, 27.0, 211.0, 91.0, 62.0, 172.0, 590.0, 156.0]
2025-09-12 12:36:11,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 9 minutes, 56 seconds)
2025-09-12 12:47:06,147 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:47:06,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:49:02,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1394.50134 ± 760.562
2025-09-12 12:49:02,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1179.8126, 1174.7338, 250.81767, 2540.5442, 1389.5378, 600.93976, 2202.7324, 2526.8726, 1394.0763, 684.94684]
2025-09-12 12:49:02,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [390.0, 362.0, 102.0, 778.0, 428.0, 211.0, 680.0, 815.0, 417.0, 227.0]
2025-09-12 12:49:02,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 2 hours, 58 minutes, 1 second)
2025-09-12 13:00:10,529 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:00:10,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:01:03,945 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 643.58734 ± 458.341
2025-09-12 13:01:03,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [151.17972, 41.22848, 976.9445, 84.387405, 1074.8036, 72.888435, 1128.8894, 917.9698, 949.6402, 1037.942]
2025-09-12 13:01:03,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [73.0, 32.0, 289.0, 48.0, 353.0, 52.0, 328.0, 274.0, 276.0, 304.0]
2025-09-12 13:01:03,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 2 hours, 41 minutes, 47 seconds)
2025-09-12 13:12:39,941 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:12:39,945 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:14:33,660 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1321.50952 ± 1070.377
2025-09-12 13:14:33,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [874.7182, 125.21327, 905.5536, 2226.2822, 3127.6477, 238.73917, 260.02435, 1142.8263, 3117.578, 1196.5131]
2025-09-12 13:14:33,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [282.0, 62.0, 307.0, 733.0, 1000.0, 101.0, 107.0, 381.0, 1000.0, 397.0]
2025-09-12 13:14:33,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 31 minutes, 54 seconds)
2025-09-12 13:25:10,343 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:25:10,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:26:55,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1234.47290 ± 1150.768
2025-09-12 13:26:55,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [3171.983, 130.09192, 3200.5051, 2142.8167, 1534.4337, 145.34921, 517.9237, 802.7541, 215.0338, 483.83762]
2025-09-12 13:26:55,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 64.0, 1000.0, 641.0, 494.0, 73.0, 189.0, 278.0, 107.0, 175.0]
2025-09-12 13:26:55,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 18 minutes, 32 seconds)
2025-09-12 13:37:57,446 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:37:57,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:39:56,455 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1430.73132 ± 898.957
2025-09-12 13:39:56,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1387.3938, 3127.112, 3182.7747, 824.4245, 1222.2008, 621.21173, 667.47614, 1172.0349, 1304.5424, 798.1429]
2025-09-12 13:39:56,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [414.0, 959.0, 1000.0, 279.0, 364.0, 214.0, 228.0, 362.0, 383.0, 278.0]
2025-09-12 13:39:56,467 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 7 minutes, 29 seconds)
2025-09-12 13:50:36,868 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:50:36,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:52:11,638 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1119.70142 ± 811.582
2025-09-12 13:52:11,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [888.4591, 844.665, 401.85666, 93.058304, 960.7035, 1250.3873, 651.7601, 858.84814, 2490.9385, 2756.3376]
2025-09-12 13:52:11,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [292.0, 288.0, 152.0, 64.0, 297.0, 366.0, 218.0, 274.0, 787.0, 855.0]
2025-09-12 13:52:11,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 1 hour, 53 minutes, 40 seconds)
2025-09-12 14:04:09,758 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:04:09,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:06:25,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1588.86743 ± 929.719
2025-09-12 14:06:25,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1218.4512, 508.92398, 606.5162, 3139.1252, 3180.1992, 2004.949, 1253.617, 609.8087, 1981.5621, 1385.5201]
2025-09-12 14:06:25,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [361.0, 180.0, 229.0, 1000.0, 1000.0, 626.0, 418.0, 214.0, 636.0, 451.0]
2025-09-12 14:06:25,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 44 minutes, 33 seconds)
2025-09-12 14:16:48,267 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:16:48,274 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:19:08,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1597.76318 ± 1081.096
2025-09-12 14:19:08,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [643.2619, 2156.8867, 2823.3833, 3134.1873, 3179.0764, 717.62714, 190.71211, 838.76624, 720.4958, 1573.2351]
2025-09-12 14:19:08,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [221.0, 709.0, 900.0, 1000.0, 1000.0, 246.0, 102.0, 291.0, 259.0, 528.0]
2025-09-12 14:19:08,243 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 30 minutes, 24 seconds)
2025-09-12 14:30:18,017 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:30:18,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:32:33,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1644.15015 ± 1071.896
2025-09-12 14:32:33,587 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [3126.9844, 1107.9174, 1118.4197, 52.184338, 512.38464, 2526.857, 2691.7869, 931.52014, 3171.176, 1202.2719]
2025-09-12 14:32:33,587 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 324.0, 336.0, 38.0, 180.0, 809.0, 863.0, 310.0, 1000.0, 351.0]
2025-09-12 14:32:33,598 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 18 minutes, 45 seconds)
2025-09-12 14:43:03,540 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:43:03,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:45:41,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1846.27515 ± 1119.011
2025-09-12 14:45:41,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [471.00687, 1985.8307, 1298.4934, 2355.1406, 2125.5337, 3174.0798, 611.2513, 79.27462, 3184.9783, 3177.1626]
2025-09-12 14:45:41,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [193.0, 650.0, 429.0, 752.0, 678.0, 1000.0, 215.0, 50.0, 1000.0, 1000.0]
2025-09-12 14:45:41,183 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 5 minutes, 44 seconds)
2025-09-12 14:57:06,201 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:57:06,204 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:59:50,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1946.67053 ± 1103.098
2025-09-12 14:59:50,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2255.0513, 1743.7008, 251.07628, 3167.4468, 3131.3699, 1771.3264, 2817.7944, 1170.5809, 3095.9346, 62.424652]
2025-09-12 14:59:50,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [698.0, 567.0, 102.0, 1000.0, 1000.0, 561.0, 867.0, 386.0, 1000.0, 47.0]
2025-09-12 14:59:50,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (1946.67) for latency ExtremeClogL1U23
2025-09-12 14:59:50,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 54 minutes, 7 seconds)
2025-09-12 15:11:15,202 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:11:15,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:13:16,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1460.07837 ± 986.482
2025-09-12 15:13:16,905 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1069.4167, 483.44586, 3196.043, 2015.8639, 1226.7812, 469.2191, 1164.6921, 3241.3853, 1255.8568, 478.07986]
2025-09-12 15:13:16,905 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [317.0, 170.0, 1000.0, 630.0, 397.0, 170.0, 357.0, 1000.0, 409.0, 169.0]
2025-09-12 15:13:16,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 40 minutes, 7 seconds)
2025-09-12 15:23:54,890 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:23:54,895 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:26:11,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1642.94336 ± 921.715
2025-09-12 15:26:11,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1682.194, 2405.7197, 565.20636, 3112.3901, 693.86847, 722.7618, 1327.6166, 1528.9755, 1165.4918, 3225.2083]
2025-09-12 15:26:11,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [506.0, 753.0, 209.0, 1000.0, 237.0, 260.0, 389.0, 489.0, 345.0, 1000.0]
2025-09-12 15:26:11,467 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 26 minutes, 49 seconds)
2025-09-12 15:36:39,141 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:36:39,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:38:48,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1542.27039 ± 1049.441
2025-09-12 15:38:48,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [3121.8115, 2445.6252, 2622.6672, 111.77777, 1086.2688, 109.79289, 1215.333, 2741.2368, 809.10455, 1159.0857]
2025-09-12 15:38:48,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 735.0, 853.0, 88.0, 359.0, 63.0, 387.0, 843.0, 282.0, 402.0]
2025-09-12 15:38:48,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 13 minutes, 15 seconds)
2025-09-12 15:49:57,211 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:49:57,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:51:28,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1135.73987 ± 753.563
2025-09-12 15:51:28,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [81.38893, 961.8203, 897.34045, 1017.613, 1068.017, 1051.8119, 3234.6077, 960.83594, 1021.50836, 1062.454]
2025-09-12 15:51:28,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [65.0, 295.0, 273.0, 309.0, 342.0, 310.0, 975.0, 287.0, 308.0, 321.0]
2025-09-12 15:51:28,917 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1251 [DEBUG]: Training session finished
