2025-09-11 19:08:53,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc25-hopper/ExtremeClogL1U23-mbpac_memdelay
2025-09-11 19:08:53,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc25-hopper/ExtremeClogL1U23-mbpac_memdelay
2025-09-11 19:08:53,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x15374ba80c50>}
2025-09-11 19:08:53,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1111 [DEBUG]: using device: cuda
2025-09-11 19:08:53,180 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1133 [INFO]: Creating new trainer
2025-09-11 19:08:53,197 baseline-mbpac-noiseperc25-hopper:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-09-11 19:08:53,197 baseline-mbpac-noiseperc25-hopper:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-11 19:08:53,205 baseline-mbpac-noiseperc25-hopper:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=11, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=3, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-09-11 19:08:54,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1194 [DEBUG]: Starting training session...
2025-09-11 19:08:54,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 1/100
2025-09-11 19:18:41,793 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:18:41,794 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:18:53,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 76.53789 ± 53.766
2025-09-11 19:18:53,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [50.042595, 62.449085, 73.23123, 17.79042, 56.680893, 212.54765, 57.55937, 137.5843, 51.9727, 45.520645]
2025-09-11 19:18:53,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [31.0, 37.0, 43.0, 22.0, 38.0, 101.0, 35.0, 70.0, 33.0, 27.0]
2025-09-11 19:18:53,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (76.54) for latency ExtremeClogL1U23
2025-09-11 19:18:53,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 16 hours, 29 minutes, 17 seconds)
2025-09-11 19:30:06,951 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:30:06,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:30:36,392 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 160.07962 ± 138.740
2025-09-11 19:30:36,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [78.21582, 9.478865, 160.97102, 282.16708, 37.712055, 278.96817, 424.70428, 38.78418, 14.721943, 275.07275]
2025-09-11 19:30:36,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [72.0, 14.0, 121.0, 197.0, 30.0, 175.0, 237.0, 29.0, 16.0, 184.0]
2025-09-11 19:30:36,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (160.08) for latency ExtremeClogL1U23
2025-09-11 19:30:36,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 17 hours, 43 minutes, 35 seconds)
2025-09-11 19:41:59,123 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:41:59,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:42:38,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 253.20171 ± 133.392
2025-09-11 19:42:38,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [124.493004, 321.30545, 137.93617, 235.09714, 621.898, 229.38115, 194.30977, 221.29874, 210.4863, 235.8114]
2025-09-11 19:42:38,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [81.0, 155.0, 86.0, 120.0, 484.0, 112.0, 97.0, 110.0, 102.0, 116.0]
2025-09-11 19:42:38,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (253.20) for latency ExtremeClogL1U23
2025-09-11 19:42:38,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 18 hours, 11 minutes, 6 seconds)
2025-09-11 19:53:58,746 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:53:58,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:54:20,617 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 164.47256 ± 125.252
2025-09-11 19:54:20,618 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [308.44666, 126.01592, 9.956921, 222.3732, 11.772764, 296.36087, 302.681, 73.248024, 7.838921, 286.0314]
2025-09-11 19:54:20,618 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [130.0, 71.0, 15.0, 112.0, 14.0, 133.0, 149.0, 44.0, 11.0, 126.0]
2025-09-11 19:54:20,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 18 hours, 10 minutes, 37 seconds)
2025-09-11 20:05:32,658 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:05:32,661 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:06:05,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 260.95596 ± 115.641
2025-09-11 20:06:05,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [319.0818, 322.1537, 61.70247, 16.724035, 270.22583, 306.68585, 300.56973, 391.3003, 338.809, 282.30685]
2025-09-11 20:06:05,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [128.0, 118.0, 37.0, 17.0, 117.0, 140.0, 147.0, 245.0, 132.0, 126.0]
2025-09-11 20:06:05,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (260.96) for latency ExtremeClogL1U23
2025-09-11 20:06:05,599 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 18 hours, 6 minutes, 39 seconds)
2025-09-11 20:17:21,400 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:17:21,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:17:56,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 307.46796 ± 32.163
2025-09-11 20:17:56,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [323.30685, 307.76428, 323.8539, 332.90033, 316.51794, 309.85062, 318.32614, 307.3336, 213.72949, 321.09625]
2025-09-11 20:17:56,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [136.0, 137.0, 136.0, 136.0, 132.0, 131.0, 127.0, 125.0, 103.0, 131.0]
2025-09-11 20:17:56,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (307.47) for latency ExtremeClogL1U23
2025-09-11 20:17:56,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 18 hours, 29 minutes, 59 seconds)
2025-09-11 20:29:14,714 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:29:14,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:29:45,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 264.06940 ± 96.908
2025-09-11 20:29:45,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [356.02338, 119.977196, 350.638, 133.86937, 207.24469, 234.07362, 407.67737, 302.70917, 347.9146, 180.56636]
2025-09-11 20:29:45,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [141.0, 65.0, 139.0, 70.0, 102.0, 107.0, 179.0, 121.0, 131.0, 92.0]
2025-09-11 20:29:45,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 18 hours, 20 minutes, 12 seconds)
2025-09-11 20:40:45,036 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:40:45,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:41:19,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 316.02936 ± 105.576
2025-09-11 20:41:19,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [379.09637, 389.8412, 343.26135, 377.83972, 100.1005, 435.1043, 128.18993, 341.527, 342.25113, 323.08194]
2025-09-11 20:41:19,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [145.0, 148.0, 136.0, 146.0, 61.0, 213.0, 63.0, 129.0, 146.0, 120.0]
2025-09-11 20:41:19,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (316.03) for latency ExtremeClogL1U23
2025-09-11 20:41:19,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 17 hours, 59 minutes, 44 seconds)
2025-09-11 20:52:23,823 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:52:23,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:52:57,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 325.90790 ± 132.492
2025-09-11 20:52:57,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [395.9824, 61.64584, 407.9538, 397.68704, 400.31534, 402.76694, 378.23853, 61.97694, 375.69852, 376.81375]
2025-09-11 20:52:57,960 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [145.0, 38.0, 162.0, 153.0, 151.0, 152.0, 146.0, 40.0, 160.0, 146.0]
2025-09-11 20:52:57,960 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (325.91) for latency ExtremeClogL1U23
2025-09-11 20:52:57,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 17 hours, 46 minutes, 55 seconds)
2025-09-11 21:04:07,185 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:04:07,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:04:38,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 272.53314 ± 97.847
2025-09-11 21:04:38,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [374.66663, 294.23605, 128.27884, 255.94197, 273.70233, 231.55232, 386.70908, 288.7008, 399.62595, 91.917465]
2025-09-11 21:04:38,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [148.0, 129.0, 68.0, 117.0, 123.0, 103.0, 150.0, 122.0, 151.0, 51.0]
2025-09-11 21:04:38,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 17 hours, 33 minutes, 45 seconds)
2025-09-11 21:15:44,748 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:15:44,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:16:19,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 337.04272 ± 145.168
2025-09-11 21:16:19,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [372.64612, 523.6487, 397.92834, 325.31396, 24.812319, 417.0866, 375.67432, 441.75174, 383.2501, 108.31519]
2025-09-11 21:16:19,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [133.0, 179.0, 139.0, 120.0, 25.0, 171.0, 131.0, 219.0, 137.0, 57.0]
2025-09-11 21:16:19,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (337.04) for latency ExtremeClogL1U23
2025-09-11 21:16:19,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 17 hours, 19 minutes, 20 seconds)
2025-09-11 21:27:20,538 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:27:20,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:27:48,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 251.97128 ± 194.034
2025-09-11 21:27:48,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [53.194736, 581.14594, 378.86642, 271.98557, 17.364681, 8.470214, 544.5781, 251.20912, 269.79172, 143.10632]
2025-09-11 21:27:48,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [39.0, 188.0, 163.0, 113.0, 18.0, 15.0, 193.0, 117.0, 120.0, 75.0]
2025-09-11 21:27:48,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 17 hours, 1 minute, 36 seconds)
2025-09-11 21:38:58,099 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:38:58,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:39:38,369 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 359.98029 ± 240.523
2025-09-11 21:39:38,369 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [13.814107, 552.9185, 453.27344, 449.65134, 196.39052, 103.79843, 664.33453, 524.6069, 12.714765, 628.30035]
2025-09-11 21:39:38,369 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 188.0, 170.0, 217.0, 96.0, 96.0, 239.0, 180.0, 18.0, 284.0]
2025-09-11 21:39:38,369 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (359.98) for latency ExtremeClogL1U23
2025-09-11 21:39:38,408 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 16 hours, 54 minutes, 37 seconds)
2025-09-11 21:50:43,009 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:50:43,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:51:20,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 340.56708 ± 345.397
2025-09-11 21:51:20,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [114.41475, 588.9333, 183.95512, 1129.1941, 431.5545, 682.28076, 11.528026, 145.74104, 103.23396, 14.835149]
2025-09-11 21:51:20,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [76.0, 219.0, 99.0, 409.0, 190.0, 240.0, 14.0, 75.0, 60.0, 21.0]
2025-09-11 21:51:20,479 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 16 hours, 44 minutes, 3 seconds)
2025-09-11 22:02:25,656 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:02:25,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:03:09,579 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 418.46713 ± 326.122
2025-09-11 22:03:09,579 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [225.46144, 339.3587, 647.7348, 694.93445, 776.87805, 1020.87476, 219.3945, 16.83945, 15.533795, 227.66168]
2025-09-11 22:03:09,579 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [100.0, 146.0, 249.0, 283.0, 266.0, 336.0, 118.0, 20.0, 17.0, 110.0]
2025-09-11 22:03:09,579 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (418.47) for latency ExtremeClogL1U23
2025-09-11 22:03:09,587 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 16 hours, 34 minutes, 54 seconds)
2025-09-11 22:14:23,888 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:14:23,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:15:03,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 376.83029 ± 283.233
2025-09-11 22:15:03,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [140.53844, 777.76996, 80.31833, 669.7913, 353.76047, 775.40076, 436.30737, 472.99478, 23.5826, 37.83914]
2025-09-11 22:15:03,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [74.0, 251.0, 77.0, 220.0, 153.0, 246.0, 177.0, 187.0, 62.0, 48.0]
2025-09-11 22:15:03,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 16 hours, 26 minutes, 50 seconds)
2025-09-11 22:26:12,559 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:26:12,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:26:50,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 344.51105 ± 365.993
2025-09-11 22:26:50,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [239.42502, 1253.0265, 153.82423, 185.53629, 9.93727, 24.942926, 690.58215, 233.91386, 544.62775, 109.29443]
2025-09-11 22:26:50,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [107.0, 444.0, 115.0, 91.0, 15.0, 24.0, 229.0, 107.0, 218.0, 66.0]
2025-09-11 22:26:50,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 16 hours, 20 minutes, 1 second)
2025-09-11 22:38:18,061 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:38:18,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:39:02,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 441.42056 ± 304.741
2025-09-11 22:39:02,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [711.8543, 636.3186, 1047.1724, 616.0717, 164.65807, 339.8942, 10.678446, 532.314, 188.27249, 166.97127]
2025-09-11 22:39:02,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [233.0, 235.0, 334.0, 211.0, 84.0, 146.0, 18.0, 238.0, 89.0, 80.0]
2025-09-11 22:39:02,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (441.42) for latency ExtremeClogL1U23
2025-09-11 22:39:02,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 16 hours, 14 minutes, 12 seconds)
2025-09-11 22:50:02,631 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:50:02,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:50:57,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 580.92133 ± 523.644
2025-09-11 22:50:57,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [5.9315267, 1095.6736, 1498.089, 100.306244, 15.762339, 366.0963, 1059.102, 19.533882, 976.62036, 672.098]
2025-09-11 22:50:57,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [9.0, 381.0, 522.0, 57.0, 16.0, 151.0, 347.0, 20.0, 334.0, 212.0]
2025-09-11 22:50:57,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (580.92) for latency ExtremeClogL1U23
2025-09-11 22:50:57,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 16 hours, 5 minutes, 48 seconds)
2025-09-11 23:02:00,398 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:02:00,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:02:55,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 608.81097 ± 364.152
2025-09-11 23:02:55,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [1092.9381, 963.0143, 832.3469, 844.7274, 289.9071, 232.44356, 923.275, 211.36076, 13.306104, 684.7907]
2025-09-11 23:02:55,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [366.0, 304.0, 275.0, 284.0, 121.0, 110.0, 296.0, 102.0, 15.0, 229.0]
2025-09-11 23:02:55,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (608.81) for latency ExtremeClogL1U23
2025-09-11 23:02:55,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 15 hours, 56 minutes, 15 seconds)
2025-09-11 23:14:05,447 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:14:05,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:14:53,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 504.78110 ± 380.703
2025-09-11 23:14:53,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [621.6635, 116.34484, 610.92487, 17.480808, 749.17645, 538.081, 486.72205, 21.01325, 1365.3727, 521.0319]
2025-09-11 23:14:53,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [207.0, 64.0, 226.0, 17.0, 248.0, 176.0, 200.0, 21.0, 457.0, 190.0]
2025-09-11 23:14:53,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 15 hours, 45 minutes, 15 seconds)
2025-09-11 23:26:02,490 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:26:02,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:26:40,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 337.18817 ± 355.878
2025-09-11 23:26:40,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [132.32796, 9.646895, 89.56909, 223.19905, 775.90625, 89.407394, 213.30553, 908.78406, 9.679937, 920.05536]
2025-09-11 23:26:40,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [93.0, 12.0, 51.0, 145.0, 266.0, 52.0, 115.0, 341.0, 12.0, 321.0]
2025-09-11 23:26:40,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 15 hours, 33 minutes, 16 seconds)
2025-09-11 23:37:31,590 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:37:31,591 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:38:14,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 447.97870 ± 325.696
2025-09-11 23:38:14,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [573.57153, 10.148059, 611.4356, 132.23676, 11.432367, 902.668, 862.512, 198.0028, 755.54193, 422.23776]
2025-09-11 23:38:14,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [211.0, 15.0, 224.0, 67.0, 13.0, 299.0, 293.0, 94.0, 241.0, 179.0]
2025-09-11 23:38:14,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 15 hours, 11 minutes, 41 seconds)
2025-09-11 23:49:37,731 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:49:37,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:50:27,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 542.69073 ± 346.423
2025-09-11 23:50:27,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [848.0862, 906.8519, 209.29044, 12.779396, 798.15784, 94.80563, 825.83685, 203.76538, 872.48615, 654.8474]
2025-09-11 23:50:27,722 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [276.0, 297.0, 96.0, 20.0, 278.0, 57.0, 284.0, 97.0, 279.0, 209.0]
2025-09-11 23:50:27,732 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 15 hours, 4 minutes, 26 seconds)
2025-09-12 00:01:29,298 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:01:29,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:02:18,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 474.28485 ± 337.504
2025-09-12 00:02:18,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [586.2927, 1335.0211, 403.66757, 356.38904, 261.33472, 289.74298, 564.0982, 288.43497, 10.712964, 647.15454]
2025-09-12 00:02:18,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [223.0, 455.0, 183.0, 147.0, 117.0, 128.0, 211.0, 123.0, 16.0, 243.0]
2025-09-12 00:02:18,665 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 14 hours, 50 minutes, 46 seconds)
2025-09-12 00:13:18,893 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:13:18,895 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:14:15,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 561.77838 ± 486.474
2025-09-12 00:14:15,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [876.4495, 123.39536, 1298.5409, 12.920238, 390.14816, 209.46002, 171.95792, 1440.9279, 270.92957, 823.0541]
2025-09-12 00:14:15,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [317.0, 64.0, 448.0, 15.0, 157.0, 122.0, 85.0, 512.0, 121.0, 314.0]
2025-09-12 00:14:15,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 14 hours, 38 minutes, 33 seconds)
2025-09-12 00:25:17,565 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:25:17,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:26:15,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 599.40546 ± 456.030
2025-09-12 00:26:15,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [776.4601, 1014.5061, 170.86009, 18.461689, 9.787318, 979.5932, 1372.5497, 739.8836, 136.5945, 775.35815]
2025-09-12 00:26:15,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [258.0, 358.0, 85.0, 18.0, 13.0, 364.0, 505.0, 236.0, 71.0, 283.0]
2025-09-12 00:26:15,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 14 hours, 30 minutes, 8 seconds)
2025-09-12 00:37:20,266 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:37:20,276 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:38:01,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 410.41519 ± 328.353
2025-09-12 00:38:01,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [422.4496, 709.60876, 49.88688, 353.21484, 821.66473, 12.683106, 18.491526, 823.5317, 129.23193, 763.3886]
2025-09-12 00:38:01,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [165.0, 256.0, 38.0, 183.0, 300.0, 17.0, 20.0, 260.0, 67.0, 248.0]
2025-09-12 00:38:01,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 14 hours, 20 minutes, 47 seconds)
2025-09-12 00:49:03,530 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:49:03,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:49:58,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 593.90955 ± 633.971
2025-09-12 00:49:58,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [8.938935, 881.48724, 430.43213, 10.340249, 740.489, 220.21567, 467.54468, 237.38501, 639.33417, 2302.9287]
2025-09-12 00:49:58,945 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [15.0, 289.0, 166.0, 12.0, 239.0, 108.0, 177.0, 111.0, 214.0, 744.0]
2025-09-12 00:49:58,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 14 hours, 5 minutes, 11 seconds)
2025-09-12 01:01:12,739 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:01:12,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:01:47,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 310.41614 ± 323.132
2025-09-12 01:01:47,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [20.26547, 885.19824, 597.8373, 352.5126, 13.888887, 335.63226, 788.7528, 17.43236, 10.997458, 81.64401]
2025-09-12 01:01:47,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 289.0, 229.0, 146.0, 19.0, 246.0, 285.0, 18.0, 14.0, 53.0]
2025-09-12 01:01:47,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 13 hours, 52 minutes, 47 seconds)
2025-09-12 01:13:06,811 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:13:06,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:13:53,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 487.31073 ± 636.010
2025-09-12 01:13:53,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [208.79236, 9.013067, 652.9671, 166.70007, 152.5892, 535.5701, 754.70416, 2241.3555, 135.43275, 15.983231]
2025-09-12 01:13:53,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [96.0, 18.0, 246.0, 80.0, 74.0, 206.0, 239.0, 730.0, 67.0, 18.0]
2025-09-12 01:13:53,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 13 hours, 43 minutes, 5 seconds)
2025-09-12 01:24:58,116 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:24:58,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:26:23,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 936.90710 ± 433.998
2025-09-12 01:26:23,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [1844.039, 842.6538, 1034.7004, 839.61334, 198.79533, 827.57623, 604.37225, 801.5614, 1520.378, 855.3807]
2025-09-12 01:26:23,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [647.0, 271.0, 369.0, 267.0, 90.0, 301.0, 191.0, 260.0, 512.0, 285.0]
2025-09-12 01:26:23,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (936.91) for latency ExtremeClogL1U23
2025-09-12 01:26:23,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 13 hours, 37 minutes, 40 seconds)
2025-09-12 01:37:16,616 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:37:16,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:38:00,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 418.18158 ± 515.171
2025-09-12 01:38:00,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [49.082325, 339.247, 20.925032, 172.27917, 9.322246, 407.36954, 1646.5658, 11.076538, 1089.7463, 436.20172]
2025-09-12 01:38:00,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [46.0, 140.0, 24.0, 111.0, 11.0, 174.0, 601.0, 14.0, 356.0, 176.0]
2025-09-12 01:38:00,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 13 hours, 23 minutes, 48 seconds)
2025-09-12 01:49:14,640 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:49:14,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:49:34,900 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 160.36526 ± 156.445
2025-09-12 01:49:34,900 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [353.29453, 8.238121, 231.54706, 340.8089, 13.084582, 16.802671, 22.498466, 178.54866, 21.747904, 417.08176]
2025-09-12 01:49:34,900 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [155.0, 13.0, 135.0, 144.0, 15.0, 21.0, 22.0, 86.0, 22.0, 165.0]
2025-09-12 01:49:34,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 13 hours, 6 minutes, 42 seconds)
2025-09-12 02:00:25,488 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:00:25,490 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:01:08,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 441.08676 ± 358.196
2025-09-12 02:01:08,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [655.21655, 10.811455, 958.233, 782.4278, 848.2608, 138.76704, 196.14745, 13.551963, 116.19862, 691.25305]
2025-09-12 02:01:08,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [263.0, 13.0, 312.0, 266.0, 283.0, 71.0, 94.0, 17.0, 64.0, 227.0]
2025-09-12 02:01:08,110 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 12 hours, 51 minutes, 23 seconds)
2025-09-12 02:12:17,597 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:12:17,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:13:22,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 667.90149 ± 726.838
2025-09-12 02:13:22,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [651.50543, 2200.0059, 13.193234, 13.976969, 1595.7373, 1172.9857, 149.75272, 716.4998, 22.427721, 142.93015]
2025-09-12 02:13:22,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [252.0, 758.0, 15.0, 22.0, 535.0, 446.0, 76.0, 247.0, 22.0, 88.0]
2025-09-12 02:13:22,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 12 hours, 41 minutes, 18 seconds)
2025-09-12 02:24:23,277 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:24:23,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:25:31,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 687.33978 ± 491.356
2025-09-12 02:25:31,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [833.49976, 1216.9297, 4.9177265, 1064.4165, 332.29453, 1434.963, 176.70493, 70.33983, 633.3674, 1105.9647]
2025-09-12 02:25:31,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [306.0, 446.0, 13.0, 367.0, 146.0, 521.0, 87.0, 43.0, 253.0, 384.0]
2025-09-12 02:25:31,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 12 hours, 25 minutes, 7 seconds)
2025-09-12 02:36:56,077 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:36:56,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:37:41,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 441.43082 ± 378.983
2025-09-12 02:37:41,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [63.3419, 677.1013, 149.93764, 11.894393, 1067.8912, 169.80498, 643.013, 10.871963, 855.22876, 765.22296]
2025-09-12 02:37:41,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [38.0, 236.0, 75.0, 14.0, 354.0, 80.0, 243.0, 14.0, 360.0, 268.0]
2025-09-12 02:37:41,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 12 hours, 20 minutes)
2025-09-12 02:48:25,192 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:48:25,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:49:02,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 377.77106 ± 360.117
2025-09-12 02:49:02,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [10.213164, 845.6354, 359.67896, 17.509813, 10.269956, 683.1142, 875.5161, 783.6382, 14.675178, 177.45918]
2025-09-12 02:49:02,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [12.0, 261.0, 147.0, 23.0, 16.0, 272.0, 306.0, 259.0, 19.0, 81.0]
2025-09-12 02:49:02,194 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 12 hours, 5 minutes, 20 seconds)
2025-09-12 03:00:13,314 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:00:13,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:00:38,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 262.80463 ± 358.632
2025-09-12 03:00:38,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [828.6232, 1040.9618, 272.86768, 12.344056, 66.24289, 12.601439, 356.26126, 12.900989, 17.726685, 7.516516]
2025-09-12 03:00:38,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [251.0, 322.0, 120.0, 13.0, 44.0, 41.0, 144.0, 15.0, 17.0, 10.0]
2025-09-12 03:00:38,774 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 11 hours, 54 minutes, 7 seconds)
2025-09-12 03:11:41,498 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:11:41,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:13:07,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 866.02325 ± 804.303
2025-09-12 03:13:07,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [62.804985, 875.5966, 1059.1959, 272.6472, 393.48022, 2038.6526, 897.4372, 154.64175, 303.94327, 2601.832]
2025-09-12 03:13:07,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [46.0, 321.0, 351.0, 125.0, 163.0, 710.0, 310.0, 77.0, 132.0, 1000.0]
2025-09-12 03:13:07,532 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 11 hours, 45 minutes, 2 seconds)
2025-09-12 03:24:15,650 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:24:15,660 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:24:59,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 417.40170 ± 241.781
2025-09-12 03:24:59,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [307.0196, 717.65393, 71.7207, 15.879293, 752.1772, 315.60315, 462.26266, 549.164, 647.48224, 335.05426]
2025-09-12 03:24:59,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [133.0, 239.0, 49.0, 20.0, 288.0, 133.0, 176.0, 231.0, 241.0, 145.0]
2025-09-12 03:24:59,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 11 hours, 29 minutes, 43 seconds)
2025-09-12 03:36:36,028 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:36:36,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:37:33,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 569.39539 ± 361.081
2025-09-12 03:37:33,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [569.72754, 18.978315, 494.0556, 746.06177, 293.85397, 880.60486, 792.0788, 175.88518, 1317.5656, 405.14282]
2025-09-12 03:37:33,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [216.0, 20.0, 192.0, 249.0, 140.0, 306.0, 303.0, 88.0, 472.0, 159.0]
2025-09-12 03:37:33,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 11 hours, 22 minutes, 33 seconds)
2025-09-12 03:48:18,489 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:48:18,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:48:59,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 393.34628 ± 390.192
2025-09-12 03:48:59,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [660.9996, 730.62976, 853.9888, 184.23413, 15.7325115, 9.300587, 1097.8684, 360.35703, 13.494691, 6.857349]
2025-09-12 03:48:59,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [216.0, 274.0, 314.0, 87.0, 24.0, 11.0, 429.0, 156.0, 21.0, 13.0]
2025-09-12 03:48:59,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 11 hours, 11 minutes, 28 seconds)
2025-09-12 03:59:54,569 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:59:54,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:00:51,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 614.21130 ± 652.653
2025-09-12 04:00:51,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [869.5835, 88.76394, 12.227238, 347.37888, 14.3857355, 11.730856, 729.32587, 2209.1714, 876.9072, 982.63806]
2025-09-12 04:00:51,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [288.0, 50.0, 17.0, 141.0, 19.0, 15.0, 241.0, 749.0, 286.0, 343.0]
2025-09-12 04:00:51,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 11 hours, 2 minutes, 18 seconds)
2025-09-12 04:12:04,876 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:12:04,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:12:27,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 198.06754 ± 421.746
2025-09-12 04:12:27,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [15.263786, 1452.5117, 63.227837, 14.936498, 174.61974, 100.77815, 15.031389, 125.32011, 8.376561, 10.60958]
2025-09-12 04:12:27,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 481.0, 38.0, 23.0, 138.0, 56.0, 16.0, 64.0, 14.0, 16.0]
2025-09-12 04:12:27,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 10 hours, 40 minutes, 48 seconds)
2025-09-12 04:23:31,143 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:23:31,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:24:21,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 495.57510 ± 706.412
2025-09-12 04:24:21,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [9.452956, 186.18466, 683.0525, 14.597062, 797.49603, 2462.263, 17.258333, 415.5523, 137.69595, 232.19824]
2025-09-12 04:24:21,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [14.0, 87.0, 251.0, 16.0, 296.0, 869.0, 22.0, 179.0, 69.0, 106.0]
2025-09-12 04:24:21,895 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 10 hours, 29 minutes, 25 seconds)
2025-09-12 04:35:33,606 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:35:33,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:36:36,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 707.71448 ± 371.263
2025-09-12 04:36:36,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [885.6405, 1258.8025, 193.85391, 783.3236, 793.99567, 1004.4541, 994.85175, 765.03613, 390.42273, 6.7643056]
2025-09-12 04:36:36,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [295.0, 405.0, 91.0, 254.0, 255.0, 318.0, 349.0, 255.0, 148.0, 10.0]
2025-09-12 04:36:36,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 10 hours, 14 minutes, 10 seconds)
2025-09-12 04:47:23,798 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:47:23,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:48:17,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 566.40564 ± 328.113
2025-09-12 04:48:17,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [617.1505, 1160.5963, 395.75876, 110.03986, 832.0045, 394.00705, 948.1521, 99.6483, 454.96747, 651.73145]
2025-09-12 04:48:17,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [221.0, 365.0, 151.0, 60.0, 277.0, 159.0, 317.0, 70.0, 173.0, 259.0]
2025-09-12 04:48:17,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 10 hours, 4 minutes, 55 seconds)
2025-09-12 04:59:40,549 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:59:40,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:00:31,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 536.49548 ± 509.777
2025-09-12 05:00:31,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [1223.7937, 1201.9387, 256.23477, 1253.3256, 6.312225, 97.95405, 9.151956, 408.74304, 847.20825, 60.293182]
2025-09-12 05:00:31,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [387.0, 395.0, 113.0, 408.0, 10.0, 54.0, 18.0, 179.0, 306.0, 36.0]
2025-09-12 05:00:31,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 9 hours, 56 minutes, 37 seconds)
2025-09-12 05:11:24,537 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:11:24,539 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:12:16,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 539.66602 ± 369.007
2025-09-12 05:12:16,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [1201.8114, 14.881166, 879.78033, 462.6622, 495.59982, 6.385681, 355.94955, 342.26822, 894.4821, 742.8398]
2025-09-12 05:12:16,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [404.0, 18.0, 313.0, 177.0, 181.0, 9.0, 146.0, 152.0, 314.0, 253.0]
2025-09-12 05:12:16,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 9 hours, 46 minutes, 14 seconds)
2025-09-12 05:23:29,044 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:23:29,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:24:07,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 368.22943 ± 348.694
2025-09-12 05:24:07,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [1101.8162, 504.9332, 302.1849, 9.662568, 14.656268, 283.6836, 174.75998, 111.047455, 279.77185, 899.7784]
2025-09-12 05:24:07,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [381.0, 195.0, 152.0, 11.0, 22.0, 134.0, 115.0, 60.0, 113.0, 275.0]
2025-09-12 05:24:07,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 9 hours, 33 minutes, 43 seconds)
2025-09-12 05:35:07,904 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:35:07,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:35:46,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 407.86737 ± 369.046
2025-09-12 05:35:46,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [15.580069, 904.7573, 391.7844, 760.1867, 53.08166, 8.3854, 14.737245, 293.09244, 666.386, 970.6823]
2025-09-12 05:35:46,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 290.0, 152.0, 251.0, 38.0, 17.0, 16.0, 134.0, 223.0, 309.0]
2025-09-12 05:35:46,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 9 hours, 16 minutes, 6 seconds)
2025-09-12 05:46:56,444 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:46:56,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:47:40,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 461.35400 ± 325.509
2025-09-12 05:47:40,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [613.50696, 828.9641, 339.43793, 11.118327, 17.593367, 409.10126, 768.5245, 67.26285, 651.41376, 906.6167]
2025-09-12 05:47:40,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [230.0, 276.0, 136.0, 14.0, 20.0, 158.0, 252.0, 41.0, 241.0, 327.0]
2025-09-12 05:47:40,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 9 hours, 6 minutes, 20 seconds)
2025-09-12 05:58:49,537 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:58:49,539 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:59:21,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 332.25958 ± 331.486
2025-09-12 05:59:21,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [604.99927, 229.07863, 10.123454, 170.573, 820.8967, 624.5614, 9.222917, 13.054346, 8.934678, 831.15137]
2025-09-12 05:59:21,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [215.0, 103.0, 13.0, 87.0, 260.0, 234.0, 12.0, 20.0, 11.0, 263.0]
2025-09-12 05:59:21,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 8 hours, 49 minutes, 31 seconds)
2025-09-12 06:10:36,576 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:10:36,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:11:47,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 737.39880 ± 642.438
2025-09-12 06:11:47,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [288.9565, 397.73428, 614.0666, 505.9007, 1116.3533, 208.14891, 1002.506, 2381.5818, 8.125168, 850.6155]
2025-09-12 06:11:47,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [127.0, 157.0, 227.0, 201.0, 404.0, 96.0, 348.0, 828.0, 15.0, 273.0]
2025-09-12 06:11:47,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 8 hours, 43 minutes, 44 seconds)
2025-09-12 06:22:35,620 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:22:35,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:23:35,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 673.65656 ± 235.646
2025-09-12 06:23:35,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [858.91644, 820.04443, 555.5598, 88.93941, 453.83942, 888.51654, 731.12946, 863.1974, 709.46423, 766.95825]
2025-09-12 06:23:35,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [267.0, 257.0, 238.0, 52.0, 167.0, 295.0, 232.0, 267.0, 245.0, 252.0]
2025-09-12 06:23:35,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 8 hours, 31 minutes, 23 seconds)
2025-09-12 06:34:41,273 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:34:41,274 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:35:22,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 446.00385 ± 388.220
2025-09-12 06:35:22,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [800.2747, 94.70949, 96.6736, 9.699333, 800.5174, 956.7455, 235.32965, 130.03769, 1065.8312, 270.2198]
2025-09-12 06:35:22,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [256.0, 52.0, 54.0, 12.0, 261.0, 296.0, 105.0, 71.0, 340.0, 113.0]
2025-09-12 06:35:22,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 8 hours, 20 minutes, 41 seconds)
2025-09-12 06:46:27,922 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:46:27,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:47:18,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 536.00946 ± 274.699
2025-09-12 06:47:18,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [573.849, 543.0132, 701.33966, 671.6738, 838.966, 540.3134, 152.23718, 98.599915, 257.53076, 982.57196]
2025-09-12 06:47:18,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [196.0, 206.0, 221.0, 246.0, 264.0, 176.0, 75.0, 82.0, 111.0, 355.0]
2025-09-12 06:47:18,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 8 hours, 8 minutes, 55 seconds)
2025-09-12 06:58:33,936 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:58:33,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:59:33,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 631.31921 ± 814.431
2025-09-12 06:59:33,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [111.279724, 8.014337, 663.1053, 2819.7998, 723.8707, 140.86383, 12.808458, 6.8060064, 875.6273, 951.0169]
2025-09-12 06:59:33,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [58.0, 12.0, 232.0, 1000.0, 236.0, 71.0, 15.0, 12.0, 310.0, 298.0]
2025-09-12 06:59:33,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 8 hours, 1 minute, 40 seconds)
2025-09-12 07:10:38,911 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:10:38,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:11:49,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 711.13959 ± 631.640
2025-09-12 07:11:49,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [1835.1198, 1188.7903, 304.81387, 1546.3318, 1080.9126, 59.000053, 78.33008, 747.47394, 69.6504, 200.97325]
2025-09-12 07:11:49,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [652.0, 438.0, 127.0, 556.0, 390.0, 51.0, 54.0, 260.0, 42.0, 95.0]
2025-09-12 07:11:49,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 7 hours, 48 minutes, 16 seconds)
2025-09-12 07:22:48,612 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:22:48,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:23:47,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 628.31580 ± 641.586
2025-09-12 07:23:47,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [262.6005, 337.24393, 651.75586, 70.74431, 8.933313, 108.56681, 1125.5693, 818.0231, 2255.941, 643.7798]
2025-09-12 07:23:47,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [135.0, 142.0, 203.0, 45.0, 22.0, 73.0, 359.0, 272.0, 740.0, 239.0]
2025-09-12 07:23:47,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 7 hours, 37 minutes, 30 seconds)
2025-09-12 07:34:54,069 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:34:54,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:35:37,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 420.85434 ± 373.015
2025-09-12 07:35:37,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [399.75705, 26.678528, 1152.093, 897.9337, 697.3468, 82.82894, 65.90206, 146.45169, 564.7775, 174.77373]
2025-09-12 07:35:37,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [182.0, 50.0, 394.0, 296.0, 256.0, 64.0, 42.0, 78.0, 184.0, 80.0]
2025-09-12 07:35:37,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 7 hours, 25 minutes, 49 seconds)
2025-09-12 07:46:58,001 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:46:58,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:48:17,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 846.63348 ± 798.805
2025-09-12 07:48:17,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [458.49722, 284.07645, 1010.5881, 222.03197, 1180.0424, 721.8284, 11.875731, 975.63275, 2991.5498, 610.21204]
2025-09-12 07:48:17,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [176.0, 127.0, 346.0, 109.0, 418.0, 269.0, 17.0, 309.0, 1000.0, 208.0]
2025-09-12 07:48:17,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 7 hours, 19 minutes, 2 seconds)
2025-09-12 07:59:12,114 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:59:12,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:59:52,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 414.88461 ± 412.053
2025-09-12 07:59:52,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [717.33966, 7.588781, 708.33545, 8.437021, 9.437385, 1226.6049, 14.485319, 130.74684, 632.9223, 692.9484]
2025-09-12 07:59:52,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [265.0, 10.0, 240.0, 13.0, 17.0, 427.0, 15.0, 73.0, 233.0, 223.0]
2025-09-12 07:59:52,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 7 hours, 2 minutes, 10 seconds)
2025-09-12 08:11:02,755 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:11:02,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:12:05,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 698.43561 ± 578.408
2025-09-12 08:12:05,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [278.44238, 618.808, 319.99792, 2091.9678, 765.16693, 1280.5082, 763.8685, 662.22003, 10.570535, 192.80559]
2025-09-12 08:12:05,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [124.0, 223.0, 140.0, 655.0, 248.0, 398.0, 245.0, 226.0, 17.0, 120.0]
2025-09-12 08:12:05,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 6 hours, 49 minutes, 49 seconds)
2025-09-12 08:23:16,447 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:23:16,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:24:01,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 443.85614 ± 411.879
2025-09-12 08:24:01,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [656.1179, 437.24835, 574.2972, 11.791831, 23.975925, 690.3618, 12.427653, 1324.0217, 12.349165, 695.9701]
2025-09-12 08:24:01,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [230.0, 163.0, 235.0, 15.0, 55.0, 258.0, 15.0, 456.0, 17.0, 233.0]
2025-09-12 08:24:01,295 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 6 hours, 37 minutes, 32 seconds)
2025-09-12 08:34:59,503 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:34:59,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:36:32,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 1015.76190 ± 640.765
2025-09-12 08:36:32,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [805.1754, 1319.333, 779.4935, 218.13637, 1184.212, 2274.4817, 1448.3925, 576.64514, 1540.5415, 11.208969]
2025-09-12 08:36:32,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [265.0, 408.0, 264.0, 99.0, 387.0, 790.0, 506.0, 224.0, 533.0, 18.0]
2025-09-12 08:36:32,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (1015.76) for latency ExtremeClogL1U23
2025-09-12 08:36:32,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 6 hours, 29 minutes, 54 seconds)
2025-09-12 08:47:45,740 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:47:45,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:48:44,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 647.91736 ± 592.054
2025-09-12 08:48:44,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [77.260666, 72.12608, 2177.9368, 775.0172, 828.92645, 736.2744, 533.34454, 757.1655, 510.8176, 10.304859]
2025-09-12 08:48:44,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [50.0, 59.0, 696.0, 252.0, 258.0, 239.0, 198.0, 263.0, 204.0, 18.0]
2025-09-12 08:48:44,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 6 hours, 14 minutes, 51 seconds)
2025-09-12 08:59:43,446 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:59:43,448 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:00:38,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 586.26941 ± 426.320
2025-09-12 09:00:38,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [774.46747, 684.9697, 990.32153, 121.96964, 477.84113, 8.070429, 107.11457, 669.5206, 1485.7179, 542.70135]
2025-09-12 09:00:38,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [242.0, 258.0, 316.0, 64.0, 172.0, 10.0, 59.0, 290.0, 496.0, 199.0]
2025-09-12 09:00:38,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 6 hours, 4 minutes, 35 seconds)
2025-09-12 09:12:06,288 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:12:06,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:13:03,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 598.89844 ± 662.189
2025-09-12 09:13:03,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [800.60956, 887.65796, 963.1954, 59.570763, 7.904148, 11.683637, 156.35745, 17.378216, 898.45294, 2186.1746]
2025-09-12 09:13:03,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [256.0, 307.0, 312.0, 36.0, 10.0, 16.0, 83.0, 20.0, 319.0, 784.0]
2025-09-12 09:13:03,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 5 hours, 53 minutes, 33 seconds)
2025-09-12 09:24:03,379 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:24:03,382 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:24:56,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 567.86072 ± 657.456
2025-09-12 09:24:56,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [15.137076, 622.38837, 95.4568, 636.0927, 2323.038, 13.22949, 688.53595, 463.5619, 8.707155, 812.4596]
2025-09-12 09:24:56,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [17.0, 225.0, 53.0, 214.0, 799.0, 15.0, 219.0, 181.0, 34.0, 250.0]
2025-09-12 09:24:56,665 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 5 hours, 41 minutes, 10 seconds)
2025-09-12 09:36:38,754 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:36:38,759 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:38:14,369 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 1006.61749 ± 776.274
2025-09-12 09:38:14,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [789.10187, 383.56934, 13.2104845, 1805.1492, 13.795591, 667.1258, 806.8574, 1633.4514, 2500.9375, 1452.9762]
2025-09-12 09:38:14,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [251.0, 160.0, 19.0, 641.0, 23.0, 297.0, 297.0, 562.0, 855.0, 529.0]
2025-09-12 09:38:14,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 5 hours, 33 minutes, 8 seconds)
2025-09-12 09:48:26,416 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:48:26,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:49:26,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 638.89807 ± 565.653
2025-09-12 09:49:26,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [11.885262, 1606.2726, 566.85175, 599.9409, 11.448988, 309.22037, 400.9973, 237.2219, 999.4383, 1645.7029]
2025-09-12 09:49:26,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 555.0, 186.0, 214.0, 17.0, 134.0, 203.0, 103.0, 301.0, 542.0]
2025-09-12 09:49:26,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 5 hours, 15 minutes, 38 seconds)
2025-09-12 10:00:41,356 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:00:41,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:01:39,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 625.94000 ± 848.305
2025-09-12 10:01:39,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [1951.4683, 12.159995, 7.516465, 13.385206, 1479.3263, 2240.05, 137.07457, 195.71725, 17.826519, 204.87495]
2025-09-12 10:01:39,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [666.0, 17.0, 12.0, 15.0, 461.0, 709.0, 73.0, 91.0, 20.0, 127.0]
2025-09-12 10:01:39,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 5 hours, 5 minutes, 7 seconds)
2025-09-12 10:12:42,588 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:12:42,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:13:31,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 481.72858 ± 624.546
2025-09-12 10:13:31,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [1038.5006, 10.263561, 434.88416, 11.396823, 19.991272, 310.26822, 919.9601, 61.799946, 8.603117, 2001.618]
2025-09-12 10:13:31,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [365.0, 14.0, 171.0, 14.0, 20.0, 156.0, 343.0, 43.0, 12.0, 716.0]
2025-09-12 10:13:31,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 4 hours, 50 minutes, 15 seconds)
2025-09-12 10:24:37,736 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:24:37,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:25:26,131 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 456.24554 ± 566.941
2025-09-12 10:25:26,131 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [756.5143, 13.085653, 112.726395, 31.438684, 12.44994, 630.406, 241.59117, 1918.9034, 749.9449, 95.39501]
2025-09-12 10:25:26,131 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [274.0, 16.0, 83.0, 87.0, 18.0, 236.0, 106.0, 672.0, 271.0, 52.0]
2025-09-12 10:25:26,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 4 hours, 38 minutes, 15 seconds)
2025-09-12 10:36:36,045 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:36:36,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:37:23,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 463.33575 ± 623.767
2025-09-12 10:37:23,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [285.86343, 17.00338, 1388.0701, 88.79038, 333.90778, 24.14789, 1938.5421, 7.9040785, 265.28506, 283.8431]
2025-09-12 10:37:23,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [122.0, 22.0, 500.0, 50.0, 129.0, 24.0, 669.0, 35.0, 110.0, 138.0]
2025-09-12 10:37:23,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 4 hours, 20 minutes, 15 seconds)
2025-09-12 10:48:21,508 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:48:21,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:48:55,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 309.01605 ± 365.220
2025-09-12 10:48:55,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [17.755823, 1184.0969, 19.183907, 742.8219, 132.12344, 286.52853, 102.00333, 10.656729, 143.98491, 451.00516]
2025-09-12 10:48:55,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [37.0, 431.0, 24.0, 296.0, 71.0, 122.0, 56.0, 17.0, 73.0, 174.0]
2025-09-12 10:48:55,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 4 hours, 9 minutes, 50 seconds)
2025-09-12 11:00:25,592 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:00:25,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:01:12,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 462.61133 ± 466.806
2025-09-12 11:01:12,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [327.09964, 382.23572, 167.2392, 402.29745, 710.61, 20.240646, 10.760485, 1328.7981, 1266.8706, 9.961403]
2025-09-12 11:01:12,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [130.0, 151.0, 84.0, 157.0, 261.0, 20.0, 16.0, 473.0, 436.0, 14.0]
2025-09-12 11:01:12,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 3 hours, 58 minutes, 8 seconds)
2025-09-12 11:11:58,846 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:11:58,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:12:57,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 598.53656 ± 385.958
2025-09-12 11:12:57,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [390.50903, 10.918486, 544.2405, 607.8791, 1384.7809, 665.0784, 1121.5262, 661.96826, 202.49265, 395.9715]
2025-09-12 11:12:57,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [162.0, 15.0, 195.0, 228.0, 442.0, 246.0, 393.0, 246.0, 104.0, 160.0]
2025-09-12 11:12:57,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 3 hours, 45 minutes, 49 seconds)
2025-09-12 11:24:09,891 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:24:09,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:25:07,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 570.48114 ± 522.227
2025-09-12 11:25:07,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [113.8442, 1783.2699, 275.46106, 16.014957, 615.4083, 8.908052, 992.25024, 357.40115, 897.95197, 644.3017]
2025-09-12 11:25:07,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [64.0, 646.0, 170.0, 20.0, 223.0, 14.0, 348.0, 176.0, 281.0, 239.0]
2025-09-12 11:25:07,605 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 3 hours, 34 minutes, 53 seconds)
2025-09-12 11:36:13,660 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:36:13,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:37:26,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 762.21204 ± 790.467
2025-09-12 11:37:26,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [1157.9568, 202.42607, 2858.6624, 754.91583, 14.21443, 436.68256, 982.1865, 223.54579, 840.1742, 151.35574]
2025-09-12 11:37:26,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [366.0, 93.0, 1000.0, 267.0, 21.0, 173.0, 345.0, 100.0, 270.0, 76.0]
2025-09-12 11:37:26,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 3 hours, 24 minutes, 9 seconds)
2025-09-12 11:48:38,351 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:48:38,354 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:49:29,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 510.96445 ± 445.030
2025-09-12 11:49:29,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [72.553154, 61.32062, 1002.98834, 1117.2551, 1185.5786, 149.1445, 461.32742, 24.708044, 242.20667, 792.5622]
2025-09-12 11:49:29,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [48.0, 43.0, 388.0, 405.0, 369.0, 73.0, 179.0, 21.0, 109.0, 267.0]
2025-09-12 11:49:29,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 13 minutes, 46 seconds)
2025-09-12 12:00:34,803 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:00:34,805 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:01:42,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 748.79572 ± 475.702
2025-09-12 12:01:42,367 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [865.19653, 1812.5374, 1043.549, 9.824244, 771.6477, 576.0116, 578.0335, 676.84814, 1002.14514, 152.16394]
2025-09-12 12:01:42,367 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [293.0, 584.0, 367.0, 12.0, 258.0, 201.0, 217.0, 224.0, 341.0, 76.0]
2025-09-12 12:01:42,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 1 minute, 30 seconds)
2025-09-12 12:12:44,830 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:12:44,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:13:43,552 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 616.47845 ± 366.981
2025-09-12 12:13:43,552 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [268.84326, 885.8181, 791.5976, 94.98168, 850.42224, 1062.0161, 1161.2615, 512.78937, 113.901215, 423.15314]
2025-09-12 12:13:43,552 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [115.0, 322.0, 245.0, 55.0, 274.0, 365.0, 404.0, 196.0, 64.0, 164.0]
2025-09-12 12:13:43,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 2 hours, 50 minutes, 9 seconds)
2025-09-12 12:24:46,650 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:24:46,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:25:48,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 636.84924 ± 812.906
2025-09-12 12:25:48,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [8.774259, 152.16582, 835.4214, 1204.2073, 685.60834, 119.33966, 11.925952, 10.781707, 568.0016, 2772.2666]
2025-09-12 12:25:48,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [12.0, 119.0, 296.0, 417.0, 243.0, 78.0, 19.0, 12.0, 223.0, 907.0]
2025-09-12 12:25:48,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 2 hours, 37 minutes, 46 seconds)
2025-09-12 12:36:53,884 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:36:53,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:38:02,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 753.36072 ± 628.851
2025-09-12 12:38:02,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [1779.8754, 560.28375, 83.9429, 384.72903, 1572.17, 34.125824, 668.9713, 830.4451, 77.27763, 1541.7855]
2025-09-12 12:38:02,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [612.0, 214.0, 52.0, 158.0, 505.0, 29.0, 234.0, 266.0, 49.0, 476.0]
2025-09-12 12:38:03,005 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 25 minutes, 28 seconds)
2025-09-12 12:49:07,216 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:49:07,218 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:50:06,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 630.09021 ± 722.560
2025-09-12 12:50:06,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [1207.4574, 314.6624, 8.060792, 10.754735, 11.8185005, 621.94836, 311.58444, 711.99036, 2507.8179, 594.8073]
2025-09-12 12:50:06,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [400.0, 132.0, 11.0, 14.0, 14.0, 232.0, 134.0, 259.0, 840.0, 225.0]
2025-09-12 12:50:06,448 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 13 minutes, 21 seconds)
2025-09-12 13:01:15,539 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:01:15,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:02:00,593 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 475.47379 ± 569.036
2025-09-12 13:02:00,593 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [145.68033, 11.478881, 538.0778, 1629.6879, 1132.4016, 1105.5997, 164.5308, 9.899038, 9.746609, 7.63486]
2025-09-12 13:02:00,593 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [73.0, 13.0, 173.0, 541.0, 417.0, 353.0, 78.0, 16.0, 15.0, 16.0]
2025-09-12 13:02:00,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 36 seconds)
2025-09-12 13:12:51,134 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:12:51,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:13:48,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 597.51160 ± 595.736
2025-09-12 13:13:48,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [1042.6344, 2045.8557, 8.181837, 403.12897, 736.5619, 11.714975, 826.30066, 619.6611, 8.0442295, 273.03152]
2025-09-12 13:13:48,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [371.0, 730.0, 12.0, 160.0, 247.0, 16.0, 256.0, 235.0, 15.0, 120.0]
2025-09-12 13:13:48,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 1 hour, 48 minutes, 8 seconds)
2025-09-12 13:24:58,314 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:24:58,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:25:49,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 553.37189 ± 479.899
2025-09-12 13:25:49,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [798.53217, 146.6312, 854.8293, 313.00375, 10.296014, 8.73039, 1097.8011, 321.6533, 1540.6313, 441.61053]
2025-09-12 13:25:49,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [294.0, 87.0, 261.0, 138.0, 13.0, 14.0, 348.0, 135.0, 492.0, 160.0]
2025-09-12 13:25:49,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 36 minutes, 1 second)
2025-09-12 13:36:50,382 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:36:50,384 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:37:32,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 443.41153 ± 381.867
2025-09-12 13:37:32,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [8.294626, 22.428038, 9.261828, 510.5316, 1088.9955, 262.65912, 558.9113, 293.9833, 616.19995, 1062.8497]
2025-09-12 13:37:32,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [12.0, 20.0, 11.0, 208.0, 337.0, 114.0, 196.0, 156.0, 224.0, 329.0]
2025-09-12 13:37:32,956 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 23 minutes, 17 seconds)
2025-09-12 13:49:05,689 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:49:05,694 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:50:06,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 623.81830 ± 457.621
2025-09-12 13:50:06,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [298.06815, 793.33966, 1225.7316, 284.01804, 12.909438, 213.78217, 871.2282, 203.98503, 931.5116, 1403.609]
2025-09-12 13:50:06,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [153.0, 301.0, 417.0, 124.0, 16.0, 98.0, 291.0, 99.0, 305.0, 483.0]
2025-09-12 13:50:06,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 11 minutes, 59 seconds)
2025-09-12 14:00:54,691 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:00:54,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:02:18,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 927.33484 ± 633.466
2025-09-12 14:02:18,545 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [1042.0459, 1379.1881, 1795.1212, 1436.3779, 80.082535, 829.86566, 1784.5924, 592.9165, 327.4048, 5.754587]
2025-09-12 14:02:18,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [332.0, 436.0, 602.0, 470.0, 48.0, 269.0, 612.0, 218.0, 135.0, 9.0]
2025-09-12 14:02:18,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 17 seconds)
2025-09-12 14:13:19,591 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:13:19,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:14:38,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 865.15704 ± 469.125
2025-09-12 14:14:38,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [906.18085, 1321.883, 641.2222, 13.0160885, 718.71924, 873.729, 313.2063, 1115.2701, 982.1952, 1766.1477]
2025-09-12 14:14:38,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [272.0, 426.0, 237.0, 14.0, 278.0, 306.0, 126.0, 414.0, 337.0, 571.0]
2025-09-12 14:14:38,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 48 minutes, 40 seconds)
2025-09-12 14:25:32,969 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:25:32,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:26:00,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 277.61670 ± 467.732
2025-09-12 14:26:00,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [1390.1647, 132.14264, 12.7748375, 16.936852, 110.04586, 19.2437, 995.1606, 19.905174, 67.91823, 11.874374]
2025-09-12 14:26:00,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [437.0, 71.0, 21.0, 19.0, 67.0, 20.0, 348.0, 21.0, 49.0, 18.0]
2025-09-12 14:26:00,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 36 minutes, 6 seconds)
2025-09-12 14:37:53,169 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:37:53,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:38:36,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 417.52481 ± 383.822
2025-09-12 14:38:36,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [436.26727, 393.6457, 72.28202, 9.20245, 78.03972, 222.56783, 631.3791, 1409.2446, 464.07977, 458.5394]
2025-09-12 14:38:36,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [171.0, 154.0, 47.0, 12.0, 47.0, 100.0, 257.0, 501.0, 176.0, 177.0]
2025-09-12 14:38:36,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 24 minutes, 25 seconds)
2025-09-12 14:48:59,927 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:48:59,929 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:49:52,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 553.45807 ± 590.296
2025-09-12 14:49:52,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [200.59914, 1663.3209, 72.97101, 15.173573, 9.940619, 52.995018, 1083.7654, 1188.5996, 195.1107, 1052.1045]
2025-09-12 14:49:52,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [90.0, 537.0, 43.0, 16.0, 15.0, 95.0, 384.0, 407.0, 88.0, 315.0]
2025-09-12 14:49:52,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 11 minutes, 57 seconds)
2025-09-12 15:01:50,939 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:01:50,948 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:03:29,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 1057.21289 ± 949.422
2025-09-12 15:03:29,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [2646.7542, 884.53674, 277.20688, 182.17793, 907.1728, 637.8126, 42.07964, 2744.403, 1831.6638, 418.32162]
2025-09-12 15:03:29,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [852.0, 325.0, 123.0, 103.0, 296.0, 242.0, 36.0, 947.0, 585.0, 166.0]
2025-09-12 15:03:29,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (1057.21) for latency ExtremeClogL1U23
2025-09-12 15:03:29,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1251 [DEBUG]: Training session finished
