2025-09-11 18:54:10,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc10-hopper/ExtremeClogL1U23-mbpac_memdelay
2025-09-11 18:54:10,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc10-hopper/ExtremeClogL1U23-mbpac_memdelay
2025-09-11 18:54:10,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x1499f21f7290>}
2025-09-11 18:54:10,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1111 [DEBUG]: using device: cuda
2025-09-11 18:54:10,256 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1133 [INFO]: Creating new trainer
2025-09-11 18:54:10,273 baseline-mbpac-noiseperc10-hopper:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-09-11 18:54:10,273 baseline-mbpac-noiseperc10-hopper:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-11 18:54:10,281 baseline-mbpac-noiseperc10-hopper:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=11, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=3, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-09-11 18:54:11,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1194 [DEBUG]: Starting training session...
2025-09-11 18:54:11,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 1/100
2025-09-11 19:04:08,042 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:04:08,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:04:20,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 76.80034 ± 18.439
2025-09-11 19:04:20,227 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [96.95235, 94.76558, 39.863316, 82.3431, 98.1161, 71.95507, 93.30418, 63.465508, 64.093994, 63.144142]
2025-09-11 19:04:20,227 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [56.0, 55.0, 23.0, 47.0, 56.0, 41.0, 54.0, 36.0, 37.0, 36.0]
2025-09-11 19:04:20,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1226 [INFO]: New best (76.80) for latency ExtremeClogL1U23
2025-09-11 19:04:20,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 16 hours, 44 minutes, 59 seconds)
2025-09-11 19:15:48,049 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:15:48,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:16:39,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 324.27164 ± 217.510
2025-09-11 19:16:39,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [448.88742, 867.29944, 92.73818, 257.84937, 335.47955, 48.93963, 364.72446, 244.83324, 387.3744, 194.59073]
2025-09-11 19:16:39,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [233.0, 569.0, 68.0, 156.0, 147.0, 28.0, 215.0, 117.0, 272.0, 95.0]
2025-09-11 19:16:39,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1226 [INFO]: New best (324.27) for latency ExtremeClogL1U23
2025-09-11 19:16:39,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 18 hours, 21 minutes, 30 seconds)
2025-09-11 19:27:49,479 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:27:49,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:28:25,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 255.51738 ± 126.142
2025-09-11 19:28:25,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [201.55177, 321.5292, 321.84573, 281.67892, 346.73938, 386.068, 308.2469, 346.27365, 24.39551, 16.84471]
2025-09-11 19:28:25,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [91.0, 169.0, 159.0, 130.0, 180.0, 227.0, 136.0, 192.0, 21.0, 20.0]
2025-09-11 19:28:25,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 18 hours, 26 minutes, 58 seconds)
2025-09-11 19:39:39,094 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:39:39,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:40:09,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 218.36856 ± 61.819
2025-09-11 19:40:09,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [193.80075, 215.58784, 389.00116, 230.75818, 183.98987, 244.14679, 208.90431, 175.51456, 164.89435, 177.08774]
2025-09-11 19:40:09,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [93.0, 124.0, 230.0, 107.0, 90.0, 110.0, 90.0, 87.0, 81.0, 85.0]
2025-09-11 19:40:09,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 18 hours, 23 minutes, 9 seconds)
2025-09-11 19:52:18,122 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:52:18,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:55:24,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 742.48694 ± 402.936
2025-09-11 19:55:24,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1071.1693, 1041.9515, 1044.8613, 1065.965, 910.7732, 824.75867, 215.57582, 18.143173, 191.38702, 1040.2837]
2025-09-11 19:55:24,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 874.0, 1000.0, 1000.0, 862.0, 776.0, 174.0, 22.0, 156.0, 1000.0]
2025-09-11 19:55:24,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1226 [INFO]: New best (742.49) for latency ExtremeClogL1U23
2025-09-11 19:55:24,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 19 hours, 23 minutes, 9 seconds)
2025-09-11 20:06:13,069 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:06:13,072 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:07:07,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 332.19464 ± 224.597
2025-09-11 20:07:07,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [616.84937, 20.010035, 79.64734, 82.44789, 404.5188, 229.08594, 389.87997, 561.8297, 681.59827, 256.07907]
2025-09-11 20:07:07,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [364.0, 22.0, 55.0, 52.0, 212.0, 166.0, 165.0, 329.0, 456.0, 188.0]
2025-09-11 20:07:07,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 19 hours, 40 minutes, 24 seconds)
2025-09-11 20:18:04,290 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:18:04,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:18:52,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 329.00336 ± 150.623
2025-09-11 20:18:52,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [447.21606, 361.90137, 414.74216, 169.88704, 379.5955, 310.41507, 51.186947, 508.6087, 140.57129, 505.90942]
2025-09-11 20:18:52,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [228.0, 135.0, 206.0, 115.0, 219.0, 188.0, 37.0, 289.0, 93.0, 261.0]
2025-09-11 20:18:52,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 19 hours, 17 minutes, 7 seconds)
2025-09-11 20:30:17,972 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:30:17,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:31:09,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 439.71826 ± 354.335
2025-09-11 20:31:09,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [633.8459, 105.39801, 1038.0247, 310.56598, 53.457478, 33.69113, 777.6751, 330.8038, 193.6463, 920.0743]
2025-09-11 20:31:09,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [237.0, 65.0, 385.0, 177.0, 51.0, 32.0, 281.0, 154.0, 120.0, 375.0]
2025-09-11 20:31:09,786 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 19 hours, 14 minutes, 26 seconds)
2025-09-11 20:42:29,773 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:42:29,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:43:15,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 463.93140 ± 226.587
2025-09-11 20:43:15,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [673.9175, 101.293526, 409.45978, 728.36145, 204.048, 556.8264, 128.79433, 532.5565, 630.0075, 674.0491]
2025-09-11 20:43:15,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [228.0, 58.0, 168.0, 251.0, 93.0, 190.0, 70.0, 178.0, 205.0, 233.0]
2025-09-11 20:43:15,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 19 hours, 8 minutes, 31 seconds)
2025-09-11 20:54:42,287 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:54:42,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:55:36,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 522.27094 ± 280.254
2025-09-11 20:55:36,713 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [590.63214, 20.082556, 666.76215, 358.00674, 64.02496, 879.9062, 475.51077, 833.3982, 637.109, 697.2767]
2025-09-11 20:55:36,713 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [233.0, 24.0, 249.0, 173.0, 43.0, 307.0, 218.0, 299.0, 223.0, 242.0]
2025-09-11 20:55:36,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 18 hours, 3 minutes, 43 seconds)
2025-09-11 21:07:15,878 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:07:15,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:08:42,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 889.56854 ± 405.830
2025-09-11 21:08:42,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [377.25128, 1191.5192, 718.4897, 44.72441, 772.341, 982.1654, 1160.5266, 944.69586, 1461.237, 1242.7355]
2025-09-11 21:08:42,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [153.0, 424.0, 259.0, 41.0, 283.0, 341.0, 363.0, 340.0, 498.0, 423.0]
2025-09-11 21:08:42,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1226 [INFO]: New best (889.57) for latency ExtremeClogL1U23
2025-09-11 21:08:42,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 18 hours, 16 minutes, 11 seconds)
2025-09-11 21:19:52,286 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:19:52,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:21:31,032 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1012.53271 ± 687.188
2025-09-11 21:21:31,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [2485.8706, 1116.9875, 1893.169, 654.97546, 837.8283, 1147.0513, 984.69995, 41.254513, 683.02014, 280.46936]
2025-09-11 21:21:31,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [877.0, 356.0, 628.0, 281.0, 301.0, 396.0, 373.0, 30.0, 279.0, 125.0]
2025-09-11 21:21:31,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1226 [INFO]: New best (1012.53) for latency ExtremeClogL1U23
2025-09-11 21:21:31,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 18 hours, 22 minutes, 28 seconds)
2025-09-11 21:32:42,587 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:32:42,591 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:34:08,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 933.29218 ± 214.732
2025-09-11 21:34:08,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [963.5779, 1283.3506, 1003.44696, 697.8988, 724.40643, 598.36566, 979.56085, 1269.7764, 954.6268, 857.91125]
2025-09-11 21:34:08,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [338.0, 418.0, 335.0, 259.0, 236.0, 227.0, 324.0, 420.0, 305.0, 278.0]
2025-09-11 21:34:08,792 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 18 hours, 15 minutes, 54 seconds)
2025-09-11 21:45:59,614 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:45:59,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:47:14,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 734.55322 ± 904.846
2025-09-11 21:47:14,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [185.20035, 80.1368, 18.458565, 113.44308, 497.55276, 2875.214, 1917.2476, 645.12823, 54.058605, 959.09216]
2025-09-11 21:47:14,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [87.0, 49.0, 26.0, 79.0, 202.0, 1000.0, 685.0, 271.0, 38.0, 347.0]
2025-09-11 21:47:14,877 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 18 hours, 20 minutes, 38 seconds)
2025-09-11 21:58:03,738 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:58:03,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:00:39,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1579.28467 ± 949.215
2025-09-11 22:00:39,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [974.2365, 1343.614, 2844.8015, 2933.8958, 702.1671, 738.61884, 372.17218, 2875.1665, 990.19604, 2017.9789]
2025-09-11 22:00:39,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [376.0, 490.0, 1000.0, 1000.0, 260.0, 291.0, 155.0, 954.0, 345.0, 753.0]
2025-09-11 22:00:39,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1226 [INFO]: New best (1579.28) for latency ExtremeClogL1U23
2025-09-11 22:00:39,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 18 hours, 25 minutes, 39 seconds)
2025-09-11 22:12:09,426 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:12:09,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:14:21,007 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1328.25537 ± 1093.750
2025-09-11 22:14:21,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [523.9777, 1589.4644, 2841.3464, 2829.1846, 145.41406, 15.228484, 787.27826, 337.40625, 1366.2672, 2846.9856]
2025-09-11 22:14:21,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [218.0, 571.0, 1000.0, 1000.0, 75.0, 18.0, 309.0, 143.0, 488.0, 1000.0]
2025-09-11 22:14:21,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 18 hours, 22 minutes, 47 seconds)
2025-09-11 22:25:39,968 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:25:39,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:27:45,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1252.08618 ± 1006.719
2025-09-11 22:27:45,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1148.4828, 976.50946, 24.922304, 2756.102, 1928.485, 226.39958, 47.282543, 2806.493, 633.1412, 1973.044]
2025-09-11 22:27:45,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [414.0, 380.0, 21.0, 1000.0, 684.0, 112.0, 33.0, 1000.0, 257.0, 709.0]
2025-09-11 22:27:45,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 18 hours, 19 minutes, 36 seconds)
2025-09-11 22:39:35,674 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:39:35,680 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:41:14,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1016.37482 ± 1242.196
2025-09-11 22:41:14,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [190.19179, 405.4888, 254.69157, 3016.8564, 75.62775, 2733.5002, 23.483782, 2962.7842, 327.56912, 173.55336]
2025-09-11 22:41:14,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [91.0, 166.0, 139.0, 1000.0, 56.0, 920.0, 33.0, 1000.0, 147.0, 88.0]
2025-09-11 22:41:14,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 18 hours, 20 minutes, 22 seconds)
2025-09-11 22:52:09,027 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:52:09,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:53:25,004 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 749.44110 ± 552.223
2025-09-11 22:53:25,005 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [454.05283, 145.9214, 890.9691, 1385.6282, 91.480194, 1849.8325, 291.68442, 798.908, 396.9245, 1189.01]
2025-09-11 22:53:25,005 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [160.0, 76.0, 333.0, 468.0, 62.0, 630.0, 152.0, 302.0, 202.0, 421.0]
2025-09-11 22:53:25,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 17 hours, 51 minutes, 56 seconds)
2025-09-11 23:04:29,809 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:04:29,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:06:11,866 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1008.18750 ± 777.695
2025-09-11 23:06:11,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [165.4092, 1475.5742, 757.2968, 558.2766, 2993.1936, 653.665, 1300.2826, 187.46082, 1115.4006, 875.31573]
2025-09-11 23:06:11,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [87.0, 531.0, 273.0, 249.0, 1000.0, 251.0, 468.0, 93.0, 416.0, 329.0]
2025-09-11 23:06:11,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 17 hours, 28 minutes, 45 seconds)
2025-09-11 23:17:52,170 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:17:52,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:18:57,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 619.91968 ± 652.634
2025-09-11 23:18:57,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [64.34382, 308.85287, 468.02005, 19.849674, 810.97205, 161.73682, 208.40715, 1535.4745, 2105.1746, 516.3657]
2025-09-11 23:18:57,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [42.0, 141.0, 190.0, 26.0, 301.0, 79.0, 104.0, 567.0, 732.0, 224.0]
2025-09-11 23:18:57,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 17 hours, 52 seconds)
2025-09-11 23:30:07,528 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:30:07,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:32:05,579 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1258.94446 ± 949.999
2025-09-11 23:32:05,580 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1259.3751, 2979.7903, 357.22885, 50.832947, 1832.5043, 983.52856, 2774.9097, 1293.2626, 632.1297, 425.88193]
2025-09-11 23:32:05,580 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [433.0, 1000.0, 147.0, 46.0, 615.0, 324.0, 926.0, 387.0, 225.0, 168.0]
2025-09-11 23:32:05,591 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 16 hours, 43 minutes, 36 seconds)
2025-09-11 23:43:36,697 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:43:36,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:45:20,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1128.33655 ± 1094.126
2025-09-11 23:45:20,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1009.2459, 375.76273, 455.52805, 146.33452, 815.4241, 273.96112, 2176.28, 64.648865, 3116.9998, 2849.1794]
2025-09-11 23:45:20,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [350.0, 153.0, 167.0, 93.0, 257.0, 127.0, 706.0, 45.0, 1000.0, 913.0]
2025-09-11 23:45:20,491 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 16 hours, 27 minutes, 7 seconds)
2025-09-11 23:56:56,359 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:56:56,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:58:16,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 814.37860 ± 739.203
2025-09-11 23:58:16,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [2058.1292, 106.31181, 876.6805, 376.82526, 506.21307, 185.80472, 974.20264, 2307.0461, 592.0185, 160.55406]
2025-09-11 23:58:16,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [704.0, 60.0, 328.0, 184.0, 207.0, 89.0, 311.0, 756.0, 224.0, 96.0]
2025-09-11 23:58:16,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 16 hours, 25 minutes, 55 seconds)
2025-09-12 00:09:45,126 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:09:45,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:11:09,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 848.28870 ± 1010.688
2025-09-12 00:11:09,490 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [604.93933, 95.863556, 1151.3027, 99.12299, 394.74155, 295.2354, 2521.9539, 25.33243, 292.58002, 3001.8147]
2025-09-12 00:11:09,490 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [232.0, 64.0, 404.0, 76.0, 167.0, 158.0, 799.0, 24.0, 138.0, 1000.0]
2025-09-12 00:11:09,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 16 hours, 14 minutes, 24 seconds)
2025-09-12 00:22:33,015 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:22:33,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:24:35,324 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1318.81445 ± 1124.749
2025-09-12 00:24:35,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [16.926105, 2784.0967, 1560.1603, 1971.4465, 78.48147, 2247.2053, 3100.5212, 309.46585, 21.855751, 1097.9851]
2025-09-12 00:24:35,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [24.0, 900.0, 525.0, 645.0, 48.0, 756.0, 1000.0, 132.0, 22.0, 374.0]
2025-09-12 00:24:35,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 16 hours, 11 minutes, 16 seconds)
2025-09-12 00:35:29,213 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:35:29,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:36:42,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 782.00702 ± 711.741
2025-09-12 00:36:42,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [162.50163, 215.05301, 14.607945, 1128.9221, 36.735893, 2072.357, 1622.9846, 1118.071, 174.85971, 1273.9772]
2025-09-12 00:36:42,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [93.0, 97.0, 19.0, 370.0, 40.0, 675.0, 527.0, 342.0, 84.0, 429.0]
2025-09-12 00:36:42,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 15 hours, 43 minutes, 16 seconds)
2025-09-12 00:48:18,304 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:48:18,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:49:51,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 996.65759 ± 965.227
2025-09-12 00:49:51,005 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [3150.7383, 1017.52203, 151.46906, 104.17395, 55.164146, 437.15842, 952.26196, 1984.3525, 1757.2437, 356.49234]
2025-09-12 00:49:51,005 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 317.0, 78.0, 61.0, 50.0, 178.0, 363.0, 603.0, 561.0, 155.0]
2025-09-12 00:49:51,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 15 hours, 28 minutes, 55 seconds)
2025-09-12 01:00:58,693 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:00:58,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:03:13,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1468.17200 ± 1197.975
2025-09-12 01:03:13,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [744.1896, 270.4532, 3082.872, 3094.9229, 2287.8184, 70.61864, 455.90024, 38.69088, 2293.0125, 2343.2412]
2025-09-12 01:03:13,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [289.0, 123.0, 1000.0, 1000.0, 751.0, 44.0, 183.0, 37.0, 759.0, 764.0]
2025-09-12 01:03:13,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 15 hours, 22 minutes, 7 seconds)
2025-09-12 01:15:00,216 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:15:00,227 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:16:38,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1027.33313 ± 964.620
2025-09-12 01:16:38,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [3144.2236, 267.78488, 2058.8743, 458.20682, 539.21277, 64.64131, 1067.2291, 828.5288, 1825.2506, 19.377945]
2025-09-12 01:16:38,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 124.0, 690.0, 191.0, 211.0, 50.0, 378.0, 303.0, 603.0, 20.0]
2025-09-12 01:16:38,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 15 hours, 16 minutes, 40 seconds)
2025-09-12 01:27:49,996 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:27:50,007 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:29:00,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 748.39471 ± 1008.276
2025-09-12 01:29:00,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [2071.7285, 147.38382, 27.29695, 575.1708, 56.10264, 289.34842, 16.467472, 1133.7747, 44.61812, 3122.0554]
2025-09-12 01:29:00,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [678.0, 76.0, 22.0, 223.0, 43.0, 128.0, 21.0, 368.0, 33.0, 1000.0]
2025-09-12 01:29:00,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 14 hours, 48 minutes, 59 seconds)
2025-09-12 01:39:49,516 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:39:49,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:41:03,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 726.71765 ± 740.778
2025-09-12 01:41:03,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [384.72574, 790.46484, 396.8873, 470.618, 2682.9524, 172.01984, 117.62042, 1221.8494, 95.623726, 934.41547]
2025-09-12 01:41:03,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [161.0, 311.0, 165.0, 188.0, 892.0, 81.0, 83.0, 426.0, 69.0, 330.0]
2025-09-12 01:41:03,964 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 14 hours, 35 minutes, 22 seconds)
2025-09-12 01:52:54,904 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:52:54,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:54:20,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 878.81262 ± 536.303
2025-09-12 01:54:20,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [292.77405, 1159.6105, 448.59198, 470.14804, 795.80457, 834.1791, 1351.741, 126.505264, 1878.3159, 1430.4559]
2025-09-12 01:54:20,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [132.0, 405.0, 201.0, 183.0, 290.0, 322.0, 432.0, 65.0, 618.0, 473.0]
2025-09-12 01:54:20,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 14 hours, 24 minutes, 17 seconds)
2025-09-12 02:05:17,651 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:05:17,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:06:25,624 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 719.15790 ± 526.664
2025-09-12 02:06:25,625 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1271.3506, 1214.5, 530.80365, 75.38174, 26.204144, 1349.6093, 743.8658, 1354.818, 595.8147, 29.23123]
2025-09-12 02:06:25,625 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [390.0, 403.0, 198.0, 46.0, 29.0, 456.0, 272.0, 465.0, 222.0, 30.0]
2025-09-12 02:06:25,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 13 hours, 54 minutes, 21 seconds)
2025-09-12 02:17:40,345 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:17:40,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:19:09,436 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 999.53430 ± 1003.224
2025-09-12 02:19:09,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [15.927062, 646.24963, 795.5288, 3147.1187, 413.7722, 1207.5425, 158.20631, 1037.4354, 50.276993, 2523.2854]
2025-09-12 02:19:09,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 217.0, 287.0, 1000.0, 163.0, 373.0, 77.0, 324.0, 34.0, 772.0]
2025-09-12 02:19:09,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 13 hours, 32 minutes, 47 seconds)
2025-09-12 02:30:30,759 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:30:30,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:32:18,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1123.43359 ± 919.080
2025-09-12 02:32:18,136 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [3062.0208, 739.5605, 280.47098, 1095.7504, 1486.2489, 754.50195, 21.952444, 1333.8243, 161.76416, 2298.2427]
2025-09-12 02:32:18,136 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [974.0, 293.0, 126.0, 424.0, 525.0, 274.0, 20.0, 480.0, 90.0, 756.0]
2025-09-12 02:32:18,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 13 hours, 30 minutes, 9 seconds)
2025-09-12 02:43:31,061 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:43:31,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:45:18,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1136.38513 ± 803.654
2025-09-12 02:45:18,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1339.8956, 653.3788, 77.14149, 1367.4265, 115.73574, 963.2684, 1387.1423, 1177.1309, 1178.9978, 3103.734]
2025-09-12 02:45:18,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [459.0, 237.0, 67.0, 469.0, 64.0, 334.0, 463.0, 410.0, 401.0, 1000.0]
2025-09-12 02:45:18,867 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 13 hours, 29 minutes, 31 seconds)
2025-09-12 02:57:04,168 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:57:04,173 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:59:04,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1307.61743 ± 1086.051
2025-09-12 02:59:04,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1826.3385, 2550.9001, 3091.989, 518.27734, 14.376789, 586.80493, 940.77625, 2707.0237, 814.78827, 24.899967]
2025-09-12 02:59:04,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [603.0, 784.0, 1000.0, 210.0, 18.0, 221.0, 322.0, 870.0, 303.0, 22.0]
2025-09-12 02:59:04,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 13 hours, 22 minutes, 33 seconds)
2025-09-12 03:10:10,881 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:10:10,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:11:54,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1149.33472 ± 836.713
2025-09-12 03:11:54,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [281.5917, 14.481149, 1464.8097, 1451.2501, 1527.2443, 493.99658, 1226.6493, 3162.663, 1086.9761, 783.68555]
2025-09-12 03:11:54,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [124.0, 17.0, 467.0, 466.0, 486.0, 191.0, 404.0, 1000.0, 337.0, 279.0]
2025-09-12 03:11:54,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 13 hours, 18 minutes, 56 seconds)
2025-09-12 03:23:25,891 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:23:25,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:25:01,666 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1020.67957 ± 717.644
2025-09-12 03:25:01,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [16.143486, 1982.6024, 549.8582, 471.6577, 2244.115, 987.02856, 1181.9001, 1649.4211, 981.9705, 142.09886]
2025-09-12 03:25:01,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 637.0, 249.0, 182.0, 730.0, 300.0, 402.0, 521.0, 344.0, 85.0]
2025-09-12 03:25:01,678 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 13 hours, 10 minutes, 26 seconds)
2025-09-12 03:36:23,154 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:36:23,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:38:18,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1287.95483 ± 1078.520
2025-09-12 03:38:18,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1636.2252, 2769.7295, 77.14671, 124.43314, 1036.6445, 536.1009, 2528.0557, 182.55807, 3021.2512, 967.4034]
2025-09-12 03:38:18,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [529.0, 897.0, 47.0, 65.0, 353.0, 200.0, 803.0, 91.0, 929.0, 326.0]
2025-09-12 03:38:18,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 12 hours, 58 minutes, 50 seconds)
2025-09-12 03:49:29,735 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:49:29,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:51:04,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1002.77771 ± 1080.478
2025-09-12 03:51:04,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [14.189541, 1988.1057, 296.08832, 21.934553, 217.94008, 144.07971, 697.07434, 2594.9202, 981.2406, 3072.2046]
2025-09-12 03:51:04,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [14.0, 650.0, 139.0, 30.0, 106.0, 75.0, 254.0, 842.0, 351.0, 1000.0]
2025-09-12 03:51:04,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 12 hours, 42 minutes, 51 seconds)
2025-09-12 04:02:37,109 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:02:37,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:05:04,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1621.19507 ± 1260.481
2025-09-12 04:05:04,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [3126.7595, 3136.3813, 286.80222, 1162.0272, 138.2809, 50.28579, 2906.9932, 588.60474, 1812.0839, 3003.733]
2025-09-12 04:05:04,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 125.0, 411.0, 84.0, 51.0, 931.0, 219.0, 597.0, 1000.0]
2025-09-12 04:05:04,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1226 [INFO]: New best (1621.20) for latency ExtremeClogL1U23
2025-09-12 04:05:04,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 12 hours, 32 minutes, 21 seconds)
2025-09-12 04:16:17,026 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:16:17,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:19:09,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1924.91699 ± 1046.785
2025-09-12 04:19:09,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [3144.5024, 1993.573, 1417.4672, 1620.3856, 389.58353, 772.8419, 711.62634, 3186.3071, 2887.8328, 3125.0503]
2025-09-12 04:19:09,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 652.0, 476.0, 536.0, 191.0, 283.0, 276.0, 1000.0, 932.0, 1000.0]
2025-09-12 04:19:09,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1226 [INFO]: New best (1924.92) for latency ExtremeClogL1U23
2025-09-12 04:19:09,709 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 12 hours, 33 minutes, 10 seconds)
2025-09-12 04:30:14,249 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:30:14,254 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:32:15,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1346.22559 ± 1096.870
2025-09-12 04:32:15,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [46.669537, 1719.0327, 35.257645, 1136.4819, 2894.023, 44.60156, 2245.9417, 1115.7031, 1036.4985, 3188.0464]
2025-09-12 04:32:15,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [40.0, 561.0, 32.0, 391.0, 927.0, 32.0, 716.0, 395.0, 337.0, 1000.0]
2025-09-12 04:32:15,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 12 hours, 19 minutes, 36 seconds)
2025-09-12 04:44:04,033 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:44:04,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:45:34,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 930.83075 ± 1007.597
2025-09-12 04:45:34,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [2472.5544, 3063.5198, 24.621447, 307.37848, 463.6043, 380.05743, 281.62805, 1582.5916, 289.49933, 442.8537]
2025-09-12 04:45:34,227 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [784.0, 961.0, 24.0, 174.0, 190.0, 157.0, 133.0, 564.0, 155.0, 177.0]
2025-09-12 04:45:34,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 12 hours, 6 minutes, 27 seconds)
2025-09-12 04:57:05,271 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:57:05,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:58:14,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 697.12042 ± 466.278
2025-09-12 04:58:14,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [32.044964, 66.55558, 394.83954, 919.03894, 1688.056, 616.18146, 640.10254, 810.50226, 672.73035, 1131.1523]
2025-09-12 04:58:14,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [41.0, 48.0, 170.0, 313.0, 554.0, 239.0, 233.0, 304.0, 243.0, 406.0]
2025-09-12 04:58:14,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 11 hours, 51 minutes, 57 seconds)
2025-09-12 05:09:01,779 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:09:01,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:10:31,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 923.76379 ± 896.505
2025-09-12 05:10:31,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1496.8054, 1672.4005, 223.35081, 49.446198, 3065.2117, 493.21832, 1075.3679, 685.94244, 12.148035, 463.74585]
2025-09-12 05:10:31,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [512.0, 579.0, 107.0, 43.0, 1000.0, 189.0, 371.0, 271.0, 15.0, 181.0]
2025-09-12 05:10:31,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 11 hours, 20 minutes, 43 seconds)
2025-09-12 05:21:40,482 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:21:40,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:22:46,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 657.53217 ± 555.277
2025-09-12 05:22:46,408 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1283.4585, 1745.5183, 1303.732, 94.99662, 416.0549, 384.45685, 693.7386, 385.69183, 226.58105, 41.093082]
2025-09-12 05:22:46,408 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [441.0, 576.0, 454.0, 54.0, 163.0, 186.0, 245.0, 180.0, 103.0, 46.0]
2025-09-12 05:22:46,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 10 hours, 48 minutes, 50 seconds)
2025-09-12 05:34:39,344 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:34:39,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:36:12,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1031.35400 ± 1143.161
2025-09-12 05:36:12,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [109.92092, 40.965195, 1240.6376, 3011.9563, 166.31476, 3067.0012, 21.65451, 19.024809, 1025.759, 1610.3054]
2025-09-12 05:36:12,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [58.0, 31.0, 403.0, 940.0, 89.0, 956.0, 24.0, 23.0, 341.0, 524.0]
2025-09-12 05:36:12,201 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 10 hours, 39 minutes, 22 seconds)
2025-09-12 05:47:47,821 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:47:47,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:49:43,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1259.21362 ± 1025.331
2025-09-12 05:49:43,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [814.2893, 2913.675, 477.41492, 116.744026, 726.6861, 192.5959, 3236.0618, 1733.343, 969.2972, 1412.0304]
2025-09-12 05:49:43,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [280.0, 941.0, 189.0, 76.0, 258.0, 92.0, 1000.0, 555.0, 340.0, 485.0]
2025-09-12 05:49:43,663 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 10 hours, 28 minutes, 44 seconds)
2025-09-12 06:00:12,022 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:00:12,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:02:56,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1907.11462 ± 1178.060
2025-09-12 06:02:56,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [640.3321, 718.6077, 1734.8062, 3152.257, 3189.4565, 2979.4092, 84.84416, 801.6764, 3011.3132, 2758.4434]
2025-09-12 06:02:56,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [235.0, 254.0, 560.0, 1000.0, 1000.0, 946.0, 50.0, 279.0, 964.0, 850.0]
2025-09-12 06:02:56,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 10 hours, 21 minutes, 8 seconds)
2025-09-12 06:14:26,895 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:14:26,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:15:27,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 624.77081 ± 931.301
2025-09-12 06:15:27,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [14.088236, 1035.3926, 3148.8782, 47.892796, 119.1897, 1173.6892, 76.5085, 309.97223, 304.36575, 17.730825]
2025-09-12 06:15:27,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 342.0, 1000.0, 42.0, 80.0, 407.0, 48.0, 157.0, 131.0, 22.0]
2025-09-12 06:15:27,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 10 hours, 10 minutes, 22 seconds)
2025-09-12 06:27:05,051 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:27:05,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:28:38,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1032.22876 ± 592.876
2025-09-12 06:28:38,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [165.68367, 912.4209, 1399.9385, 891.4522, 894.42236, 20.92847, 1878.8789, 832.4958, 1647.5839, 1678.4827]
2025-09-12 06:28:38,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [104.0, 310.0, 467.0, 281.0, 310.0, 25.0, 604.0, 289.0, 563.0, 534.0]
2025-09-12 06:28:38,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 10 hours, 5 minutes, 59 seconds)
2025-09-12 06:39:31,004 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:39:31,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:41:45,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1481.56079 ± 1007.497
2025-09-12 06:41:45,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [2621.7686, 399.69904, 2050.7727, 3075.7058, 1035.5476, 2078.345, 712.0327, 166.49129, 2311.4282, 363.81604]
2025-09-12 06:41:45,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [857.0, 174.0, 674.0, 978.0, 358.0, 675.0, 272.0, 85.0, 730.0, 149.0]
2025-09-12 06:41:45,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 9 hours, 50 minutes, 3 seconds)
2025-09-12 06:53:12,046 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:53:12,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:54:38,786 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 918.50671 ± 1173.108
2025-09-12 06:54:38,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [97.6241, 83.183044, 297.893, 116.31759, 371.98254, 1401.6403, 115.8396, 408.30118, 3138.2869, 3153.9988]
2025-09-12 06:54:38,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [56.0, 51.0, 145.0, 64.0, 153.0, 463.0, 63.0, 176.0, 1000.0, 1000.0]
2025-09-12 06:54:38,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 9 hours, 31 minutes, 17 seconds)
2025-09-12 07:05:34,504 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:05:34,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:07:08,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 992.00763 ± 574.336
2025-09-12 07:07:08,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1867.5381, 1087.6304, 904.70953, 2014.6842, 86.69113, 352.80594, 850.6344, 827.6995, 653.4493, 1274.233]
2025-09-12 07:07:08,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [610.0, 379.0, 335.0, 655.0, 60.0, 160.0, 324.0, 290.0, 240.0, 425.0]
2025-09-12 07:07:08,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 9 hours, 12 minutes)
2025-09-12 07:19:03,104 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:19:03,107 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:20:58,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1284.19385 ± 980.266
2025-09-12 07:20:58,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [392.90878, 1339.1191, 2705.6018, 327.33862, 3161.4338, 250.19347, 1961.3715, 437.16678, 958.54474, 1308.2594]
2025-09-12 07:20:58,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [168.0, 444.0, 845.0, 137.0, 1000.0, 110.0, 631.0, 174.0, 340.0, 397.0]
2025-09-12 07:20:58,894 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 9 hours, 10 minutes, 24 seconds)
2025-09-12 07:31:40,920 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:31:40,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:32:59,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 841.06689 ± 786.327
2025-09-12 07:32:59,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [481.26288, 175.8473, 88.8468, 20.522575, 2509.9968, 1639.1351, 1033.3502, 1418.0654, 88.22149, 955.4206]
2025-09-12 07:32:59,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [184.0, 88.0, 60.0, 23.0, 781.0, 555.0, 314.0, 446.0, 71.0, 347.0]
2025-09-12 07:32:59,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 8 hours, 47 minutes, 36 seconds)
2025-09-12 07:44:11,413 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:44:11,416 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:46:13,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1337.01831 ± 1069.690
2025-09-12 07:46:13,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [474.1166, 176.66263, 746.86475, 1639.3658, 2482.582, 2463.0505, 3110.6309, 1963.3248, 95.349205, 218.23665]
2025-09-12 07:46:13,274 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [191.0, 97.0, 265.0, 508.0, 809.0, 799.0, 1000.0, 630.0, 54.0, 95.0]
2025-09-12 07:46:13,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 8 hours, 35 minutes, 39 seconds)
2025-09-12 07:57:31,101 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:57:31,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:58:33,580 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 653.64838 ± 902.242
2025-09-12 07:58:33,580 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [3120.1848, 285.8085, 347.12296, 1221.2876, 54.078655, 19.15376, 55.269302, 657.9454, 736.84546, 38.787506]
2025-09-12 07:58:33,580 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [992.0, 135.0, 147.0, 417.0, 47.0, 24.0, 45.0, 232.0, 288.0, 36.0]
2025-09-12 07:58:33,598 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 8 hours, 18 minutes, 31 seconds)
2025-09-12 08:09:44,101 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:09:44,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:11:50,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1395.37561 ± 998.151
2025-09-12 08:11:50,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [904.8107, 258.37952, 2642.8203, 3073.458, 2686.2122, 1221.256, 720.69415, 1567.4419, 284.11844, 594.5642]
2025-09-12 08:11:50,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [307.0, 113.0, 837.0, 1000.0, 840.0, 421.0, 267.0, 523.0, 124.0, 217.0]
2025-09-12 08:11:50,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 8 hours, 11 minutes, 45 seconds)
2025-09-12 08:22:53,386 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:22:53,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:26:06,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 2177.58813 ± 956.908
2025-09-12 08:26:06,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [2825.601, 2212.8455, 3075.3032, 733.39044, 1395.068, 1427.1613, 3103.2954, 3130.1719, 3117.1807, 755.8648]
2025-09-12 08:26:06,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [910.0, 737.0, 1000.0, 260.0, 463.0, 487.0, 967.0, 1000.0, 1000.0, 266.0]
2025-09-12 08:26:06,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1226 [INFO]: New best (2177.59) for latency ExtremeClogL1U23
2025-09-12 08:26:06,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 8 hours, 1 minute, 53 seconds)
2025-09-12 08:37:25,830 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:37:25,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:39:59,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1706.27209 ± 1127.083
2025-09-12 08:39:59,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [2538.3628, 730.60583, 1766.3832, 2976.813, 349.3756, 464.25342, 3084.5056, 1793.0221, 239.43427, 3119.964]
2025-09-12 08:39:59,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [809.0, 289.0, 583.0, 958.0, 146.0, 190.0, 1000.0, 602.0, 104.0, 1000.0]
2025-09-12 08:39:59,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 8 hours, 2 minutes, 27 seconds)
2025-09-12 08:51:29,892 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:51:29,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:54:02,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1699.74182 ± 1293.355
2025-09-12 08:54:02,653 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [65.10185, 3047.5498, 686.67596, 3108.5537, 312.52646, 331.45374, 3090.1587, 3137.5447, 2449.8748, 767.9787]
2025-09-12 08:54:02,653 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [42.0, 1000.0, 247.0, 1000.0, 136.0, 141.0, 1000.0, 1000.0, 788.0, 280.0]
2025-09-12 08:54:02,678 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 7 hours, 54 minutes, 45 seconds)
2025-09-12 09:05:15,937 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:05:15,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:07:19,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1371.32214 ± 1089.400
2025-09-12 09:07:19,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1315.0282, 1604.2418, 3118.8074, 1200.2847, 168.02599, 191.68024, 42.44802, 2696.1753, 2728.943, 647.5867]
2025-09-12 09:07:19,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [439.0, 499.0, 1000.0, 411.0, 82.0, 90.0, 38.0, 851.0, 868.0, 241.0]
2025-09-12 09:07:19,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 7 hours, 47 minutes, 37 seconds)
2025-09-12 09:18:37,531 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:18:37,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:20:14,391 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1065.75659 ± 1278.106
2025-09-12 09:20:14,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [18.603024, 13.633301, 89.7148, 3200.67, 3188.3738, 176.45044, 77.7406, 1467.3024, 177.00548, 2248.0723]
2025-09-12 09:20:14,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [20.0, 17.0, 61.0, 1000.0, 1000.0, 82.0, 72.0, 473.0, 83.0, 717.0]
2025-09-12 09:20:14,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 7 hours, 31 minutes, 26 seconds)
2025-09-12 09:31:11,499 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:31:11,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:33:46,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1761.02246 ± 1061.127
2025-09-12 09:33:46,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [3141.658, 3208.5647, 1633.9994, 1966.1561, 1302.7869, 1692.4591, 727.2602, 673.6198, 3164.9966, 98.72437]
2025-09-12 09:33:46,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 534.0, 607.0, 438.0, 569.0, 251.0, 241.0, 1000.0, 61.0]
2025-09-12 09:33:46,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 7 hours, 13 minutes, 3 seconds)
2025-09-12 09:45:43,241 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:45:43,248 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:47:32,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1197.09338 ± 1154.289
2025-09-12 09:47:32,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1075.4816, 122.94925, 542.19086, 149.7135, 3047.9333, 217.87315, 21.970037, 1377.847, 2340.9421, 3074.0325]
2025-09-12 09:47:32,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [387.0, 74.0, 204.0, 72.0, 1000.0, 97.0, 23.0, 445.0, 758.0, 1000.0]
2025-09-12 09:47:32,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 6 hours, 58 minutes, 50 seconds)
2025-09-12 09:58:13,328 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:58:13,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:00:29,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1527.29248 ± 972.458
2025-09-12 10:00:29,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1194.2517, 478.8141, 13.250855, 2225.6426, 1036.5337, 3110.1816, 3098.1616, 1475.6096, 1606.6624, 1033.817]
2025-09-12 10:00:29,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [392.0, 180.0, 17.0, 737.0, 319.0, 1000.0, 1000.0, 459.0, 525.0, 360.0]
2025-09-12 10:00:29,072 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 6 hours, 38 minutes, 38 seconds)
2025-09-12 10:11:41,120 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:11:41,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:13:23,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1117.05444 ± 1099.901
2025-09-12 10:13:23,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [217.65433, 2383.7815, 1312.4414, 2458.8508, 53.55553, 122.79266, 46.293655, 356.0287, 1102.0421, 3117.1045]
2025-09-12 10:13:23,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [118.0, 794.0, 437.0, 790.0, 49.0, 64.0, 32.0, 146.0, 387.0, 1000.0]
2025-09-12 10:13:23,416 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 6 hours, 23 minutes, 9 seconds)
2025-09-12 10:24:50,122 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:24:50,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:26:43,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1254.22412 ± 1119.488
2025-09-12 10:26:43,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [456.1235, 256.26337, 379.66895, 1083.7733, 2458.837, 3140.0115, 2735.2607, 140.36534, 1804.5349, 87.40273]
2025-09-12 10:26:43,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [170.0, 112.0, 148.0, 406.0, 754.0, 992.0, 884.0, 71.0, 606.0, 64.0]
2025-09-12 10:26:43,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 6 hours, 12 minutes, 17 seconds)
2025-09-12 10:37:47,226 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:37:47,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:40:01,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1517.37524 ± 1040.606
2025-09-12 10:40:01,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [3101.5447, 1127.9515, 1603.0227, 129.62013, 2517.4353, 1299.8339, 860.8707, 3136.8887, 1327.2269, 69.35938]
2025-09-12 10:40:01,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 386.0, 539.0, 66.0, 814.0, 387.0, 303.0, 1000.0, 446.0, 47.0]
2025-09-12 10:40:01,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 5 hours, 57 minutes, 45 seconds)
2025-09-12 10:52:05,115 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:52:05,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:53:09,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 651.69739 ± 615.359
2025-09-12 10:53:09,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [340.11243, 1568.147, 17.73527, 772.254, 1533.3909, 1506.5995, 111.87825, 14.403555, 237.86938, 414.58392]
2025-09-12 10:53:09,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [137.0, 535.0, 23.0, 279.0, 502.0, 509.0, 76.0, 21.0, 106.0, 177.0]
2025-09-12 10:53:09,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 5 hours, 41 minutes, 11 seconds)
2025-09-12 11:03:55,603 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:03:55,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:06:07,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1458.24438 ± 850.170
2025-09-12 11:06:07,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1899.935, 2605.094, 1739.7726, 1283.0559, 983.643, 1298.428, 462.6078, 293.97577, 911.2442, 3104.6873]
2025-09-12 11:06:07,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [613.0, 825.0, 558.0, 423.0, 344.0, 442.0, 175.0, 124.0, 311.0, 1000.0]
2025-09-12 11:06:07,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 5 hours, 28 minutes, 12 seconds)
2025-09-12 11:17:52,346 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:17:52,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:20:01,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1415.20605 ± 1111.321
2025-09-12 11:20:01,002 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [2542.6296, 3160.259, 94.0985, 568.2012, 1327.1505, 3104.6814, 1840.2313, 451.9864, 392.48062, 670.3413]
2025-09-12 11:20:01,002 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [821.0, 1000.0, 71.0, 221.0, 430.0, 1000.0, 593.0, 174.0, 163.0, 244.0]
2025-09-12 11:20:01,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 5 hours, 19 minutes, 48 seconds)
2025-09-12 11:31:07,704 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:31:07,719 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:33:48,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1766.30664 ± 703.077
2025-09-12 11:33:48,204 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [963.54034, 1928.0205, 1698.7239, 3124.6438, 686.7455, 2793.0283, 1476.4525, 1678.5806, 1865.9263, 1447.4052]
2025-09-12 11:33:48,204 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [328.0, 637.0, 552.0, 1000.0, 243.0, 919.0, 475.0, 555.0, 597.0, 456.0]
2025-09-12 11:33:48,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 5 hours, 8 minutes, 34 seconds)
2025-09-12 11:45:14,948 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:45:14,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:46:27,754 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 756.77234 ± 563.797
2025-09-12 11:46:27,754 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1364.5758, 1095.0538, 37.399406, 1421.4935, 580.7391, 373.3228, 14.293075, 208.89275, 1587.2036, 884.74945]
2025-09-12 11:46:27,754 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [450.0, 368.0, 40.0, 474.0, 215.0, 158.0, 18.0, 95.0, 497.0, 307.0]
2025-09-12 11:46:27,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 4 hours, 52 minutes, 21 seconds)
2025-09-12 11:57:48,487 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:57:48,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:59:38,894 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1198.19861 ± 821.279
2025-09-12 11:59:38,894 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [337.87344, 993.63007, 2256.768, 17.938498, 2284.282, 700.3267, 656.18854, 649.9017, 2013.205, 2071.873]
2025-09-12 11:59:38,894 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [134.0, 338.0, 679.0, 23.0, 727.0, 244.0, 247.0, 261.0, 675.0, 647.0]
2025-09-12 11:59:38,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 4 hours, 39 minutes, 14 seconds)
2025-09-12 12:10:49,921 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:10:49,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:12:26,692 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1084.02686 ± 1009.977
2025-09-12 12:12:26,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [91.96224, 3063.6245, 2812.8306, 580.2868, 474.19504, 410.5137, 16.46303, 1040.051, 1363.942, 986.3998]
2025-09-12 12:12:26,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [52.0, 1000.0, 896.0, 208.0, 189.0, 172.0, 20.0, 356.0, 396.0, 310.0]
2025-09-12 12:12:26,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 4 hours, 25 minutes, 16 seconds)
2025-09-12 12:23:51,881 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:23:51,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:25:42,810 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1249.03784 ± 1020.474
2025-09-12 12:25:42,824 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [3098.1663, 282.18387, 2160.429, 933.02954, 17.965075, 589.31885, 1002.74084, 165.91812, 2581.982, 1658.6443]
2025-09-12 12:25:42,824 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 119.0, 683.0, 315.0, 21.0, 214.0, 337.0, 81.0, 799.0, 530.0]
2025-09-12 12:25:42,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 4 hours, 9 minutes, 38 seconds)
2025-09-12 12:36:26,175 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:36:26,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:38:10,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1202.47339 ± 822.247
2025-09-12 12:38:10,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [134.93611, 970.97217, 589.532, 1089.6387, 1357.1458, 1920.8672, 2321.9097, 213.842, 741.46765, 2684.4216]
2025-09-12 12:38:10,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [68.0, 320.0, 225.0, 354.0, 404.0, 601.0, 740.0, 115.0, 255.0, 825.0]
2025-09-12 12:38:10,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 3 hours, 51 minutes, 44 seconds)
2025-09-12 12:49:27,745 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:49:27,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:50:21,046 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 550.52478 ± 910.269
2025-09-12 12:50:21,046 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [18.600521, 343.28506, 18.039932, 369.49005, 3116.7158, 1104.9624, 264.58826, 15.009235, 17.410631, 237.14607]
2025-09-12 12:50:21,046 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [21.0, 141.0, 23.0, 167.0, 1000.0, 373.0, 114.0, 17.0, 23.0, 102.0]
2025-09-12 12:50:21,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 3 hours, 37 minutes, 13 seconds)
2025-09-12 13:01:57,669 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:01:57,677 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:03:30,198 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1028.16968 ± 809.692
2025-09-12 13:03:30,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [333.9295, 1079.9176, 745.42834, 52.173805, 2075.9675, 2689.4246, 797.08234, 13.101468, 1221.1803, 1273.4913]
2025-09-12 13:03:30,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [135.0, 373.0, 262.0, 64.0, 651.0, 859.0, 289.0, 17.0, 400.0, 378.0]
2025-09-12 13:03:30,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 24 minutes, 20 seconds)
2025-09-12 13:14:02,412 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:14:02,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:15:47,367 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1173.42615 ± 875.432
2025-09-12 13:15:47,367 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [712.3645, 1258.0408, 1282.3651, 899.441, 2373.5955, 507.39014, 3119.2588, 638.0511, 927.894, 15.861137]
2025-09-12 13:15:47,367 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [247.0, 408.0, 424.0, 312.0, 744.0, 184.0, 1000.0, 226.0, 333.0, 19.0]
2025-09-12 13:15:47,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 10 minutes, 2 seconds)
2025-09-12 13:27:12,089 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:27:12,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:29:08,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1270.76343 ± 1036.350
2025-09-12 13:29:08,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [924.1322, 538.1927, 3116.773, 901.3659, 3004.3486, 554.4104, 2074.0754, 1206.2363, 356.25858, 31.841055]
2025-09-12 13:29:08,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [293.0, 199.0, 1000.0, 308.0, 1000.0, 205.0, 672.0, 428.0, 143.0, 33.0]
2025-09-12 13:29:08,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 2 hours, 57 minutes, 36 seconds)
2025-09-12 13:40:30,239 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:40:30,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:42:43,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1496.66309 ± 1097.725
2025-09-12 13:42:43,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [14.956589, 2750.322, 1696.8611, 1246.7046, 1141.59, 3178.891, 208.43655, 741.5571, 3111.8328, 875.4797]
2025-09-12 13:42:43,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [22.0, 889.0, 550.0, 414.0, 345.0, 1000.0, 93.0, 263.0, 1000.0, 300.0]
2025-09-12 13:42:43,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 2 hours, 47 minutes, 48 seconds)
2025-09-12 13:53:30,009 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:53:30,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:54:53,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 956.38623 ± 1012.731
2025-09-12 13:54:53,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [59.534866, 1616.4646, 263.47278, 1566.4929, 35.7835, 13.558705, 1646.5513, 3243.3704, 1009.56757, 109.06541]
2025-09-12 13:54:53,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [40.0, 527.0, 113.0, 521.0, 35.0, 19.0, 501.0, 993.0, 309.0, 59.0]
2025-09-12 13:54:53,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 34 minutes, 54 seconds)
2025-09-12 14:06:32,686 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:06:32,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:09:57,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 2344.22607 ± 1167.034
2025-09-12 14:09:57,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [2284.5686, 19.198309, 3077.5781, 3119.73, 3164.3262, 2989.9636, 3076.7644, 3114.4668, 2451.4924, 144.17256]
2025-09-12 14:09:57,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [746.0, 24.0, 1000.0, 1000.0, 1000.0, 940.0, 1000.0, 1000.0, 787.0, 79.0]
2025-09-12 14:09:57,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1226 [INFO]: New best (2344.23) for latency ExtremeClogL1U23
2025-09-12 14:09:57,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 26 minutes, 12 seconds)
2025-09-12 14:20:37,329 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:20:37,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:22:26,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1193.42480 ± 764.605
2025-09-12 14:22:26,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [2150.2898, 836.0978, 2235.862, 2419.8008, 273.67032, 932.3544, 1339.1312, 301.06485, 585.1452, 860.83105]
2025-09-12 14:22:26,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [705.0, 285.0, 717.0, 774.0, 119.0, 294.0, 443.0, 142.0, 221.0, 295.0]
2025-09-12 14:22:26,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 13 minutes, 18 seconds)
2025-09-12 14:33:55,262 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:33:55,267 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:35:54,928 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1306.81812 ± 1150.149
2025-09-12 14:35:54,929 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [2699.1426, 1934.1396, 941.23956, 2351.686, 82.90175, 18.613792, 48.304306, 1757.9944, 111.49937, 3122.6606]
2025-09-12 14:35:54,929 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [860.0, 635.0, 338.0, 769.0, 74.0, 22.0, 51.0, 586.0, 60.0, 1000.0]
2025-09-12 14:35:54,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 11 seconds)
2025-09-12 14:47:40,374 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:47:40,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:49:15,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1059.35254 ± 767.890
2025-09-12 14:49:15,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [919.3145, 1252.9491, 360.5813, 105.051216, 1843.1913, 193.00557, 2574.445, 545.75903, 1776.8971, 1022.3314]
2025-09-12 14:49:15,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [311.0, 417.0, 142.0, 61.0, 584.0, 96.0, 828.0, 201.0, 556.0, 344.0]
2025-09-12 14:49:15,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 46 minutes, 27 seconds)
2025-09-12 14:59:46,408 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:59:46,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:01:21,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1049.29053 ± 997.759
2025-09-12 15:01:21,673 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [558.24603, 386.1962, 931.5371, 3206.5945, 365.27194, 1377.8783, 368.3204, 624.96967, 2606.0925, 67.79815]
2025-09-12 15:01:21,673 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [201.0, 152.0, 305.0, 1000.0, 145.0, 433.0, 145.0, 252.0, 844.0, 60.0]
2025-09-12 15:01:21,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 33 minutes, 2 seconds)
2025-09-12 15:12:18,417 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:12:18,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:14:05,201 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1181.86853 ± 807.083
2025-09-12 15:14:05,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [646.94763, 2029.2092, 1470.1919, 295.61072, 310.58075, 1738.2051, 1899.1588, 2462.2732, 12.933216, 953.574]
2025-09-12 15:14:05,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [227.0, 678.0, 448.0, 125.0, 133.0, 566.0, 615.0, 789.0, 17.0, 333.0]
2025-09-12 15:14:05,211 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 16 minutes, 56 seconds)
2025-09-12 15:25:13,848 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:25:13,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:26:57,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1164.22351 ± 1045.394
2025-09-12 15:26:57,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [150.62561, 2336.662, 1506.8328, 3199.3625, 1879.9181, 63.01319, 1434.6885, 14.275487, 117.0699, 939.78705]
2025-09-12 15:26:57,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [77.0, 742.0, 506.0, 1000.0, 601.0, 43.0, 488.0, 18.0, 62.0, 318.0]
2025-09-12 15:26:57,713 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 4 minutes, 31 seconds)
2025-09-12 15:38:47,933 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:38:47,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:41:16,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1652.06714 ± 1112.047
2025-09-12 15:41:16,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [3120.293, 736.7124, 2329.1016, 498.86786, 15.896349, 842.2919, 1473.2974, 1332.8711, 3023.9958, 3147.3447]
2025-09-12 15:41:16,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 259.0, 754.0, 207.0, 15.0, 308.0, 504.0, 455.0, 969.0, 1000.0]
2025-09-12 15:41:16,895 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 52 minutes, 17 seconds)
2025-09-12 15:51:42,750 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:51:42,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:54:07,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1658.55688 ± 1052.664
2025-09-12 15:54:07,209 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [2494.7708, 289.87555, 3138.0986, 1649.3534, 882.9554, 1747.8525, 557.83636, 2529.6458, 280.25754, 3014.9224]
2025-09-12 15:54:07,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [771.0, 125.0, 1000.0, 534.0, 300.0, 566.0, 213.0, 810.0, 138.0, 935.0]
2025-09-12 15:54:07,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 38 minutes, 55 seconds)
2025-09-12 16:05:12,625 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:05:12,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:07:17,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1381.40466 ± 1108.565
2025-09-12 16:07:17,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [3211.6267, 2090.6921, 152.9864, 2868.925, 1863.2874, 719.5575, 275.4359, 2027.9915, 112.12468, 491.41986]
2025-09-12 16:07:17,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 672.0, 82.0, 924.0, 607.0, 266.0, 124.0, 645.0, 58.0, 186.0]
2025-09-12 16:07:17,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 26 minutes, 22 seconds)
2025-09-12 16:19:09,615 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:19:09,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:21:30,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1579.99878 ± 1224.543
2025-09-12 16:21:30,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [230.40494, 2583.3538, 2044.199, 15.458555, 1105.832, 2865.2048, 717.8324, 3198.3257, 60.66613, 2978.7104]
2025-09-12 16:21:30,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [103.0, 818.0, 654.0, 23.0, 384.0, 921.0, 253.0, 1000.0, 62.0, 944.0]
2025-09-12 16:21:30,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 13 minutes, 29 seconds)
2025-09-12 16:32:03,610 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:32:03,611 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:33:26,267 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 871.62048 ± 901.247
2025-09-12 16:33:26,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [466.09143, 76.73025, 794.6929, 642.03894, 75.16453, 413.2187, 568.11646, 3128.1887, 600.76807, 1951.1947]
2025-09-12 16:33:26,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [173.0, 62.0, 293.0, 229.0, 66.0, 159.0, 213.0, 1000.0, 216.0, 635.0]
2025-09-12 16:33:26,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1251 [DEBUG]: Training session finished
