2025-09-11 18:44:36,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc0-hopper/ExtremeClogL1U23-mbpac_memdelay
2025-09-11 18:44:36,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc0-hopper/ExtremeClogL1U23-mbpac_memdelay
2025-09-11 18:44:36,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x14a07b198550>}
2025-09-11 18:44:36,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1111 [DEBUG]: using device: cuda
2025-09-11 18:44:36,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1133 [INFO]: Creating new trainer
2025-09-11 18:44:36,595 baseline-mbpac-noiseperc0-hopper:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-09-11 18:44:36,595 baseline-mbpac-noiseperc0-hopper:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-11 18:44:36,602 baseline-mbpac-noiseperc0-hopper:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=11, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=3, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-09-11 18:44:37,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1194 [DEBUG]: Starting training session...
2025-09-11 18:44:37,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 1/100
2025-09-11 18:54:55,552 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 18:54:55,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 18:55:05,098 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 62.54413 ± 7.066
2025-09-11 18:55:05,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [59.210415, 58.697903, 49.639572, 74.9834, 60.47474, 62.660175, 73.46622, 58.665943, 66.077965, 61.56501]
2025-09-11 18:55:05,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [33.0, 33.0, 28.0, 46.0, 34.0, 35.0, 40.0, 33.0, 37.0, 35.0]
2025-09-11 18:55:05,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (62.54) for latency ExtremeClogL1U23
2025-09-11 18:55:05,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 17 hours, 15 minutes, 39 seconds)
2025-09-11 19:06:34,619 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:06:34,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:07:57,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 372.74933 ± 234.707
2025-09-11 19:07:57,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [270.92233, 340.26974, 399.06284, 30.71397, 430.0231, 228.6369, 844.95026, 162.47841, 294.88574, 725.54987]
2025-09-11 19:07:57,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [271.0, 274.0, 346.0, 31.0, 342.0, 192.0, 640.0, 113.0, 258.0, 621.0]
2025-09-11 19:07:57,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (372.75) for latency ExtremeClogL1U23
2025-09-11 19:07:57,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 19 hours, 3 minutes, 45 seconds)
2025-09-11 19:19:15,664 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:19:15,666 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:19:22,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 6.36787 ± 8.205
2025-09-11 19:19:22,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [23.544157, 1.7106158, 2.5297399, 1.894906, 1.9017407, 1.5324682, 2.0162194, 1.8008213, 4.993651, 21.754332]
2025-09-11 19:19:22,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [45.0, 22.0, 25.0, 24.0, 24.0, 23.0, 22.0, 22.0, 18.0, 43.0]
2025-09-11 19:19:22,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 18 hours, 43 minutes, 49 seconds)
2025-09-11 19:30:56,428 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:30:56,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:32:21,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 357.24640 ± 201.291
2025-09-11 19:32:21,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [131.29904, 634.2576, 483.78064, 492.20853, 327.10373, 290.2417, 311.17267, 157.8984, 684.8668, 59.634953]
2025-09-11 19:32:21,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [125.0, 591.0, 450.0, 415.0, 303.0, 242.0, 249.0, 161.0, 534.0, 44.0]
2025-09-11 19:32:21,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 19 hours, 5 minutes, 27 seconds)
2025-09-11 19:43:46,401 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:43:46,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:44:23,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 272.92764 ± 86.829
2025-09-11 19:44:23,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [252.7505, 236.80412, 298.7954, 40.65172, 317.96835, 367.64426, 310.20612, 248.7985, 322.41635, 333.24124]
2025-09-11 19:44:23,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [128.0, 127.0, 156.0, 31.0, 155.0, 199.0, 137.0, 135.0, 154.0, 165.0]
2025-09-11 19:44:23,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 18 hours, 55 minutes, 42 seconds)
2025-09-11 19:55:51,129 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:55:51,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:56:25,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 245.49895 ± 114.352
2025-09-11 19:56:25,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [384.7135, 143.97945, 67.818726, 347.071, 126.503365, 362.48544, 229.74554, 123.062584, 343.5488, 326.06113]
2025-09-11 19:56:25,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [191.0, 73.0, 56.0, 186.0, 78.0, 182.0, 109.0, 82.0, 161.0, 134.0]
2025-09-11 19:56:25,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 19 hours, 13 minutes, 2 seconds)
2025-09-11 20:07:49,592 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:07:49,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:08:24,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 286.38022 ± 24.709
2025-09-11 20:08:24,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [243.17705, 305.754, 306.9408, 286.01077, 297.14798, 266.3695, 303.4561, 246.05577, 291.1464, 317.74396]
2025-09-11 20:08:24,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [104.0, 125.0, 139.0, 148.0, 136.0, 139.0, 131.0, 105.0, 143.0, 129.0]
2025-09-11 20:08:24,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 18 hours, 44 minutes, 20 seconds)
2025-09-11 20:19:40,919 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:19:40,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:20:27,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 301.30994 ± 74.288
2025-09-11 20:20:27,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [366.83136, 344.48834, 402.85733, 244.92313, 207.564, 337.21338, 359.80917, 224.12512, 343.54855, 181.739]
2025-09-11 20:20:27,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [225.0, 160.0, 363.0, 129.0, 107.0, 165.0, 184.0, 97.0, 181.0, 130.0]
2025-09-11 20:20:27,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 18 hours, 43 minutes, 44 seconds)
2025-09-11 20:31:51,237 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:31:51,239 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:32:39,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 471.53458 ± 148.876
2025-09-11 20:32:39,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [657.10406, 436.03827, 376.29517, 105.02956, 523.4201, 454.05014, 418.3893, 592.1646, 567.2527, 585.60144]
2025-09-11 20:32:39,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [223.0, 179.0, 163.0, 58.0, 227.0, 192.0, 172.0, 205.0, 201.0, 213.0]
2025-09-11 20:32:39,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (471.53) for latency ExtremeClogL1U23
2025-09-11 20:32:39,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 18 hours, 17 minutes, 43 seconds)
2025-09-11 20:43:42,838 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:43:42,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:44:18,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 335.41629 ± 92.518
2025-09-11 20:44:18,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [270.9432, 529.94403, 264.27502, 317.2804, 288.3483, 398.1474, 339.18057, 438.29733, 317.7495, 189.99701]
2025-09-11 20:44:18,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [121.0, 182.0, 118.0, 133.0, 121.0, 155.0, 135.0, 156.0, 147.0, 102.0]
2025-09-11 20:44:18,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 17 hours, 58 minutes, 31 seconds)
2025-09-11 20:55:27,205 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:55:27,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:56:25,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 669.32532 ± 305.011
2025-09-11 20:56:25,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [171.97726, 620.51025, 195.58824, 1105.4518, 984.9782, 802.01056, 673.8962, 480.39615, 653.36084, 1005.0833]
2025-09-11 20:56:25,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [86.0, 207.0, 89.0, 358.0, 299.0, 248.0, 218.0, 177.0, 214.0, 312.0]
2025-09-11 20:56:25,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (669.33) for latency ExtremeClogL1U23
2025-09-11 20:56:25,579 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 17 hours, 48 minutes, 9 seconds)
2025-09-11 21:07:37,672 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:07:37,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:08:38,599 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 702.61377 ± 194.466
2025-09-11 21:08:38,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [593.41187, 574.90204, 646.1997, 976.57385, 306.8382, 624.80615, 710.45703, 801.6854, 784.6905, 1006.5735]
2025-09-11 21:08:38,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [175.0, 182.0, 203.0, 326.0, 121.0, 200.0, 210.0, 247.0, 300.0, 332.0]
2025-09-11 21:08:38,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (702.61) for latency ExtremeClogL1U23
2025-09-11 21:08:38,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 17 hours, 40 minutes, 1 second)
2025-09-11 21:19:52,482 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:19:52,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:20:56,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 720.40302 ± 311.970
2025-09-11 21:20:56,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [531.5421, 961.7283, 155.56934, 661.0298, 224.09421, 979.0951, 1006.7302, 682.9998, 1040.0704, 961.1708]
2025-09-11 21:20:56,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [172.0, 320.0, 75.0, 234.0, 115.0, 318.0, 316.0, 222.0, 364.0, 318.0]
2025-09-11 21:20:56,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (720.40) for latency ExtremeClogL1U23
2025-09-11 21:20:56,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 17 hours, 32 minutes, 23 seconds)
2025-09-11 21:31:49,904 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:31:49,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:33:02,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 824.45050 ± 360.989
2025-09-11 21:33:02,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [745.86755, 926.73047, 81.37057, 991.0597, 1119.4423, 1002.7148, 475.5257, 449.54236, 1188.8633, 1263.388]
2025-09-11 21:33:02,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [234.0, 352.0, 56.0, 312.0, 352.0, 362.0, 193.0, 186.0, 373.0, 407.0]
2025-09-11 21:33:02,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (824.45) for latency ExtremeClogL1U23
2025-09-11 21:33:02,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 17 hours, 18 minutes, 35 seconds)
2025-09-11 21:43:58,815 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:43:58,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:45:09,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 826.79431 ± 217.792
2025-09-11 21:45:09,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1035.7255, 981.268, 1026.0697, 1012.14453, 337.3928, 721.2082, 1009.0183, 802.08655, 666.4492, 676.581]
2025-09-11 21:45:09,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [328.0, 302.0, 329.0, 332.0, 140.0, 245.0, 321.0, 261.0, 195.0, 224.0]
2025-09-11 21:45:09,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (826.79) for latency ExtremeClogL1U23
2025-09-11 21:45:09,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 17 hours, 14 minutes, 12 seconds)
2025-09-11 21:56:20,710 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:56:20,713 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:57:14,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 589.55023 ± 277.935
2025-09-11 21:57:14,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [207.6729, 374.51184, 747.0441, 434.22455, 609.8687, 162.62187, 761.0206, 959.8592, 623.5619, 1015.11633]
2025-09-11 21:57:14,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [96.0, 147.0, 256.0, 180.0, 198.0, 77.0, 259.0, 301.0, 206.0, 348.0]
2025-09-11 21:57:14,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 17 hours, 1 minute, 36 seconds)
2025-09-11 22:08:03,713 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:08:03,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:10:00,004 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1303.94580 ± 301.452
2025-09-11 22:10:00,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [966.00024, 1797.8451, 1252.1062, 890.6652, 1018.8497, 1196.6631, 1539.6263, 1482.2667, 1168.7646, 1726.6708]
2025-09-11 22:10:00,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [301.0, 571.0, 443.0, 338.0, 463.0, 373.0, 477.0, 468.0, 434.0, 638.0]
2025-09-11 22:10:00,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (1303.95) for latency ExtremeClogL1U23
2025-09-11 22:10:00,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 16 hours, 58 minutes, 31 seconds)
2025-09-11 22:21:01,973 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:21:01,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:22:14,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 837.65607 ± 359.299
2025-09-11 22:22:14,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [186.76451, 1243.4137, 727.4126, 1181.6946, 628.94135, 1189.5432, 909.77545, 851.55035, 1162.1111, 295.3537]
2025-09-11 22:22:14,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [88.0, 396.0, 232.0, 361.0, 209.0, 378.0, 295.0, 304.0, 370.0, 137.0]
2025-09-11 22:22:14,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 16 hours, 45 minutes, 20 seconds)
2025-09-11 22:33:23,095 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:33:23,098 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:34:32,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 759.88995 ± 461.400
2025-09-11 22:34:32,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [116.56832, 1529.1177, 1081.4585, 815.1441, 597.2629, 1401.3429, 282.37524, 617.3001, 214.96826, 943.36176]
2025-09-11 22:34:32,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [63.0, 515.0, 339.0, 324.0, 234.0, 450.0, 124.0, 236.0, 104.0, 290.0]
2025-09-11 22:34:32,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 16 hours, 36 minutes, 3 seconds)
2025-09-11 22:45:31,705 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:45:31,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:46:36,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 747.94824 ± 421.654
2025-09-11 22:46:36,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [270.37866, 61.168125, 188.04987, 1089.3119, 1339.8359, 995.96356, 948.98, 774.73346, 635.78546, 1175.2753]
2025-09-11 22:46:36,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [129.0, 51.0, 88.0, 343.0, 422.0, 301.0, 292.0, 242.0, 238.0, 379.0]
2025-09-11 22:46:36,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 16 hours, 23 minutes, 12 seconds)
2025-09-11 22:57:21,482 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:57:21,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:58:25,512 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 761.62823 ± 339.239
2025-09-11 22:58:25,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1028.6992, 828.6733, 814.48834, 862.794, 828.9878, 454.19727, 345.54312, 1398.3384, 897.79767, 156.76299]
2025-09-11 22:58:25,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [315.0, 273.0, 260.0, 269.0, 260.0, 170.0, 144.0, 428.0, 278.0, 75.0]
2025-09-11 22:58:25,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 16 hours, 6 minutes, 47 seconds)
2025-09-11 23:09:27,844 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:09:27,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:11:03,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1039.23889 ± 916.271
2025-09-11 23:11:03,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1641.7874, 482.02112, 2081.982, 71.861176, 1156.5211, 209.38147, 2878.427, 182.94139, 209.65665, 1477.8105]
2025-09-11 23:11:03,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [513.0, 189.0, 738.0, 54.0, 404.0, 98.0, 1000.0, 93.0, 105.0, 514.0]
2025-09-11 23:11:03,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 15 hours, 52 minutes, 35 seconds)
2025-09-11 23:22:22,621 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:22:22,624 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:24:20,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1430.09790 ± 342.866
2025-09-11 23:24:20,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [965.9137, 1857.5964, 1515.3612, 1908.2369, 1358.0748, 962.044, 1642.8273, 994.15967, 1384.1799, 1712.5847]
2025-09-11 23:24:20,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [327.0, 602.0, 462.0, 574.0, 417.0, 326.0, 514.0, 351.0, 414.0, 518.0]
2025-09-11 23:24:20,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (1430.10) for latency ExtremeClogL1U23
2025-09-11 23:24:20,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 15 hours, 56 minutes, 29 seconds)
2025-09-11 23:35:24,888 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:35:24,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:36:41,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 900.60480 ± 446.164
2025-09-11 23:36:41,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1242.7476, 936.6024, 1237.8038, 1288.6793, 449.93494, 1267.0732, 68.03199, 950.90643, 1305.0011, 259.26785]
2025-09-11 23:36:41,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [364.0, 285.0, 372.0, 385.0, 171.0, 373.0, 49.0, 301.0, 389.0, 107.0]
2025-09-11 23:36:41,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 15 hours, 44 minutes, 41 seconds)
2025-09-11 23:48:31,300 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:48:31,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:49:56,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 953.51501 ± 876.614
2025-09-11 23:49:56,732 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [206.31827, 991.0188, 3145.652, 264.41458, 1284.312, 345.5476, 1113.4457, 544.0201, 66.34205, 1574.0786]
2025-09-11 23:49:56,732 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [93.0, 295.0, 1000.0, 112.0, 375.0, 140.0, 379.0, 207.0, 48.0, 480.0]
2025-09-11 23:49:56,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 15 hours, 50 minutes, 9 seconds)
2025-09-12 00:01:23,563 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:01:23,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:03:32,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1376.87939 ± 971.470
2025-09-12 00:03:32,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [781.68854, 443.40936, 2946.3088, 357.69562, 1002.39044, 937.5566, 2257.5564, 2113.4888, 2697.91, 230.78926]
2025-09-12 00:03:32,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [297.0, 197.0, 1000.0, 148.0, 355.0, 349.0, 766.0, 651.0, 855.0, 119.0]
2025-09-12 00:03:32,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 16 hours, 3 minutes, 47 seconds)
2025-09-12 00:15:15,325 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:15:15,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:17:08,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1255.83643 ± 753.004
2025-09-12 00:17:08,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2775.7, 1076.2205, 300.5071, 1809.7175, 1315.6649, 477.39847, 2070.2473, 401.50168, 954.13544, 1377.2711]
2025-09-12 00:17:08,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [874.0, 359.0, 141.0, 561.0, 396.0, 181.0, 671.0, 156.0, 335.0, 466.0]
2025-09-12 00:17:08,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 16 hours, 4 minutes, 48 seconds)
2025-09-12 00:28:34,052 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:28:34,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:30:56,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1609.19275 ± 854.082
2025-09-12 00:30:56,947 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1676.1847, 2225.8179, 2699.8604, 838.8771, 1873.8236, 260.21054, 2715.1643, 932.6084, 2299.3523, 570.0288]
2025-09-12 00:30:56,947 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [512.0, 742.0, 851.0, 291.0, 585.0, 135.0, 891.0, 313.0, 732.0, 207.0]
2025-09-12 00:30:56,948 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (1609.19) for latency ExtremeClogL1U23
2025-09-12 00:30:56,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 15 hours, 59 minutes, 3 seconds)
2025-09-12 00:42:39,876 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:42:39,895 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:44:57,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1551.60620 ± 836.289
2025-09-12 00:44:57,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [470.12988, 1692.014, 3008.4468, 1206.4226, 964.72546, 1107.775, 3104.8289, 1063.6299, 1867.3064, 1030.7827]
2025-09-12 00:44:57,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [180.0, 538.0, 1000.0, 372.0, 335.0, 321.0, 1000.0, 335.0, 612.0, 361.0]
2025-09-12 00:44:57,516 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 16 hours, 9 minutes, 28 seconds)
2025-09-12 00:57:07,815 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:57:07,824 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:58:36,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1002.91699 ± 831.123
2025-09-12 00:58:36,709 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2084.2605, 106.55484, 1820.4645, 31.374557, 2488.9934, 788.8968, 1107.3923, 156.64717, 1025.4338, 419.1522]
2025-09-12 00:58:36,709 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [602.0, 71.0, 537.0, 31.0, 818.0, 252.0, 361.0, 77.0, 336.0, 160.0]
2025-09-12 00:58:36,719 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 16 hours, 1 minute, 19 seconds)
2025-09-12 01:09:40,932 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:09:40,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:11:30,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1253.34399 ± 776.216
2025-09-12 01:11:30,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1260.6283, 2099.8755, 197.75778, 2040.5466, 155.49237, 1573.8922, 1407.2501, 1215.3887, 2349.2402, 233.36829]
2025-09-12 01:11:30,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [412.0, 660.0, 92.0, 577.0, 76.0, 499.0, 425.0, 419.0, 716.0, 104.0]
2025-09-12 01:11:30,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 15 hours, 37 minutes, 45 seconds)
2025-09-12 01:23:50,799 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:23:50,810 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:26:09,004 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1510.13440 ± 1128.302
2025-09-12 01:26:09,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3150.5286, 742.7948, 1265.2922, 1518.0409, 3103.1646, 214.7925, 1486.8989, 229.68164, 3011.6567, 378.49442]
2025-09-12 01:26:09,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 260.0, 416.0, 518.0, 1000.0, 97.0, 488.0, 114.0, 1000.0, 150.0]
2025-09-12 01:26:09,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 15 hours, 38 minutes, 26 seconds)
2025-09-12 01:36:59,885 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:36:59,895 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:39:15,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1516.25049 ± 1021.130
2025-09-12 01:39:15,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1502.2872, 79.08164, 2027.7157, 2987.4858, 2463.6565, 696.75226, 1177.8517, 722.4318, 427.90884, 3077.3333]
2025-09-12 01:39:15,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [443.0, 46.0, 689.0, 901.0, 751.0, 253.0, 413.0, 254.0, 168.0, 1000.0]
2025-09-12 01:39:15,230 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 15 hours, 15 minutes, 16 seconds)
2025-09-12 01:51:30,842 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:51:30,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:53:12,900 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1217.49902 ± 647.548
2025-09-12 01:53:12,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1281.3685, 477.69952, 1791.07, 103.63635, 962.6324, 1748.3656, 898.17267, 1503.9532, 973.3382, 2434.7542]
2025-09-12 01:53:12,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [371.0, 183.0, 541.0, 66.0, 312.0, 527.0, 293.0, 448.0, 278.0, 721.0]
2025-09-12 01:53:12,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 15 hours, 59 seconds)
2025-09-12 02:04:20,326 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:04:20,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:05:50,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1064.30005 ± 477.024
2025-09-12 02:05:50,375 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1651.2535, 1370.728, 787.1137, 290.4375, 1101.6016, 473.39597, 533.8512, 1554.9357, 1381.9884, 1497.6946]
2025-09-12 02:05:50,375 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [479.0, 446.0, 269.0, 124.0, 314.0, 184.0, 208.0, 464.0, 396.0, 426.0]
2025-09-12 02:05:50,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 14 hours, 33 minutes, 57 seconds)
2025-09-12 02:17:33,617 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:17:33,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:19:37,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1488.12744 ± 439.617
2025-09-12 02:19:37,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [934.9141, 1810.2173, 1993.0537, 1286.329, 1379.2627, 1363.968, 2336.8127, 1562.0278, 1410.8098, 803.87866]
2025-09-12 02:19:37,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [308.0, 527.0, 604.0, 377.0, 400.0, 400.0, 740.0, 488.0, 425.0, 270.0]
2025-09-12 02:19:37,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 14 hours, 32 minutes, 5 seconds)
2025-09-12 02:31:55,651 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:31:55,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:32:51,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 622.87958 ± 523.260
2025-09-12 02:32:51,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [786.9269, 1286.7872, 185.65314, 1443.2328, 373.33908, 60.06522, 185.41533, 200.05034, 329.03546, 1378.2902]
2025-09-12 02:32:51,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [246.0, 380.0, 85.0, 415.0, 149.0, 52.0, 84.0, 93.0, 147.0, 421.0]
2025-09-12 02:32:51,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 14 hours, 34 seconds)
2025-09-12 02:43:43,832 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:43:43,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:44:49,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 694.51105 ± 409.138
2025-09-12 02:44:49,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [99.83391, 814.6636, 482.6679, 1131.5227, 1366.7936, 85.098175, 980.8475, 615.8208, 965.2258, 402.63657]
2025-09-12 02:44:49,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [55.0, 290.0, 197.0, 351.0, 393.0, 60.0, 338.0, 231.0, 327.0, 159.0]
2025-09-12 02:44:49,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 13 hours, 32 minutes, 59 seconds)
2025-09-12 02:56:24,472 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:56:24,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:57:44,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 928.13623 ± 454.270
2025-09-12 02:57:44,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1168.0354, 505.11377, 543.3023, 1082.4479, 613.5979, 1365.6288, 980.9453, 134.87444, 1760.8738, 1126.543]
2025-09-12 02:57:44,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [359.0, 199.0, 201.0, 314.0, 216.0, 396.0, 323.0, 70.0, 510.0, 353.0]
2025-09-12 02:57:44,699 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 13 hours, 7 minutes, 15 seconds)
2025-09-12 03:08:52,222 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:08:52,245 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:10:17,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1001.84216 ± 436.406
2025-09-12 03:10:17,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1132.0516, 72.1173, 616.31146, 1511.9132, 583.5168, 1367.2423, 1165.8479, 867.4436, 1258.5897, 1443.3877]
2025-09-12 03:10:17,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [368.0, 51.0, 229.0, 472.0, 203.0, 394.0, 344.0, 292.0, 386.0, 437.0]
2025-09-12 03:10:17,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 12 hours, 53 minutes, 25 seconds)
2025-09-12 03:22:06,175 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:22:06,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:23:28,684 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 915.52911 ± 944.802
2025-09-12 03:23:28,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [292.0245, 80.27977, 1465.236, 445.8083, 305.38474, 1784.4905, 116.83835, 1308.7272, 223.27214, 3133.23]
2025-09-12 03:23:28,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [127.0, 62.0, 430.0, 169.0, 124.0, 525.0, 67.0, 428.0, 114.0, 1000.0]
2025-09-12 03:23:28,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 12 hours, 33 minutes, 23 seconds)
2025-09-12 03:34:50,146 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:34:50,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:37:08,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1693.47290 ± 867.649
2025-09-12 03:37:08,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1493.2664, 3037.7856, 836.37805, 1272.349, 2703.052, 1278.687, 2489.4714, 1519.8214, 73.21202, 2230.7056]
2025-09-12 03:37:08,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [448.0, 926.0, 276.0, 374.0, 802.0, 364.0, 751.0, 475.0, 44.0, 697.0]
2025-09-12 03:37:08,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (1693.47) for latency ExtremeClogL1U23
2025-09-12 03:37:08,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 12 hours, 25 minutes, 43 seconds)
2025-09-12 03:48:31,568 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:48:31,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:50:31,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1408.63501 ± 847.754
2025-09-12 03:50:31,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3127.599, 2058.5515, 1113.5947, 1413.0372, 1187.8948, 403.04617, 2153.861, 991.4096, 1605.1571, 32.19894]
2025-09-12 03:50:31,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 666.0, 330.0, 408.0, 357.0, 157.0, 698.0, 283.0, 528.0, 34.0]
2025-09-12 03:50:31,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 12 hours, 28 minutes, 59 seconds)
2025-09-12 04:02:16,933 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:02:16,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:04:48,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1730.36450 ± 937.324
2025-09-12 04:04:48,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3163.2173, 2937.0012, 2399.6807, 1589.3998, 1007.6841, 68.01956, 1142.8947, 2481.91, 1544.1199, 969.71796]
2025-09-12 04:04:48,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 931.0, 765.0, 501.0, 337.0, 49.0, 381.0, 800.0, 503.0, 330.0]
2025-09-12 04:04:48,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (1730.36) for latency ExtremeClogL1U23
2025-09-12 04:04:48,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 12 hours, 31 minutes, 1 second)
2025-09-12 04:15:57,160 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:15:57,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:17:51,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1393.01440 ± 960.930
2025-09-12 04:17:51,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [194.22186, 2316.6907, 230.50537, 2177.5435, 2128.2773, 445.98224, 3088.0242, 1341.2292, 1428.4094, 579.26]
2025-09-12 04:17:51,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [90.0, 675.0, 101.0, 653.0, 601.0, 171.0, 913.0, 396.0, 424.0, 225.0]
2025-09-12 04:17:51,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 12 hours, 23 minutes, 18 seconds)
2025-09-12 04:29:12,457 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:29:12,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:31:41,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1737.07300 ± 667.935
2025-09-12 04:31:41,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1020.8918, 2890.8752, 1277.0715, 1050.2739, 894.73114, 1780.3271, 2671.3447, 2210.9392, 2032.1536, 1542.123]
2025-09-12 04:31:41,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [348.0, 909.0, 445.0, 366.0, 312.0, 521.0, 800.0, 719.0, 659.0, 462.0]
2025-09-12 04:31:41,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (1737.07) for latency ExtremeClogL1U23
2025-09-12 04:31:41,713 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 12 hours, 16 minutes, 44 seconds)
2025-09-12 04:43:13,925 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:43:13,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:45:32,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1580.02271 ± 922.898
2025-09-12 04:45:32,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [215.97543, 3092.3364, 2560.266, 1874.4706, 732.77936, 1077.9773, 2875.4473, 1300.5139, 1000.5995, 1069.8607]
2025-09-12 04:45:32,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [99.0, 1000.0, 752.0, 606.0, 275.0, 358.0, 934.0, 421.0, 296.0, 384.0]
2025-09-12 04:45:32,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 12 hours, 4 minutes, 54 seconds)
2025-09-12 04:57:06,813 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:57:06,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:59:27,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1601.21375 ± 935.803
2025-09-12 04:59:27,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2338.594, 1402.743, 1293.6509, 1141.8728, 648.28455, 992.7262, 1667.0745, 3093.6118, 3188.511, 245.06903]
2025-09-12 04:59:27,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [765.0, 446.0, 380.0, 387.0, 244.0, 357.0, 496.0, 1000.0, 1000.0, 108.0]
2025-09-12 04:59:27,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 11 hours, 56 minutes, 55 seconds)
2025-09-12 05:10:45,098 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:10:45,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:11:44,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 666.95001 ± 812.985
2025-09-12 05:11:44,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [55.808285, 1247.3086, 83.36477, 74.15797, 111.43832, 1801.4525, 2368.0916, 781.0465, 71.76687, 75.06404]
2025-09-12 05:11:44,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [38.0, 400.0, 60.0, 53.0, 73.0, 523.0, 732.0, 259.0, 43.0, 59.0]
2025-09-12 05:11:44,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 11 hours, 22 minutes, 51 seconds)
2025-09-12 05:23:14,293 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:23:14,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:24:56,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1164.78796 ± 930.718
2025-09-12 05:24:56,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1075.3129, 2974.4795, 1392.5184, 2387.6248, 246.21315, 238.64572, 1856.0398, 410.40683, 155.01076, 911.6279]
2025-09-12 05:24:56,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [374.0, 913.0, 451.0, 729.0, 108.0, 120.0, 532.0, 159.0, 89.0, 319.0]
2025-09-12 05:24:56,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 11 hours, 10 minutes, 46 seconds)
2025-09-12 05:36:34,967 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:36:34,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:37:59,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1036.92200 ± 750.258
2025-09-12 05:37:59,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2652.7183, 228.83975, 1710.4426, 1030.1409, 1238.6359, 574.3289, 1270.2072, 176.30536, 156.52011, 1331.0797]
2025-09-12 05:37:59,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [755.0, 102.0, 508.0, 297.0, 394.0, 202.0, 352.0, 86.0, 75.0, 402.0]
2025-09-12 05:38:00,004 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 10 hours, 49 minutes, 47 seconds)
2025-09-12 05:49:45,611 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:49:45,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:52:08,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1653.89355 ± 899.157
2025-09-12 05:52:08,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2844.8293, 1248.7638, 1382.6672, 1607.252, 1023.9432, 1525.4092, 145.19225, 2628.5615, 3172.3267, 959.9897]
2025-09-12 05:52:08,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [887.0, 381.0, 417.0, 480.0, 343.0, 516.0, 74.0, 837.0, 1000.0, 325.0]
2025-09-12 05:52:08,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 10 hours, 39 minutes, 24 seconds)
2025-09-12 06:03:12,954 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:03:12,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:04:56,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1265.90161 ± 912.364
2025-09-12 06:04:56,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [130.58333, 1378.5375, 2433.0002, 147.83789, 2594.7292, 63.934128, 1815.436, 1919.7401, 636.67224, 1538.5448]
2025-09-12 06:04:56,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [68.0, 412.0, 712.0, 75.0, 772.0, 39.0, 533.0, 561.0, 226.0, 469.0]
2025-09-12 06:04:56,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 10 hours, 15 minutes, 39 seconds)
2025-09-12 06:16:19,794 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:16:19,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:17:53,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1147.09937 ± 462.268
2025-09-12 06:17:53,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1437.6488, 1029.021, 1920.7238, 859.06964, 490.9045, 1312.9127, 1173.0807, 1622.0746, 1286.6952, 338.86215]
2025-09-12 06:17:53,408 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [422.0, 292.0, 561.0, 288.0, 197.0, 381.0, 328.0, 515.0, 362.0, 133.0]
2025-09-12 06:17:53,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 10 hours, 8 minutes, 30 seconds)
2025-09-12 06:29:26,961 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:29:26,964 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:31:59,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1861.57129 ± 852.567
2025-09-12 06:31:59,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1854.473, 3232.1572, 2586.5085, 1941.0009, 1284.6868, 2314.0938, 2900.8198, 855.5877, 1013.23553, 633.1488]
2025-09-12 06:31:59,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [539.0, 1000.0, 814.0, 570.0, 364.0, 711.0, 833.0, 273.0, 336.0, 220.0]
2025-09-12 06:31:59,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (1861.57) for latency ExtremeClogL1U23
2025-09-12 06:31:59,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 10 hours, 3 minutes, 22 seconds)
2025-09-12 06:43:36,909 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:43:36,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:46:22,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2015.24146 ± 935.408
2025-09-12 06:46:22,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1583.7603, 672.2664, 3324.2905, 2571.3904, 993.2652, 3208.195, 1510.3248, 3232.326, 1700.3165, 1356.2798]
2025-09-12 06:46:22,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [452.0, 231.0, 1000.0, 768.0, 307.0, 1000.0, 428.0, 1000.0, 504.0, 406.0]
2025-09-12 06:46:22,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (2015.24) for latency ExtremeClogL1U23
2025-09-12 06:46:22,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 10 hours, 1 minute, 39 seconds)
2025-09-12 06:57:54,787 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:57:54,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:00:23,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1788.63770 ± 1112.495
2025-09-12 07:00:23,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [363.78873, 1725.2332, 1607.224, 3229.5151, 340.91678, 3228.652, 2192.1787, 2680.242, 2387.4714, 131.15523]
2025-09-12 07:00:23,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [146.0, 516.0, 505.0, 1000.0, 137.0, 964.0, 641.0, 792.0, 759.0, 70.0]
2025-09-12 07:00:23,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 9 hours, 47 minutes)
2025-09-12 07:11:53,572 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:11:53,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:14:16,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1646.62500 ± 948.169
2025-09-12 07:14:16,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2114.158, 1019.2316, 1301.8176, 3199.326, 1905.625, 2624.062, 2651.0947, 192.81575, 630.4809, 827.6395]
2025-09-12 07:14:16,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [620.0, 340.0, 429.0, 1000.0, 615.0, 841.0, 839.0, 104.0, 240.0, 292.0]
2025-09-12 07:14:16,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 9 hours, 42 minutes, 22 seconds)
2025-09-12 07:25:28,486 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:25:28,490 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:28:34,699 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2210.44385 ± 1046.677
2025-09-12 07:28:34,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3258.401, 2497.5615, 3235.9043, 3182.071, 1507.9, 149.17168, 2545.925, 1173.8372, 1317.4792, 3236.1895]
2025-09-12 07:28:34,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 775.0, 1000.0, 1000.0, 491.0, 74.0, 797.0, 404.0, 422.0, 943.0]
2025-09-12 07:28:34,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (2210.44) for latency ExtremeClogL1U23
2025-09-12 07:28:34,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 9 hours, 39 minutes, 38 seconds)
2025-09-12 07:40:21,259 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:40:21,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:42:19,183 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1465.36108 ± 867.874
2025-09-12 07:42:19,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1364.0199, 2084.1274, 1623.5955, 416.33923, 3383.9438, 1255.1166, 119.154594, 1632.4313, 920.56726, 1854.3152]
2025-09-12 07:42:19,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [392.0, 639.0, 462.0, 159.0, 1000.0, 363.0, 64.0, 461.0, 295.0, 538.0]
2025-09-12 07:42:19,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 9 hours, 22 minutes, 41 seconds)
2025-09-12 07:53:25,800 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:53:25,805 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:55:30,035 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1491.89502 ± 565.600
2025-09-12 07:55:30,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2695.2383, 1246.2213, 1328.4785, 1067.43, 835.9041, 2450.192, 1390.5825, 1390.9944, 1303.1436, 1210.7653]
2025-09-12 07:55:30,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [805.0, 399.0, 382.0, 308.0, 284.0, 726.0, 403.0, 437.0, 417.0, 393.0]
2025-09-12 07:55:30,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 8 hours, 59 minutes, 13 seconds)
2025-09-12 08:07:42,717 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:07:42,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:09:52,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1603.42908 ± 877.820
2025-09-12 08:09:52,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3307.7725, 1245.0989, 1859.4138, 195.36491, 2129.3618, 807.39685, 1576.1027, 1003.68744, 2698.7979, 1211.2943]
2025-09-12 08:09:52,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 378.0, 536.0, 91.0, 617.0, 260.0, 461.0, 290.0, 778.0, 340.0]
2025-09-12 08:09:52,227 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 8 hours, 47 minutes, 58 seconds)
2025-09-12 08:20:39,085 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:20:39,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:23:58,392 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2381.98096 ± 1226.410
2025-09-12 08:23:58,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3187.939, 3268.8115, 245.10092, 182.62482, 3144.4492, 3296.1733, 1339.1433, 3324.5686, 2569.8228, 3261.1777]
2025-09-12 08:23:58,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 104.0, 104.0, 922.0, 1000.0, 377.0, 1000.0, 805.0, 1000.0]
2025-09-12 08:23:58,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (2381.98) for latency ExtremeClogL1U23
2025-09-12 08:23:58,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 8 hours, 35 minutes, 44 seconds)
2025-09-12 08:35:40,801 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:35:40,805 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:37:57,068 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1616.61023 ± 1053.279
2025-09-12 08:37:57,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1451.2831, 801.9691, 2642.0708, 384.606, 667.9398, 2002.0784, 947.3333, 713.66156, 3322.9932, 3232.1682]
2025-09-12 08:37:57,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [413.0, 277.0, 781.0, 150.0, 249.0, 639.0, 337.0, 254.0, 1000.0, 963.0]
2025-09-12 08:37:57,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 8 hours, 19 minutes, 29 seconds)
2025-09-12 08:49:41,105 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:49:41,110 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:51:22,774 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1244.20325 ± 641.587
2025-09-12 08:51:22,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1020.3263, 2318.2595, 994.4079, 804.50507, 1119.9159, 2156.195, 54.401474, 1019.9327, 1820.0837, 1134.0052]
2025-09-12 08:51:22,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [314.0, 683.0, 294.0, 276.0, 330.0, 632.0, 37.0, 303.0, 525.0, 362.0]
2025-09-12 08:51:22,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 8 hours, 3 minutes, 25 seconds)
2025-09-12 09:03:12,999 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:03:13,004 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:06:02,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2016.29749 ± 991.201
2025-09-12 09:06:02,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [194.15706, 2609.8687, 1085.7172, 951.5024, 2580.061, 2318.205, 3239.4944, 1293.8716, 2956.372, 2933.7253]
2025-09-12 09:06:02,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [101.0, 788.0, 357.0, 317.0, 815.0, 694.0, 1000.0, 425.0, 878.0, 909.0]
2025-09-12 09:06:02,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 7 hours, 59 minutes, 41 seconds)
2025-09-12 09:16:44,552 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:16:44,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:18:52,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1510.67603 ± 1135.839
2025-09-12 09:18:52,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3403.092, 2147.2314, 128.57433, 986.25757, 523.0863, 464.104, 2409.918, 564.8256, 1267.1119, 3212.5576]
2025-09-12 09:18:52,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 627.0, 82.0, 322.0, 206.0, 187.0, 710.0, 219.0, 407.0, 1000.0]
2025-09-12 09:18:52,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 7 hours, 35 minutes, 26 seconds)
2025-09-12 09:30:25,246 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:30:25,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:32:02,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1192.29907 ± 501.534
2025-09-12 09:32:02,638 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1628.1548, 263.82635, 1433.7853, 1320.9828, 2096.0679, 1216.4744, 1286.4535, 1172.1514, 1048.3818, 456.71158]
2025-09-12 09:32:02,638 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [516.0, 116.0, 404.0, 374.0, 616.0, 378.0, 359.0, 327.0, 319.0, 189.0]
2025-09-12 09:32:02,647 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 7 hours, 15 minutes, 39 seconds)
2025-09-12 09:44:16,996 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:44:17,000 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:45:56,591 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1195.57300 ± 707.971
2025-09-12 09:45:56,593 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1026.5238, 1596.6205, 1343.5715, 1333.0233, 2660.337, 96.28142, 64.117096, 1258.889, 1539.8663, 1036.4999]
2025-09-12 09:45:56,593 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [346.0, 449.0, 426.0, 400.0, 795.0, 66.0, 51.0, 351.0, 460.0, 337.0]
2025-09-12 09:45:56,617 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 7 hours, 1 minute, 33 seconds)
2025-09-12 09:56:55,250 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:56:55,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:59:57,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2092.90137 ± 1022.999
2025-09-12 09:59:57,653 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2426.7358, 1497.7283, 3159.618, 3235.331, 3013.2664, 1012.43524, 3235.898, 1898.418, 1201.4329, 248.15105]
2025-09-12 09:59:57,653 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [765.0, 494.0, 1000.0, 1000.0, 937.0, 345.0, 1000.0, 613.0, 417.0, 120.0]
2025-09-12 09:59:57,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 6 hours, 51 minutes, 29 seconds)
2025-09-12 10:11:34,192 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:11:34,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:13:47,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1581.87061 ± 938.819
2025-09-12 10:13:47,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3405.103, 620.9053, 229.78032, 1780.4376, 2375.66, 356.3142, 1560.7777, 2313.1548, 1713.2053, 1463.368]
2025-09-12 10:13:47,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 216.0, 117.0, 570.0, 705.0, 141.0, 456.0, 712.0, 542.0, 471.0]
2025-09-12 10:13:47,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 6 hours, 32 minutes, 54 seconds)
2025-09-12 10:25:15,277 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:25:15,282 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:27:16,274 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1504.50647 ± 674.593
2025-09-12 10:27:16,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2203.2078, 1012.6462, 222.44382, 1293.2025, 1743.2778, 1460.3369, 2689.6382, 1385.8109, 2089.2656, 945.234]
2025-09-12 10:27:16,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [637.0, 334.0, 114.0, 423.0, 503.0, 428.0, 785.0, 400.0, 592.0, 276.0]
2025-09-12 10:27:16,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 6 hours, 23 minutes)
2025-09-12 10:39:12,960 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:39:12,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:41:39,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1735.93652 ± 1058.668
2025-09-12 10:41:39,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3269.4998, 1855.9812, 1066.0557, 1207.669, 1732.2866, 2778.012, 1823.6681, 195.00235, 3239.5378, 191.6526]
2025-09-12 10:41:39,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 574.0, 366.0, 355.0, 558.0, 812.0, 577.0, 90.0, 1000.0, 89.0]
2025-09-12 10:41:39,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 6 hours, 15 minutes, 54 seconds)
2025-09-12 10:52:48,708 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:52:48,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:55:43,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2009.35583 ± 928.771
2025-09-12 10:55:43,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1672.5154, 2816.4966, 1770.4403, 579.4292, 3193.0266, 2677.0354, 2020.5878, 3239.0771, 511.53268, 1613.417]
2025-09-12 10:55:43,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [543.0, 903.0, 578.0, 209.0, 1000.0, 836.0, 663.0, 1000.0, 187.0, 534.0]
2025-09-12 10:55:43,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 6 hours, 2 minutes, 52 seconds)
2025-09-12 11:06:52,965 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:06:52,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:09:21,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1693.90625 ± 1329.498
2025-09-12 11:09:21,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2997.661, 3167.1482, 220.42593, 2474.2834, 68.60407, 732.5877, 3153.178, 297.65482, 632.37335, 3195.1462]
2025-09-12 11:09:21,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [928.0, 1000.0, 99.0, 781.0, 55.0, 249.0, 1000.0, 137.0, 225.0, 1000.0]
2025-09-12 11:09:21,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 5 hours, 46 minutes, 57 seconds)
2025-09-12 11:21:05,785 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:21:05,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:23:01,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1341.86108 ± 1035.849
2025-09-12 11:23:01,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [849.4183, 1614.8933, 1082.9269, 1364.0455, 3237.668, 684.4116, 303.49564, 3252.4802, 181.73975, 847.5306]
2025-09-12 11:23:01,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [284.0, 521.0, 358.0, 450.0, 934.0, 238.0, 148.0, 1000.0, 93.0, 284.0]
2025-09-12 11:23:02,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 5 hours, 32 minutes, 22 seconds)
2025-09-12 11:34:53,798 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:34:53,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:36:59,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1426.10925 ± 1082.912
2025-09-12 11:36:59,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [794.0158, 710.64575, 65.89467, 615.6792, 374.9049, 2544.1753, 2810.5762, 3181.6133, 908.5696, 2255.0183]
2025-09-12 11:36:59,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [283.0, 244.0, 53.0, 223.0, 153.0, 802.0, 868.0, 1000.0, 302.0, 717.0]
2025-09-12 11:36:59,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 5 hours, 20 minutes, 44 seconds)
2025-09-12 11:48:40,848 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:48:40,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:50:03,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 930.62268 ± 590.153
2025-09-12 11:50:03,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1583.8807, 127.60499, 1530.7701, 357.7535, 729.15594, 437.65652, 806.3296, 1382.5007, 433.69995, 1916.8739]
2025-09-12 11:50:03,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [521.0, 68.0, 436.0, 140.0, 248.0, 166.0, 278.0, 450.0, 179.0, 559.0]
2025-09-12 11:50:03,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 5 hours, 57 seconds)
2025-09-12 12:01:32,202 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:01:32,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:03:12,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1162.79126 ± 983.802
2025-09-12 12:03:12,670 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [345.4749, 2329.056, 237.9276, 1462.7136, 137.728, 236.06639, 1541.136, 3281.451, 775.7129, 1280.6464]
2025-09-12 12:03:12,670 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [140.0, 704.0, 104.0, 463.0, 70.0, 118.0, 504.0, 1000.0, 243.0, 367.0]
2025-09-12 12:03:12,680 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 4 hours, 43 minutes, 25 seconds)
2025-09-12 12:13:56,968 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:13:56,970 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:15:54,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1338.64258 ± 1138.306
2025-09-12 12:15:54,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3159.873, 435.45917, 472.43314, 2369.7432, 498.25574, 98.12703, 586.5789, 3120.7458, 607.00903, 2038.2006]
2025-09-12 12:15:54,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 165.0, 171.0, 701.0, 182.0, 66.0, 209.0, 1000.0, 213.0, 661.0]
2025-09-12 12:15:54,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 4 hours, 26 minutes, 14 seconds)
2025-09-12 12:27:25,812 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:27:25,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:28:56,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1082.18494 ± 682.430
2025-09-12 12:28:56,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1316.0813, 1590.2251, 996.81757, 122.58595, 934.1932, 2583.7341, 394.08533, 1302.6964, 1263.7948, 317.63586]
2025-09-12 12:28:56,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [380.0, 469.0, 324.0, 64.0, 307.0, 755.0, 152.0, 363.0, 376.0, 130.0]
2025-09-12 12:28:56,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 4 hours, 10 minutes, 26 seconds)
2025-09-12 12:40:32,049 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:40:32,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:43:21,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2094.76831 ± 934.112
2025-09-12 12:43:21,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1826.8633, 1771.6637, 1842.6227, 3251.0198, 764.55505, 1276.0039, 927.7394, 3398.2732, 2625.045, 3263.8965]
2025-09-12 12:43:21,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [532.0, 507.0, 523.0, 1000.0, 259.0, 379.0, 294.0, 978.0, 784.0, 1000.0]
2025-09-12 12:43:21,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 3 hours, 58 minutes, 52 seconds)
2025-09-12 12:55:11,676 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:55:11,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:56:57,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1294.39783 ± 447.798
2025-09-12 12:56:57,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1304.5503, 1375.4207, 1235.9917, 546.08246, 1317.6738, 702.6249, 1325.6874, 1320.9175, 2314.2893, 1500.7399]
2025-09-12 12:56:57,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [378.0, 390.0, 355.0, 194.0, 393.0, 239.0, 382.0, 378.0, 718.0, 484.0]
2025-09-12 12:56:57,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 3 hours, 47 minutes, 26 seconds)
2025-09-12 13:08:03,503 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:08:03,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:10:10,282 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1512.74304 ± 1033.159
2025-09-12 13:10:10,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3232.9426, 458.3126, 83.476456, 3278.958, 1254.6954, 2227.116, 1503.8354, 1343.3608, 785.5083, 959.2257]
2025-09-12 13:10:10,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 173.0, 58.0, 1000.0, 351.0, 688.0, 424.0, 432.0, 267.0, 319.0]
2025-09-12 13:10:10,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 34 minutes, 16 seconds)
2025-09-12 13:21:44,781 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:21:44,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:24:02,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1718.86938 ± 895.954
2025-09-12 13:24:02,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1962.2783, 2087.4993, 632.71014, 2587.3506, 1035.6117, 3446.4294, 1968.091, 1855.2411, 1399.2935, 214.18983]
2025-09-12 13:24:02,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [566.0, 604.0, 225.0, 724.0, 334.0, 994.0, 617.0, 513.0, 424.0, 97.0]
2025-09-12 13:24:02,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 24 minutes, 23 seconds)
2025-09-12 13:35:51,041 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:35:51,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:37:33,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1206.30542 ± 835.293
2025-09-12 13:37:33,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [446.60016, 1515.3212, 70.79869, 1354.2335, 1604.703, 1295.4429, 52.836945, 2343.4011, 2621.3772, 758.3385]
2025-09-12 13:37:33,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [167.0, 458.0, 57.0, 430.0, 458.0, 429.0, 49.0, 692.0, 788.0, 271.0]
2025-09-12 13:37:33,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 12 minutes, 9 seconds)
2025-09-12 13:48:59,779 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:48:59,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:50:43,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1231.83618 ± 889.763
2025-09-12 13:50:43,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [196.45604, 1787.2574, 761.9844, 1559.1018, 1595.1901, 104.56319, 2711.14, 704.18866, 2508.5708, 389.9091]
2025-09-12 13:50:43,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [93.0, 521.0, 265.0, 464.0, 455.0, 58.0, 793.0, 247.0, 781.0, 173.0]
2025-09-12 13:50:43,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 2 hours, 55 minutes, 11 seconds)
2025-09-12 14:02:17,443 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:02:17,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:04:55,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1851.46130 ± 1070.593
2025-09-12 14:04:55,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1634.8944, 1519.576, 3193.1572, 67.22706, 1116.6354, 2440.4163, 3235.7302, 3223.9775, 1442.8052, 640.1963]
2025-09-12 14:04:55,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [524.0, 489.0, 1000.0, 44.0, 370.0, 762.0, 1000.0, 1000.0, 418.0, 222.0]
2025-09-12 14:04:55,746 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 43 minutes, 8 seconds)
2025-09-12 14:16:42,691 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:16:42,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:18:22,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1252.01208 ± 588.104
2025-09-12 14:18:22,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [515.07996, 2312.4612, 944.0835, 1276.8248, 1520.2281, 1321.0742, 2088.4653, 937.1339, 1258.3262, 346.44476]
2025-09-12 14:18:22,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [180.0, 693.0, 310.0, 370.0, 417.0, 368.0, 577.0, 276.0, 346.0, 136.0]
2025-09-12 14:18:22,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 30 minutes, 3 seconds)
2025-09-12 14:30:12,565 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:30:12,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:32:07,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1484.45178 ± 654.033
2025-09-12 14:32:07,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1396.3806, 1145.8512, 1290.5067, 1315.8348, 1867.8809, 2735.5513, 75.899506, 1936.5911, 1238.1394, 1841.882]
2025-09-12 14:32:07,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [426.0, 336.0, 364.0, 367.0, 533.0, 772.0, 45.0, 548.0, 348.0, 528.0]
2025-09-12 14:32:07,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 16 minutes, 9 seconds)
2025-09-12 14:43:21,111 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:43:21,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:45:00,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1275.99084 ± 601.328
2025-09-12 14:45:00,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [884.94196, 1324.7538, 179.28125, 1221.3303, 1380.5923, 1262.0856, 1055.631, 1249.0076, 1455.88, 2746.4038]
2025-09-12 14:45:00,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [270.0, 370.0, 87.0, 350.0, 401.0, 349.0, 300.0, 355.0, 406.0, 789.0]
2025-09-12 14:45:00,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 1 minute, 23 seconds)
2025-09-12 14:56:06,087 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:56:06,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:58:08,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1462.80652 ± 1188.300
2025-09-12 14:58:08,589 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1474.3096, 3297.8672, 576.32697, 3471.7703, 726.6439, 1629.827, 2572.7969, 245.28148, 415.2152, 218.02875]
2025-09-12 14:58:08,589 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [415.0, 1000.0, 205.0, 1000.0, 248.0, 519.0, 795.0, 106.0, 160.0, 112.0]
2025-09-12 14:58:08,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 47 minutes, 51 seconds)
2025-09-12 15:10:37,412 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:10:37,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:12:41,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1474.47229 ± 1154.817
2025-09-12 15:12:41,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [700.5885, 2496.9368, 748.07043, 3214.2517, 354.27405, 1677.7804, 3351.02, 323.4379, 96.07922, 1782.283]
2025-09-12 15:12:41,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [242.0, 707.0, 256.0, 1000.0, 142.0, 476.0, 1000.0, 128.0, 70.0, 566.0]
2025-09-12 15:12:41,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 34 minutes, 51 seconds)
2025-09-12 15:23:43,384 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:23:43,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:25:46,703 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1534.01453 ± 356.265
2025-09-12 15:25:46,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1304.9333, 1671.672, 1843.4562, 1522.6793, 1282.1844, 1684.3776, 1829.4098, 1033.1398, 1004.16895, 2164.1238]
2025-09-12 15:25:46,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [389.0, 474.0, 549.0, 420.0, 369.0, 522.0, 570.0, 329.0, 338.0, 623.0]
2025-09-12 15:25:46,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 20 minutes, 53 seconds)
2025-09-12 15:36:47,113 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:36:47,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:38:49,293 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1459.46741 ± 1167.564
2025-09-12 15:38:49,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [414.9598, 3289.844, 1342.93, 164.81116, 409.12335, 3268.8599, 1357.9581, 1833.4786, 54.01362, 2458.6956]
2025-09-12 15:38:49,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [157.0, 1000.0, 388.0, 79.0, 172.0, 1000.0, 393.0, 571.0, 48.0, 711.0]
2025-09-12 15:38:49,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 6 minutes, 41 seconds)
2025-09-12 15:51:00,346 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:51:00,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:53:04,637 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1506.33350 ± 849.117
2025-09-12 15:53:04,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1751.4823, 3361.9468, 1868.6254, 1283.2024, 1301.3488, 207.87587, 870.20483, 942.63025, 2433.6292, 1042.39]
2025-09-12 15:53:04,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [535.0, 1000.0, 525.0, 373.0, 375.0, 96.0, 298.0, 323.0, 741.0, 329.0]
2025-09-12 15:53:04,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 54 minutes, 27 seconds)
2025-09-12 16:04:17,135 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:04:17,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:06:20,392 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1489.58862 ± 1072.080
2025-09-12 16:06:20,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2765.9087, 1642.4943, 1608.5928, 2700.0007, 66.64414, 1174.1824, 3299.864, 155.38043, 915.9438, 566.8763]
2025-09-12 16:06:20,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [838.0, 480.0, 456.0, 815.0, 54.0, 372.0, 1000.0, 75.0, 288.0, 201.0]
2025-09-12 16:06:20,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 40 minutes, 55 seconds)
2025-09-12 16:17:42,886 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:17:42,894 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:19:08,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1055.38086 ± 605.667
2025-09-12 16:19:08,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1283.0325, 1725.7089, 256.0539, 558.0389, 1701.3105, 1489.6333, 130.66692, 448.92343, 1266.6438, 1693.7965]
2025-09-12 16:19:08,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [366.0, 487.0, 124.0, 201.0, 490.0, 418.0, 66.0, 165.0, 381.0, 479.0]
2025-09-12 16:19:08,539 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 26 minutes, 34 seconds)
2025-09-12 16:30:50,541 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:30:50,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:32:35,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1299.49438 ± 769.763
2025-09-12 16:32:35,491 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1020.8809, 3299.7666, 1249.8228, 1863.2551, 1426.9331, 469.57385, 545.91754, 1223.1553, 936.1006, 959.5391]
2025-09-12 16:32:35,491 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [316.0, 945.0, 363.0, 528.0, 408.0, 175.0, 192.0, 361.0, 287.0, 312.0]
2025-09-12 16:32:35,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 13 minutes, 21 seconds)
2025-09-12 16:44:01,557 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:44:01,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:45:54,032 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1364.30322 ± 790.766
2025-09-12 16:45:54,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [875.9443, 1538.4054, 448.87054, 182.00641, 1943.7344, 1657.9385, 2185.778, 2790.7812, 1437.2596, 582.31366]
2025-09-12 16:45:54,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [310.0, 437.0, 172.0, 84.0, 605.0, 488.0, 628.0, 816.0, 406.0, 208.0]
2025-09-12 16:45:54,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1251 [DEBUG]: Training session finished
