2025-09-11 19:05:39,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc20-hopper/ExtremeClogL1U23-mbpac_memdelay
2025-09-11 19:05:39,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc20-hopper/ExtremeClogL1U23-mbpac_memdelay
2025-09-11 19:05:39,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x14f8e67081d0>}
2025-09-11 19:05:39,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1111 [DEBUG]: using device: cuda
2025-09-11 19:05:39,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1133 [INFO]: Creating new trainer
2025-09-11 19:05:39,173 baseline-mbpac-noiseperc20-hopper:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-09-11 19:05:39,173 baseline-mbpac-noiseperc20-hopper:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-11 19:05:39,181 baseline-mbpac-noiseperc20-hopper:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=11, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=3, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-09-11 19:05:40,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1194 [DEBUG]: Starting training session...
2025-09-11 19:05:40,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 1/100
2025-09-11 19:16:02,586 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:16:02,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:16:16,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 82.06602 ± 22.063
2025-09-11 19:16:16,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [95.47369, 94.70897, 98.59175, 94.53743, 96.0614, 93.11959, 95.1106, 61.609554, 29.123302, 62.323975]
2025-09-11 19:16:16,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [55.0, 54.0, 57.0, 55.0, 55.0, 53.0, 54.0, 36.0, 24.0, 46.0]
2025-09-11 19:16:16,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1226 [INFO]: New best (82.07) for latency ExtremeClogL1U23
2025-09-11 19:16:16,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 17 hours, 30 minutes, 4 seconds)
2025-09-11 19:28:09,783 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:28:09,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:28:33,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 163.42061 ± 78.961
2025-09-11 19:28:33,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [189.75717, 264.85022, 148.07225, 211.18324, 104.59673, 85.24745, 103.88086, 31.428263, 202.09062, 293.0992]
2025-09-11 19:28:33,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [92.0, 114.0, 79.0, 97.0, 57.0, 62.0, 63.0, 33.0, 99.0, 131.0]
2025-09-11 19:28:33,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1226 [INFO]: New best (163.42) for latency ExtremeClogL1U23
2025-09-11 19:28:33,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 18 hours, 41 minutes, 11 seconds)
2025-09-11 19:40:19,268 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:40:19,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:40:50,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 225.94263 ± 95.919
2025-09-11 19:40:50,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [292.05002, 309.83447, 266.86172, 297.8374, 28.784433, 266.11835, 255.63496, 48.56377, 242.0719, 251.6693]
2025-09-11 19:40:50,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [126.0, 138.0, 115.0, 138.0, 22.0, 122.0, 129.0, 34.0, 151.0, 122.0]
2025-09-11 19:40:50,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1226 [INFO]: New best (225.94) for latency ExtremeClogL1U23
2025-09-11 19:40:50,488 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 18 hours, 57 minutes, 4 seconds)
2025-09-11 19:52:39,961 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:52:39,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:53:20,180 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 195.32016 ± 115.906
2025-09-11 19:53:20,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [322.33795, 355.00998, 6.5925617, 122.730606, 152.20201, 212.83557, 364.69028, 124.73222, 220.62831, 71.4421]
2025-09-11 19:53:20,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [158.0, 197.0, 9.0, 150.0, 183.0, 149.0, 211.0, 137.0, 179.0, 56.0]
2025-09-11 19:53:20,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 19 hours, 3 minutes, 54 seconds)
2025-09-11 20:05:58,529 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:05:58,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:07:19,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 333.71600 ± 141.785
2025-09-11 20:07:19,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [282.48132, 163.70883, 437.0102, 374.82355, 209.04294, 276.21936, 619.0022, 342.07925, 148.71469, 484.07758]
2025-09-11 20:07:19,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [257.0, 132.0, 423.0, 364.0, 184.0, 268.0, 531.0, 307.0, 130.0, 305.0]
2025-09-11 20:07:19,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1226 [INFO]: New best (333.72) for latency ExtremeClogL1U23
2025-09-11 20:07:19,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 19 hours, 31 minutes, 22 seconds)
2025-09-11 20:17:58,524 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:17:58,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:18:42,970 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 325.73700 ± 112.247
2025-09-11 20:18:42,970 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [312.11206, 334.65994, 304.15387, 353.657, 300.64603, 339.59268, 54.04316, 544.5326, 353.80624, 360.16626]
2025-09-11 20:18:42,970 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [131.0, 148.0, 135.0, 163.0, 127.0, 151.0, 47.0, 387.0, 155.0, 168.0]
2025-09-11 20:18:42,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 19 hours, 33 minutes, 47 seconds)
2025-09-11 20:30:02,271 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:30:02,279 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:31:03,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 332.25412 ± 186.858
2025-09-11 20:31:03,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [388.82867, 206.19482, 403.26816, 19.187431, 385.95343, 259.04483, 348.76016, 515.5027, 101.00153, 694.79956]
2025-09-11 20:31:03,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [214.0, 143.0, 237.0, 22.0, 305.0, 155.0, 290.0, 332.0, 76.0, 490.0]
2025-09-11 20:31:04,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 19 hours, 22 minutes, 42 seconds)
2025-09-11 20:42:28,373 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:42:28,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:43:08,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 304.47867 ± 84.928
2025-09-11 20:43:08,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [345.52625, 355.80994, 343.20975, 59.61603, 359.3009, 275.5594, 342.12982, 327.99878, 327.3547, 308.2811]
2025-09-11 20:43:08,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [170.0, 168.0, 159.0, 43.0, 161.0, 148.0, 160.0, 140.0, 152.0, 145.0]
2025-09-11 20:43:08,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 19 hours, 6 minutes, 18 seconds)
2025-09-11 20:54:33,632 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:54:33,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:55:01,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 232.55386 ± 142.309
2025-09-11 20:55:01,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [254.11664, 9.797465, 373.50928, 349.654, 240.74866, 333.7576, 328.74033, 56.912796, 9.326318, 368.97534]
2025-09-11 20:55:01,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [129.0, 12.0, 154.0, 131.0, 112.0, 136.0, 134.0, 41.0, 16.0, 139.0]
2025-09-11 20:55:01,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 18 hours, 42 minutes, 41 seconds)
2025-09-11 21:06:27,378 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:06:27,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:06:54,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 208.50845 ± 192.631
2025-09-11 21:06:54,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [331.4437, 256.45657, 10.167855, 52.259327, 77.69186, 13.880232, 577.5284, 414.69485, 11.10658, 339.855]
2025-09-11 21:06:54,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [143.0, 151.0, 13.0, 48.0, 47.0, 22.0, 247.0, 170.0, 15.0, 140.0]
2025-09-11 21:06:54,462 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 17 hours, 52 minutes, 28 seconds)
2025-09-11 21:18:18,279 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:18:18,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:19:05,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 407.68610 ± 430.377
2025-09-11 21:19:05,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1601.9711, 447.30045, 53.06159, 419.36136, 285.81808, 203.74272, 259.00452, 575.3209, 218.6694, 12.610804]
2025-09-11 21:19:05,786 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [608.0, 203.0, 34.0, 183.0, 144.0, 99.0, 122.0, 190.0, 104.0, 20.0]
2025-09-11 21:19:05,786 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1226 [INFO]: New best (407.69) for latency ExtremeClogL1U23
2025-09-11 21:19:05,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 17 hours, 54 minutes, 45 seconds)
2025-09-11 21:30:28,047 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:30:28,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:31:08,413 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 367.97189 ± 217.676
2025-09-11 21:31:08,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [36.723034, 286.5171, 664.6126, 530.9339, 235.54062, 547.5855, 358.28964, 375.23273, 14.920422, 629.3635]
2025-09-11 21:31:08,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [72.0, 122.0, 220.0, 189.0, 105.0, 206.0, 137.0, 163.0, 20.0, 240.0]
2025-09-11 21:31:08,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 17 hours, 37 minutes, 17 seconds)
2025-09-11 21:42:16,749 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:42:16,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:43:02,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 437.24438 ± 268.143
2025-09-11 21:43:02,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [345.1278, 60.714195, 444.00272, 12.9992895, 673.58026, 253.88544, 356.2989, 734.23773, 829.68854, 661.9091]
2025-09-11 21:43:02,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [192.0, 54.0, 165.0, 14.0, 221.0, 108.0, 154.0, 245.0, 275.0, 233.0]
2025-09-11 21:43:02,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1226 [INFO]: New best (437.24) for latency ExtremeClogL1U23
2025-09-11 21:43:02,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 17 hours, 22 minutes, 11 seconds)
2025-09-11 21:54:37,148 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:54:37,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:55:22,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 453.00366 ± 362.619
2025-09-11 21:55:22,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [657.1079, 336.2929, 1067.8746, 238.1576, 617.98615, 919.3889, 17.92805, 21.975391, 629.7165, 23.6091]
2025-09-11 21:55:22,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [216.0, 148.0, 357.0, 102.0, 207.0, 310.0, 16.0, 25.0, 246.0, 24.0]
2025-09-11 21:55:22,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1226 [INFO]: New best (453.00) for latency ExtremeClogL1U23
2025-09-11 21:55:22,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 17 hours, 17 minutes, 55 seconds)
2025-09-11 22:06:19,251 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:06:19,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:07:22,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 627.85413 ± 367.562
2025-09-11 22:07:22,466 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [720.2103, 655.7906, 985.52045, 616.64404, 441.9938, 246.17142, 211.29132, 1513.869, 502.37885, 384.6709]
2025-09-11 22:07:22,466 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [235.0, 221.0, 338.0, 231.0, 192.0, 108.0, 97.0, 519.0, 210.0, 150.0]
2025-09-11 22:07:22,466 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1226 [INFO]: New best (627.85) for latency ExtremeClogL1U23
2025-09-11 22:07:22,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 17 hours, 7 minutes, 56 seconds)
2025-09-11 22:18:40,306 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:18:40,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:19:56,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 761.41199 ± 398.730
2025-09-11 22:19:56,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [660.20483, 1087.0673, 604.09216, 318.82788, 1426.1252, 1149.5336, 1028.4059, 16.247763, 745.6405, 577.9748]
2025-09-11 22:19:56,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [224.0, 398.0, 222.0, 141.0, 484.0, 425.0, 410.0, 20.0, 277.0, 213.0]
2025-09-11 22:19:56,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1226 [INFO]: New best (761.41) for latency ExtremeClogL1U23
2025-09-11 22:19:56,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 17 hours, 2 minutes, 19 seconds)
2025-09-11 22:31:12,597 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:31:12,605 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:32:28,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 717.35449 ± 599.456
2025-09-11 22:32:28,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [222.56406, 1325.224, 789.1859, 390.51996, 13.065111, 512.6771, 1902.195, 11.900155, 1366.7242, 639.4895]
2025-09-11 22:32:28,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [159.0, 479.0, 282.0, 167.0, 17.0, 238.0, 709.0, 14.0, 485.0, 245.0]
2025-09-11 22:32:28,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 16 hours, 58 minutes, 2 seconds)
2025-09-11 22:43:46,732 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:43:46,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:45:13,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 861.00916 ± 912.619
2025-09-11 22:45:13,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [223.7908, 12.680925, 86.78747, 1734.7727, 2925.3037, 1461.181, 244.83136, 17.65403, 715.6292, 1187.4607]
2025-09-11 22:45:13,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [103.0, 14.0, 51.0, 607.0, 1000.0, 494.0, 117.0, 22.0, 323.0, 456.0]
2025-09-11 22:45:13,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1226 [INFO]: New best (861.01) for latency ExtremeClogL1U23
2025-09-11 22:45:13,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 16 hours, 59 minutes, 52 seconds)
2025-09-11 22:56:37,502 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:56:37,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:58:06,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 883.98486 ± 637.061
2025-09-11 22:58:06,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1591.7928, 1385.3086, 1812.9777, 978.1465, 170.66084, 1611.9255, 421.4733, 566.9965, 197.66017, 102.90769]
2025-09-11 22:58:06,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [536.0, 504.0, 624.0, 379.0, 92.0, 564.0, 162.0, 195.0, 94.0, 57.0]
2025-09-11 22:58:06,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1226 [INFO]: New best (883.98) for latency ExtremeClogL1U23
2025-09-11 22:58:06,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 16 hours, 56 minutes, 16 seconds)
2025-09-11 23:09:24,174 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:09:24,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:10:05,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 409.95645 ± 446.123
2025-09-11 23:10:05,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1463.9996, 154.6722, 304.186, 855.15027, 393.29233, 25.966976, 7.4427295, 710.4411, 129.17314, 55.23999]
2025-09-11 23:10:05,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [469.0, 75.0, 137.0, 291.0, 142.0, 42.0, 13.0, 261.0, 68.0, 37.0]
2025-09-11 23:10:05,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 16 hours, 43 minutes, 25 seconds)
2025-09-11 23:21:05,709 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:21:05,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:22:00,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 548.72290 ± 431.901
2025-09-11 23:22:00,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1249.4531, 554.8379, 91.05257, 133.53136, 867.739, 246.40521, 1351.9181, 418.83188, 282.33414, 291.1255]
2025-09-11 23:22:00,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [433.0, 210.0, 51.0, 68.0, 273.0, 109.0, 465.0, 173.0, 132.0, 129.0]
2025-09-11 23:22:00,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 16 hours, 20 minutes, 37 seconds)
2025-09-11 23:34:04,031 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:34:04,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:36:08,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 1165.52478 ± 831.743
2025-09-11 23:36:08,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [2284.5264, 370.72263, 1585.3256, 1679.2784, 14.444172, 22.237534, 1282.5072, 1422.8113, 2408.4531, 584.9413]
2025-09-11 23:36:08,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [812.0, 150.0, 542.0, 599.0, 15.0, 22.0, 471.0, 482.0, 829.0, 221.0]
2025-09-11 23:36:08,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1226 [INFO]: New best (1165.52) for latency ExtremeClogL1U23
2025-09-11 23:36:08,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 16 hours, 33 minutes, 13 seconds)
2025-09-11 23:48:16,201 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:48:16,209 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:49:35,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 712.87933 ± 455.945
2025-09-11 23:49:35,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [161.00381, 917.96796, 871.4761, 1561.569, 404.95566, 843.4804, 1195.2584, 861.09033, 160.24458, 151.74771]
2025-09-11 23:49:35,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [80.0, 323.0, 310.0, 534.0, 212.0, 340.0, 436.0, 270.0, 76.0, 78.0]
2025-09-11 23:49:35,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 16 hours, 31 minutes, 21 seconds)
2025-09-12 00:02:25,853 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:02:25,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:04:07,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 973.72784 ± 716.533
2025-09-12 00:04:07,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [9.366391, 764.1765, 415.55566, 1068.901, 532.5515, 1100.9227, 1929.6233, 569.12775, 786.963, 2560.0906]
2025-09-12 00:04:07,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [11.0, 275.0, 161.0, 342.0, 206.0, 417.0, 664.0, 209.0, 297.0, 827.0]
2025-09-12 00:04:07,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 16 hours, 43 minutes, 33 seconds)
2025-09-12 00:16:57,511 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:16:57,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:18:32,247 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 841.73938 ± 823.876
2025-09-12 00:18:32,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [292.34665, 876.9369, 926.65405, 17.166649, 290.85635, 1801.3976, 2825.8967, 93.62293, 653.59564, 638.9195]
2025-09-12 00:18:32,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [138.0, 318.0, 301.0, 16.0, 158.0, 608.0, 1000.0, 54.0, 307.0, 239.0]
2025-09-12 00:18:32,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 17 hours, 6 minutes, 43 seconds)
2025-09-12 00:30:22,432 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:30:22,436 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:31:34,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 682.43762 ± 752.358
2025-09-12 00:31:34,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [803.4752, 1021.64655, 187.19077, 442.6522, 66.10147, 444.43372, 2703.3176, 108.2747, 928.228, 119.05629]
2025-09-12 00:31:34,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [264.0, 320.0, 88.0, 168.0, 42.0, 223.0, 881.0, 60.0, 309.0, 61.0]
2025-09-12 00:31:34,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 17 hours, 9 minutes, 30 seconds)
2025-09-12 00:43:57,101 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:43:57,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:44:53,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 459.63339 ± 320.045
2025-09-12 00:44:53,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [692.25806, 50.03237, 542.8291, 908.5251, 141.5399, 107.72784, 772.7863, 659.41974, 35.52247, 685.6934]
2025-09-12 00:44:53,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [264.0, 49.0, 231.0, 320.0, 72.0, 96.0, 308.0, 245.0, 56.0, 263.0]
2025-09-12 00:44:53,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 16 hours, 43 minutes, 54 seconds)
2025-09-12 00:57:30,841 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:57:30,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:58:40,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 634.56769 ± 606.383
2025-09-12 00:58:40,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [2017.84, 200.06053, 11.276919, 99.29561, 934.7741, 380.85022, 1125.2928, 977.1325, 584.7665, 14.387891]
2025-09-12 00:58:40,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [672.0, 180.0, 14.0, 74.0, 293.0, 158.0, 354.0, 338.0, 209.0, 16.0]
2025-09-12 00:58:40,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 16 hours, 34 minutes, 40 seconds)
2025-09-12 01:11:15,606 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:11:15,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:13:04,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 1041.38220 ± 513.770
2025-09-12 01:13:04,341 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1195.3683, 1652.4274, 1290.4347, 1571.1631, 877.15955, 501.96173, 908.693, 1695.5736, 683.9011, 37.139717]
2025-09-12 01:13:04,341 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [407.0, 554.0, 462.0, 517.0, 282.0, 193.0, 311.0, 583.0, 244.0, 62.0]
2025-09-12 01:13:04,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 16 hours, 19 minutes, 4 seconds)
2025-09-12 01:25:30,427 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:25:30,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:26:24,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 464.49625 ± 493.410
2025-09-12 01:26:24,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1028.6493, 1450.6555, 19.118015, 19.523432, 189.88194, 122.82938, 489.01422, 18.306715, 1030.401, 276.5831]
2025-09-12 01:26:24,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [374.0, 495.0, 20.0, 20.0, 90.0, 70.0, 206.0, 22.0, 368.0, 140.0]
2025-09-12 01:26:24,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 15 hours, 50 minutes, 6 seconds)
2025-09-12 01:38:46,661 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:38:46,665 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:39:58,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 642.11176 ± 863.184
2025-09-12 01:39:58,375 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [392.0458, 12.707615, 1347.7557, 245.58377, 221.2599, 24.107584, 120.12584, 627.4363, 449.5612, 2980.534]
2025-09-12 01:39:58,375 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [161.0, 19.0, 471.0, 111.0, 120.0, 23.0, 62.0, 254.0, 178.0, 1000.0]
2025-09-12 01:39:58,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 15 hours, 43 minutes, 53 seconds)
2025-09-12 01:52:32,537 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:52:32,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:53:21,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 419.94815 ± 263.366
2025-09-12 01:53:21,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [282.13998, 352.11942, 12.564921, 798.75165, 527.3791, 766.73376, 85.07876, 285.0063, 728.1818, 361.52588]
2025-09-12 01:53:21,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [122.0, 152.0, 29.0, 309.0, 190.0, 285.0, 58.0, 122.0, 247.0, 147.0]
2025-09-12 01:53:21,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 15 hours, 31 minutes, 5 seconds)
2025-09-12 02:06:02,599 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:06:02,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:07:09,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 665.46765 ± 456.344
2025-09-12 02:07:09,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [701.3554, 121.3694, 23.33524, 569.93, 1119.6838, 85.82589, 1030.9031, 527.6303, 1323.454, 1151.1891]
2025-09-12 02:07:09,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [219.0, 62.0, 22.0, 215.0, 348.0, 48.0, 327.0, 195.0, 432.0, 393.0]
2025-09-12 02:07:09,587 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 15 hours, 17 minutes, 42 seconds)
2025-09-12 02:19:19,724 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:19:19,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:20:15,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 500.15643 ± 726.610
2025-09-12 02:20:15,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [2108.3743, 1135.7689, 1421.3127, 119.99569, 11.809692, 10.652896, 13.749754, 57.5848, 71.43626, 50.879475]
2025-09-12 02:20:15,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [679.0, 400.0, 489.0, 65.0, 15.0, 14.0, 24.0, 61.0, 48.0, 59.0]
2025-09-12 02:20:15,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 14 hours, 46 minutes, 46 seconds)
2025-09-12 02:32:47,493 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:32:47,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:34:04,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 709.55060 ± 668.263
2025-09-12 02:34:04,709 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1408.2933, 742.3904, 562.1012, 730.6028, 163.4804, 462.27298, 33.624794, 14.015824, 641.40564, 2337.3184]
2025-09-12 02:34:04,709 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [520.0, 251.0, 212.0, 266.0, 80.0, 193.0, 61.0, 14.0, 226.0, 785.0]
2025-09-12 02:34:04,715 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 14 hours, 39 minutes, 47 seconds)
2025-09-12 02:46:41,438 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:46:41,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:48:14,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 901.97900 ± 933.999
2025-09-12 02:48:14,945 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [922.0259, 1750.373, 71.72646, 1057.4631, 1671.3296, 216.03874, 205.44481, 2987.1006, 8.585962, 129.70241]
2025-09-12 02:48:14,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [312.0, 568.0, 47.0, 331.0, 568.0, 99.0, 104.0, 1000.0, 11.0, 65.0]
2025-09-12 02:48:14,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 14 hours, 33 minutes, 56 seconds)
2025-09-12 03:00:09,968 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:00:09,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:00:46,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 298.86823 ± 397.670
2025-09-12 03:00:46,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1055.7809, 13.876026, 203.11713, 60.091454, 24.823578, 199.5802, 75.443474, 226.08919, 26.735218, 1103.1453]
2025-09-12 03:00:46,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [344.0, 17.0, 109.0, 36.0, 25.0, 94.0, 49.0, 111.0, 28.0, 403.0]
2025-09-12 03:00:46,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 14 hours, 9 minutes, 21 seconds)
2025-09-12 03:13:21,896 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:13:21,900 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:14:25,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 575.54639 ± 747.149
2025-09-12 03:14:25,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [17.362303, 12.248282, 774.4733, 12.494731, 793.06866, 138.39265, 1218.892, 9.550098, 317.30502, 2461.6772]
2025-09-12 03:14:25,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 17.0, 276.0, 15.0, 281.0, 70.0, 427.0, 13.0, 182.0, 820.0]
2025-09-12 03:14:25,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 13 hours, 54 minutes, 3 seconds)
2025-09-12 03:26:37,022 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:26:37,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:28:11,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 964.40076 ± 587.542
2025-09-12 03:28:11,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [761.26184, 135.29321, 230.73671, 1570.1107, 1224.3027, 291.8389, 798.9709, 1452.5348, 1235.0411, 1943.9165]
2025-09-12 03:28:11,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [240.0, 68.0, 116.0, 518.0, 367.0, 129.0, 256.0, 493.0, 383.0, 627.0]
2025-09-12 03:28:12,000 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 13 hours, 48 minutes, 57 seconds)
2025-09-12 03:40:37,027 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:40:37,029 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:42:24,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 1060.61694 ± 866.860
2025-09-12 03:42:24,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1643.8654, 3036.8567, 261.59512, 698.6761, 22.582636, 1190.0742, 1616.2565, 1393.9703, 502.38074, 239.91183]
2025-09-12 03:42:24,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [553.0, 1000.0, 117.0, 251.0, 56.0, 377.0, 501.0, 466.0, 192.0, 123.0]
2025-09-12 03:42:24,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 13 hours, 40 minutes, 3 seconds)
2025-09-12 03:55:04,698 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:55:04,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:55:53,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 417.88446 ± 561.820
2025-09-12 03:55:53,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [897.9694, 20.21891, 235.54375, 58.74404, 119.289215, 22.46457, 1190.1495, 11.419003, 8.860106, 1614.1865]
2025-09-12 03:55:53,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [372.0, 21.0, 112.0, 40.0, 64.0, 23.0, 435.0, 16.0, 11.0, 542.0]
2025-09-12 03:55:53,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 13 hours, 18 minutes, 6 seconds)
2025-09-12 04:08:05,691 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:08:05,694 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:09:58,670 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 1129.86133 ± 780.393
2025-09-12 04:09:58,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [273.5381, 22.934158, 121.669334, 1860.4243, 1608.3118, 782.5139, 1078.9417, 1189.7251, 2083.085, 2277.4712]
2025-09-12 04:09:58,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [141.0, 22.0, 86.0, 616.0, 520.0, 268.0, 368.0, 394.0, 672.0, 728.0]
2025-09-12 04:09:58,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 13 hours, 22 minutes, 49 seconds)
2025-09-12 04:22:25,397 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:22:25,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:23:45,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 776.61902 ± 815.332
2025-09-12 04:23:45,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [56.702126, 69.24299, 1181.8398, 769.1077, 1527.385, 965.7735, 421.92023, 63.282314, 21.204199, 2689.7324]
2025-09-12 04:23:45,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [54.0, 73.0, 379.0, 291.0, 531.0, 328.0, 158.0, 45.0, 22.0, 823.0]
2025-09-12 04:23:45,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 13 hours, 10 minutes, 30 seconds)
2025-09-12 04:36:17,667 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:36:17,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:37:21,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 627.83264 ± 569.065
2025-09-12 04:37:21,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [223.18626, 277.86826, 1622.5687, 898.9656, 47.95237, 1356.9818, 1134.5446, 693.56055, 14.503092, 8.195776]
2025-09-12 04:37:21,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [101.0, 120.0, 500.0, 279.0, 29.0, 479.0, 344.0, 258.0, 21.0, 15.0]
2025-09-12 04:37:21,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 12 hours, 54 minutes, 32 seconds)
2025-09-12 04:49:28,615 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:49:28,617 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:50:41,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 682.72632 ± 784.725
2025-09-12 04:50:41,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [80.91121, 1166.9218, 18.986488, 10.427439, 516.349, 2485.6309, 1631.3287, 183.27954, 510.93658, 222.49121]
2025-09-12 04:50:41,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [48.0, 430.0, 19.0, 14.0, 200.0, 798.0, 510.0, 86.0, 199.0, 114.0]
2025-09-12 04:50:41,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 12 hours, 31 minutes, 2 seconds)
2025-09-12 05:03:01,178 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:03:01,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:03:57,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 512.78833 ± 520.677
2025-09-12 05:03:57,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [70.04209, 66.69368, 1643.422, 47.981667, 9.372132, 915.1965, 487.78787, 379.72992, 385.21445, 1122.4432]
2025-09-12 05:03:57,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [46.0, 54.0, 544.0, 43.0, 13.0, 321.0, 182.0, 148.0, 164.0, 349.0]
2025-09-12 05:03:57,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 12 hours, 15 minutes, 7 seconds)
2025-09-12 05:16:20,216 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:16:20,218 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:17:11,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 448.78214 ± 558.687
2025-09-12 05:17:11,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [272.5369, 9.633944, 12.279266, 119.60142, 1078.4808, 291.49753, 102.749504, 1822.1122, 676.2014, 102.728546]
2025-09-12 05:17:11,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [120.0, 15.0, 17.0, 62.0, 342.0, 126.0, 65.0, 607.0, 277.0, 78.0]
2025-09-12 05:17:11,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 11 hours, 52 minutes, 23 seconds)
2025-09-12 05:29:57,608 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:29:57,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:30:53,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 506.44571 ± 547.889
2025-09-12 05:30:53,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [278.80203, 19.236044, 22.44406, 1556.5265, 22.641218, 524.7971, 7.089961, 389.8569, 874.8701, 1368.1931]
2025-09-12 05:30:53,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [121.0, 21.0, 33.0, 533.0, 22.0, 192.0, 11.0, 159.0, 319.0, 425.0]
2025-09-12 05:30:53,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 11 hours, 38 minutes, 3 seconds)
2025-09-12 05:43:21,448 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:43:21,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:44:44,180 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 810.83710 ± 498.745
2025-09-12 05:44:44,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [783.6978, 1335.8933, 346.34378, 1729.6321, 155.7542, 927.04877, 694.4639, 1238.9152, 95.59304, 801.02856]
2025-09-12 05:44:44,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [280.0, 404.0, 140.0, 572.0, 75.0, 319.0, 252.0, 404.0, 55.0, 246.0]
2025-09-12 05:44:44,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 11 hours, 27 minutes, 17 seconds)
2025-09-12 05:57:02,137 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:57:02,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:58:48,268 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 1064.96143 ± 832.367
2025-09-12 05:58:48,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [182.5937, 535.23627, 1016.194, 192.73271, 1673.132, 625.92737, 760.46246, 1031.4614, 1503.6384, 3128.2363]
2025-09-12 05:58:48,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [93.0, 205.0, 364.0, 90.0, 550.0, 214.0, 269.0, 320.0, 471.0, 1000.0]
2025-09-12 05:58:48,282 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 11 hours, 21 minutes, 7 seconds)
2025-09-12 06:11:00,863 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:11:00,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:12:06,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 649.20471 ± 950.451
2025-09-12 06:12:07,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [3209.9473, 73.64378, 11.488184, 334.01587, 1295.0137, 1007.455, 186.68459, 6.8898053, 312.1339, 54.77442]
2025-09-12 06:12:07,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 47.0, 17.0, 136.0, 380.0, 333.0, 109.0, 10.0, 116.0, 38.0]
2025-09-12 06:12:07,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 11 hours, 8 minutes, 1 second)
2025-09-12 06:24:31,572 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:24:31,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:26:00,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 850.47034 ± 973.221
2025-09-12 06:26:00,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [369.12158, 97.70387, 1943.2614, 56.73494, 9.47176, 73.46939, 575.3102, 1577.054, 3068.7505, 733.82574]
2025-09-12 06:26:00,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [151.0, 74.0, 614.0, 37.0, 16.0, 46.0, 220.0, 512.0, 1000.0, 269.0]
2025-09-12 06:26:00,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 11 hours, 44 seconds)
2025-09-12 06:38:36,016 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:38:36,024 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:40:06,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 875.50684 ± 726.446
2025-09-12 06:40:06,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [418.6094, 12.338988, 1675.3259, 1712.2635, 1657.0128, 1073.9998, 479.84293, 1702.5668, 11.247666, 11.860801]
2025-09-12 06:40:06,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [157.0, 17.0, 562.0, 569.0, 547.0, 366.0, 184.0, 577.0, 15.0, 19.0]
2025-09-12 06:40:06,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 10 hours, 50 minutes, 39 seconds)
2025-09-12 06:52:24,816 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:52:24,824 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:53:28,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 607.02203 ± 669.072
2025-09-12 06:53:28,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [141.40572, 104.14741, 86.86932, 977.55347, 170.92513, 324.8585, 1605.4583, 2014.9652, 628.2978, 15.739019]
2025-09-12 06:53:28,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [70.0, 62.0, 52.0, 337.0, 81.0, 135.0, 547.0, 625.0, 223.0, 19.0]
2025-09-12 06:53:28,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 10 hours, 32 minutes, 26 seconds)
2025-09-12 07:06:02,961 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:06:02,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:07:33,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 845.02527 ± 748.645
2025-09-12 07:07:33,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [20.588648, 1147.5231, 101.60368, 710.18414, 360.17572, 1681.4662, 626.8821, 2574.7417, 312.5074, 914.5807]
2025-09-12 07:07:33,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [21.0, 394.0, 64.0, 259.0, 153.0, 593.0, 272.0, 861.0, 160.0, 278.0]
2025-09-12 07:07:33,663 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 10 hours, 18 minutes, 48 seconds)
2025-09-12 07:19:58,030 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:19:58,032 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:21:14,786 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 763.58496 ± 586.738
2025-09-12 07:21:14,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [11.039755, 473.21234, 1178.0193, 1153.7362, 2018.7113, 217.76483, 608.85614, 929.9235, 984.1824, 60.403408]
2025-09-12 07:21:14,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [17.0, 208.0, 363.0, 383.0, 668.0, 100.0, 210.0, 276.0, 300.0, 41.0]
2025-09-12 07:21:14,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 10 hours, 8 minutes, 20 seconds)
2025-09-12 07:33:40,693 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:33:40,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:35:55,218 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 1350.44702 ± 1136.772
2025-09-12 07:35:55,218 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1722.6555, 11.650878, 151.26614, 10.568706, 2999.0173, 2058.6458, 589.1223, 2036.3228, 792.2635, 3132.9563]
2025-09-12 07:35:55,219 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [548.0, 20.0, 73.0, 13.0, 1000.0, 650.0, 219.0, 642.0, 279.0, 1000.0]
2025-09-12 07:35:55,219 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1226 [INFO]: New best (1350.45) for latency ExtremeClogL1U23
2025-09-12 07:35:55,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 10 hours, 1 minute, 12 seconds)
2025-09-12 07:48:24,721 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:48:24,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:49:56,369 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 870.08136 ± 704.949
2025-09-12 07:49:56,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [11.664543, 786.5516, 1718.2329, 1310.6897, 1155.0042, 2039.3942, 17.89166, 19.415176, 1273.2146, 368.7545]
2025-09-12 07:49:56,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [18.0, 324.0, 554.0, 457.0, 397.0, 677.0, 21.0, 21.0, 431.0, 172.0]
2025-09-12 07:49:56,398 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 9 hours, 46 minutes, 36 seconds)
2025-09-12 08:02:31,380 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:02:31,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:04:11,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 1008.47461 ± 831.079
2025-09-12 08:04:11,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [63.87118, 14.140189, 1059.9299, 1792.638, 2648.256, 1490.9469, 575.87897, 850.08514, 25.43575, 1563.5641]
2025-09-12 08:04:11,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [41.0, 19.0, 332.0, 587.0, 843.0, 453.0, 215.0, 312.0, 45.0, 523.0]
2025-09-12 08:04:11,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 9 hours, 39 minutes, 53 seconds)
2025-09-12 08:16:35,779 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:16:35,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:18:45,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 1310.58521 ± 1158.744
2025-09-12 08:18:45,579 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [994.44574, 2611.7944, 567.4379, 1451.8702, 57.346355, 17.534658, 232.69212, 3107.5112, 3124.7502, 940.47]
2025-09-12 08:18:45,579 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [334.0, 823.0, 206.0, 512.0, 44.0, 18.0, 103.0, 1000.0, 1000.0, 316.0]
2025-09-12 08:18:45,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 9 hours, 29 minutes, 35 seconds)
2025-09-12 08:31:39,811 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:31:39,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:33:00,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 796.85699 ± 818.318
2025-09-12 08:33:00,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1183.1771, 590.2473, 164.46664, 17.104622, 11.736035, 238.16238, 11.611646, 2193.4607, 1591.5325, 1967.0708]
2025-09-12 08:33:00,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [386.0, 208.0, 77.0, 20.0, 25.0, 120.0, 14.0, 711.0, 492.0, 627.0]
2025-09-12 08:33:00,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 9 hours, 19 minutes, 46 seconds)
2025-09-12 08:45:17,581 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:45:17,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:46:26,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 698.49396 ± 643.404
2025-09-12 08:46:26,598 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [922.6443, 8.150832, 166.06328, 12.972041, 421.4628, 445.37466, 2302.2888, 791.63654, 912.6175, 1001.7283]
2025-09-12 08:46:26,598 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [308.0, 11.0, 111.0, 18.0, 161.0, 169.0, 696.0, 248.0, 308.0, 305.0]
2025-09-12 08:46:26,608 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 8 hours, 55 minutes, 58 seconds)
2025-09-12 08:58:50,817 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:58:50,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:59:32,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 369.99396 ± 456.434
2025-09-12 08:59:32,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [15.634799, 18.632923, 8.812507, 841.3629, 1519.0952, 243.83376, 15.10244, 416.3784, 381.7168, 239.36984]
2025-09-12 08:59:32,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [18.0, 22.0, 12.0, 291.0, 495.0, 126.0, 21.0, 174.0, 159.0, 108.0]
2025-09-12 08:59:32,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 8 hours, 35 minutes, 5 seconds)
2025-09-12 09:11:54,106 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:11:54,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:13:42,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 1058.03589 ± 667.720
2025-09-12 09:13:42,735 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [980.5592, 397.38733, 2240.9043, 735.1504, 1203.0834, 1658.7963, 243.91373, 1830.4926, 130.23837, 1159.8341]
2025-09-12 09:13:42,735 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [354.0, 165.0, 720.0, 268.0, 421.0, 542.0, 107.0, 602.0, 68.0, 395.0]
2025-09-12 09:13:42,774 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 8 hours, 20 minutes, 30 seconds)
2025-09-12 09:26:34,983 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:26:35,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:28:19,593 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 1080.97180 ± 980.859
2025-09-12 09:28:19,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [11.18502, 170.55817, 3167.3635, 1333.8044, 1483.5604, 1759.8131, 1836.8888, 145.81561, 878.4051, 22.322975]
2025-09-12 09:28:19,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [14.0, 81.0, 1000.0, 402.0, 484.0, 516.0, 567.0, 74.0, 302.0, 23.0]
2025-09-12 09:28:19,605 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 8 hours, 6 minutes, 58 seconds)
2025-09-12 09:41:27,796 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:41:27,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:42:52,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 813.07477 ± 673.438
2025-09-12 09:42:52,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [50.039318, 1001.8482, 977.39075, 529.7163, 12.581213, 862.77234, 697.7003, 2321.6094, 1500.5448, 176.54486]
2025-09-12 09:42:52,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [34.0, 343.0, 297.0, 207.0, 16.0, 311.0, 249.0, 794.0, 493.0, 84.0]
2025-09-12 09:42:52,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 7 hours, 55 minutes, 4 seconds)
2025-09-12 09:54:53,290 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:54:53,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:56:34,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 998.32812 ± 854.576
2025-09-12 09:56:34,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1683.4547, 13.29824, 1077.9238, 353.45526, 1441.9255, 212.95232, 656.1281, 25.12987, 2696.643, 1822.3705]
2025-09-12 09:56:34,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [560.0, 19.0, 364.0, 141.0, 455.0, 92.0, 250.0, 24.0, 862.0, 591.0]
2025-09-12 09:56:34,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 7 hours, 42 minutes, 51 seconds)
2025-09-12 10:08:31,591 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:08:31,599 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:11:13,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 1636.09534 ± 1134.218
2025-09-12 10:11:13,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1958.4109, 850.9552, 11.231415, 3066.486, 266.04492, 1820.541, 3170.295, 3189.4297, 1067.314, 960.2437]
2025-09-12 10:11:13,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [627.0, 301.0, 15.0, 1000.0, 114.0, 588.0, 1000.0, 1000.0, 362.0, 337.0]
2025-09-12 10:11:13,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1226 [INFO]: New best (1636.10) for latency ExtremeClogL1U23
2025-09-12 10:11:13,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 7 hours, 38 minutes, 42 seconds)
2025-09-12 10:23:45,279 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:23:45,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:25:22,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 933.46027 ± 920.761
2025-09-12 10:25:22,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [287.95166, 1360.7411, 12.807408, 3108.706, 1812.3379, 958.97235, 334.84653, 13.563521, 381.24084, 1063.4355]
2025-09-12 10:25:22,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [128.0, 463.0, 22.0, 1000.0, 608.0, 328.0, 139.0, 21.0, 156.0, 420.0]
2025-09-12 10:25:22,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 7 hours, 24 minutes, 21 seconds)
2025-09-12 10:38:17,441 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:38:17,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:39:56,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 946.04572 ± 985.278
2025-09-12 10:39:56,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [193.21259, 3066.662, 1560.917, 67.056206, 94.95736, 388.34116, 13.253301, 2089.5, 1364.642, 621.91614]
2025-09-12 10:39:56,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [89.0, 1000.0, 529.0, 43.0, 53.0, 155.0, 23.0, 674.0, 467.0, 238.0]
2025-09-12 10:39:56,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 7 hours, 9 minutes, 40 seconds)
2025-09-12 10:52:42,585 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:52:42,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:54:01,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 780.17480 ± 636.501
2025-09-12 10:54:01,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1226.2109, 327.5228, 1189.6182, 1988.1178, 880.4702, 1394.5916, 105.54255, 19.554186, 16.290379, 653.8295]
2025-09-12 10:54:01,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [423.0, 142.0, 359.0, 602.0, 306.0, 479.0, 60.0, 20.0, 18.0, 224.0]
2025-09-12 10:54:01,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 6 hours, 52 minutes, 38 seconds)
2025-09-12 11:05:50,595 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:05:50,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:07:36,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 1075.66284 ± 869.377
2025-09-12 11:07:36,746 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [2305.0698, 665.3578, 1123.0717, 2712.1309, 169.5795, 68.88689, 1683.2217, 1058.5476, 870.8787, 99.88298]
2025-09-12 11:07:36,746 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [756.0, 274.0, 375.0, 823.0, 82.0, 50.0, 513.0, 360.0, 302.0, 56.0]
2025-09-12 11:07:36,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 6 hours, 37 minutes, 48 seconds)
2025-09-12 11:20:52,678 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:20:52,684 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:22:58,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 1265.87524 ± 966.941
2025-09-12 11:22:58,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1639.389, 1427.6549, 359.20746, 787.4881, 583.55426, 904.45996, 666.48376, 260.49878, 3169.9116, 2860.1055]
2025-09-12 11:22:58,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [539.0, 477.0, 141.0, 285.0, 207.0, 339.0, 240.0, 115.0, 1000.0, 898.0]
2025-09-12 11:22:58,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 6 hours, 27 minutes, 29 seconds)
2025-09-12 11:34:58,058 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:34:58,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:36:17,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 778.08221 ± 946.919
2025-09-12 11:36:17,677 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [221.97389, 3158.9863, 1528.7095, 1168.4114, 9.394935, 88.29539, 644.2404, 10.350142, 19.517448, 930.94305]
2025-09-12 11:36:17,677 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [101.0, 1000.0, 515.0, 388.0, 19.0, 62.0, 247.0, 13.0, 23.0, 314.0]
2025-09-12 11:36:17,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 6 hours, 8 minutes, 44 seconds)
2025-09-12 11:48:29,009 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:48:29,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:49:42,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 725.41888 ± 834.736
2025-09-12 11:49:42,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [895.7962, 2873.6228, 106.43, 11.877449, 916.3512, 301.05057, 137.14021, 546.9421, 1379.472, 85.50637]
2025-09-12 11:49:42,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [275.0, 933.0, 61.0, 24.0, 274.0, 125.0, 68.0, 217.0, 451.0, 49.0]
2025-09-12 11:49:42,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 5 hours, 48 minutes, 52 seconds)
2025-09-12 12:02:11,468 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:02:11,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:03:53,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 1009.15643 ± 1074.919
2025-09-12 12:03:53,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1027.5975, 67.47965, 3154.2405, 237.76537, 710.1524, 87.256935, 1913.0674, 358.79337, 11.0098095, 2524.2007]
2025-09-12 12:03:53,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [332.0, 56.0, 1000.0, 108.0, 261.0, 54.0, 626.0, 137.0, 14.0, 827.0]
2025-09-12 12:03:53,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 5 hours, 35 minutes, 23 seconds)
2025-09-12 12:16:50,154 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:16:50,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:18:34,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 1034.42786 ± 781.478
2025-09-12 12:18:34,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [15.945729, 14.247335, 1157.1606, 1109.8928, 856.4075, 1280.8245, 1931.5432, 184.75723, 2568.2742, 1225.2258]
2025-09-12 12:18:34,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [18.0, 23.0, 366.0, 341.0, 293.0, 431.0, 665.0, 86.0, 841.0, 433.0]
2025-09-12 12:18:34,029 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 5 hours, 26 minutes, 23 seconds)
2025-09-12 12:31:02,970 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:31:02,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:31:33,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 244.75455 ± 364.958
2025-09-12 12:31:33,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [103.152794, 1095.3943, 9.194929, 104.11987, 201.76453, 83.595345, 813.2533, 13.4851, 12.323101, 11.262233]
2025-09-12 12:31:33,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [61.0, 371.0, 14.0, 61.0, 105.0, 51.0, 298.0, 19.0, 22.0, 15.0]
2025-09-12 12:31:33,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 5 hours, 1 minute, 43 seconds)
2025-09-12 12:43:59,013 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:43:59,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:45:12,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 723.18884 ± 584.373
2025-09-12 12:45:12,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1259.1119, 12.934105, 232.31268, 118.858765, 1425.6132, 930.6064, 741.0355, 7.609614, 799.5813, 1704.2258]
2025-09-12 12:45:12,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [414.0, 34.0, 106.0, 69.0, 473.0, 303.0, 267.0, 10.0, 262.0, 511.0]
2025-09-12 12:45:12,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 4 hours, 49 minutes, 24 seconds)
2025-09-12 12:57:26,684 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:57:26,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:58:08,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 368.91202 ± 405.075
2025-09-12 12:58:08,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [106.18521, 10.412581, 1327.6663, 147.71362, 572.4548, 158.08397, 11.798547, 544.51624, 67.28285, 743.0062]
2025-09-12 12:58:08,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [62.0, 13.0, 433.0, 74.0, 215.0, 89.0, 13.0, 210.0, 45.0, 240.0]
2025-09-12 12:58:08,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 4 hours, 33 minutes, 42 seconds)
2025-09-12 13:10:39,760 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:10:39,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:11:44,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 605.22144 ± 775.109
2025-09-12 13:11:44,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [955.3743, 2387.6985, 10.61713, 533.7427, 11.809312, 125.808044, 18.594997, 1630.7737, 300.56168, 77.23438]
2025-09-12 13:11:44,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [330.0, 771.0, 17.0, 225.0, 13.0, 64.0, 23.0, 528.0, 122.0, 47.0]
2025-09-12 13:11:44,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 4 hours, 17 minutes, 48 seconds)
2025-09-12 13:24:57,812 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:24:57,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:26:17,779 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 785.22797 ± 893.489
2025-09-12 13:26:17,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [511.37695, 37.78654, 3055.4382, 19.241405, 936.1274, 813.4797, 16.19461, 85.699486, 888.9359, 1487.9996]
2025-09-12 13:26:17,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [190.0, 30.0, 964.0, 23.0, 306.0, 291.0, 24.0, 68.0, 336.0, 458.0]
2025-09-12 13:26:17,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 3 minutes, 49 seconds)
2025-09-12 13:38:32,688 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:38:32,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:40:07,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 908.02393 ± 610.351
2025-09-12 13:40:07,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1963.7397, 1722.4916, 1257.7528, 842.9195, 110.575645, 1003.2853, 97.70919, 1151.2887, 280.92685, 649.55]
2025-09-12 13:40:07,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [663.0, 528.0, 422.0, 291.0, 59.0, 353.0, 63.0, 404.0, 117.0, 230.0]
2025-09-12 13:40:07,185 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 3 hours, 53 minutes, 7 seconds)
2025-09-12 13:52:41,412 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:52:41,416 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:53:32,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 444.39688 ± 381.575
2025-09-12 13:53:32,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [481.37112, 182.11264, 389.4111, 9.257934, 686.9953, 10.565141, 154.05406, 374.2435, 1242.7994, 913.1585]
2025-09-12 13:53:32,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [188.0, 149.0, 154.0, 12.0, 241.0, 14.0, 75.0, 148.0, 421.0, 305.0]
2025-09-12 13:53:32,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 38 minutes, 42 seconds)
2025-09-12 14:05:49,611 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:05:49,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:06:28,598 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 325.22855 ± 303.467
2025-09-12 14:06:28,598 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [84.30059, 186.46788, 13.365163, 893.496, 649.5938, 26.797348, 114.07241, 740.81744, 216.3378, 327.0369]
2025-09-12 14:06:28,598 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [48.0, 90.0, 20.0, 314.0, 230.0, 33.0, 63.0, 262.0, 106.0, 133.0]
2025-09-12 14:06:28,606 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 25 minutes)
2025-09-12 14:19:09,344 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:19:09,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:21:14,589 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 1292.53345 ± 1129.580
2025-09-12 14:21:14,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1868.5791, 3137.6875, 76.41149, 8.646254, 12.078415, 475.03055, 2125.3403, 732.7821, 1624.3341, 2864.4453]
2025-09-12 14:21:14,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [574.0, 1000.0, 46.0, 12.0, 18.0, 176.0, 700.0, 256.0, 531.0, 917.0]
2025-09-12 14:21:14,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 14 minutes, 36 seconds)
2025-09-12 14:33:13,301 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:33:13,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:35:29,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 1268.08984 ± 941.530
2025-09-12 14:35:29,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [2671.8142, 1823.1504, 1770.7078, 471.11642, 2938.2542, 238.08861, 391.29736, 317.96274, 852.4491, 1206.0562]
2025-09-12 14:35:29,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [885.0, 653.0, 579.0, 250.0, 1000.0, 127.0, 156.0, 166.0, 287.0, 418.0]
2025-09-12 14:35:29,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 2 hours, 59 minutes, 53 seconds)
2025-09-12 14:48:11,209 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:48:11,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:49:53,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 1007.46191 ± 840.218
2025-09-12 14:49:53,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [141.21492, 630.6298, 129.08696, 322.63666, 2221.5852, 2312.1387, 921.8048, 928.9188, 2137.446, 329.15753]
2025-09-12 14:49:53,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [70.0, 231.0, 67.0, 135.0, 715.0, 751.0, 331.0, 310.0, 688.0, 136.0]
2025-09-12 14:49:53,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 47 minutes, 27 seconds)
2025-09-12 15:02:41,680 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:02:41,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:04:23,866 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 985.28455 ± 935.003
2025-09-12 15:04:23,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [710.9354, 14.874308, 12.687846, 1252.849, 89.19795, 1583.0547, 1402.7368, 2904.5989, 1858.9824, 22.927929]
2025-09-12 15:04:23,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [254.0, 18.0, 16.0, 483.0, 51.0, 546.0, 487.0, 914.0, 617.0, 22.0]
2025-09-12 15:04:23,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 35 minutes, 52 seconds)
2025-09-12 15:16:42,856 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:16:42,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:18:10,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 883.65656 ± 713.854
2025-09-12 15:18:10,966 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [14.419952, 313.26733, 2344.7817, 1112.3737, 13.365706, 658.3952, 1638.4435, 1000.55554, 1337.3315, 403.6319]
2025-09-12 15:18:10,966 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [20.0, 127.0, 750.0, 387.0, 20.0, 231.0, 486.0, 334.0, 409.0, 160.0]
2025-09-12 15:18:10,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 23 minutes, 24 seconds)
2025-09-12 15:30:34,959 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:30:34,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:31:16,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 351.02985 ± 527.281
2025-09-12 15:31:16,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [414.33682, 10.637931, 639.54987, 12.242748, 9.61521, 1768.4183, 12.169066, 10.06372, 89.30873, 543.95593]
2025-09-12 15:31:16,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [163.0, 15.0, 273.0, 17.0, 23.0, 611.0, 14.0, 14.0, 56.0, 206.0]
2025-09-12 15:31:16,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 6 minutes, 3 seconds)
2025-09-12 15:44:54,635 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:44:54,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:46:58,900 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 1252.27661 ± 1281.132
2025-09-12 15:46:58,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [2102.9521, 16.459517, 2979.6243, 941.9752, 3101.2832, 440.03104, 2844.2822, 11.923235, 14.03541, 70.20043]
2025-09-12 15:46:58,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [685.0, 17.0, 954.0, 321.0, 1000.0, 172.0, 907.0, 18.0, 19.0, 48.0]
2025-09-12 15:46:58,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 54 minutes, 23 seconds)
2025-09-12 15:58:26,757 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:58:26,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:59:02,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 302.95172 ± 396.249
2025-09-12 15:59:02,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [13.688405, 14.015483, 1131.5531, 569.89417, 11.080175, 19.744333, 14.803914, 52.44449, 321.75504, 880.5383]
2025-09-12 15:59:02,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 16.0, 403.0, 212.0, 26.0, 23.0, 17.0, 43.0, 134.0, 304.0]
2025-09-12 15:59:02,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 36 minutes, 49 seconds)
2025-09-12 16:11:55,148 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:11:55,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:13:20,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 787.23254 ± 784.008
2025-09-12 16:13:20,684 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [15.726893, 108.31055, 1627.5538, 188.10356, 614.0002, 2333.5547, 555.84045, 1760.0144, 22.57599, 646.64404]
2025-09-12 16:13:20,684 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [20.0, 93.0, 576.0, 87.0, 221.0, 765.0, 213.0, 605.0, 42.0, 237.0]
2025-09-12 16:13:20,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 22 minutes, 44 seconds)
2025-09-12 16:25:57,562 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:25:57,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:26:57,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 578.04895 ± 603.111
2025-09-12 16:26:57,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [487.91586, 20.763132, 28.081057, 1119.3, 1446.4594, 317.35242, 1765.5592, 233.35289, 12.477117, 349.22827]
2025-09-12 16:26:57,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [210.0, 20.0, 26.0, 351.0, 424.0, 133.0, 568.0, 104.0, 14.0, 141.0]
2025-09-12 16:26:57,591 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 8 minutes, 46 seconds)
2025-09-12 16:39:08,052 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:39:08,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:41:13,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 1265.22595 ± 1129.730
2025-09-12 16:41:13,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [119.680664, 3078.55, 804.0367, 1465.2343, 11.573029, 759.1153, 2305.6125, 3179.242, 300.05484, 629.1602]
2025-09-12 16:41:13,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [71.0, 1000.0, 244.0, 482.0, 22.0, 239.0, 699.0, 1000.0, 130.0, 231.0]
2025-09-12 16:41:13,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 55 minutes, 57 seconds)
2025-09-12 16:53:36,431 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:53:36,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:55:30,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 1115.57056 ± 1054.886
2025-09-12 16:55:30,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [239.42273, 10.981654, 1981.7388, 2088.903, 16.597607, 273.66528, 3058.0774, 1739.8569, 84.11966, 1662.342]
2025-09-12 16:55:30,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [106.0, 19.0, 672.0, 678.0, 22.0, 119.0, 1000.0, 567.0, 50.0, 549.0]
2025-09-12 16:55:30,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 41 minutes, 7 seconds)
2025-09-12 17:08:13,472 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:08:13,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:09:39,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 845.80145 ± 1039.223
2025-09-12 17:09:39,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [458.72388, 1386.8594, 189.75406, 81.1336, 19.23478, 322.7839, 3202.737, 2269.6414, 13.624671, 513.52234]
2025-09-12 17:09:39,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [181.0, 462.0, 88.0, 64.0, 20.0, 148.0, 1000.0, 719.0, 18.0, 205.0]
2025-09-12 17:09:39,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 28 minutes, 14 seconds)
2025-09-12 17:22:24,893 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:22:24,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:23:56,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 936.80499 ± 565.692
2025-09-12 17:23:56,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [816.45184, 10.906387, 1284.9554, 1971.159, 882.6407, 1228.5247, 956.05304, 10.515879, 843.363, 1363.4802]
2025-09-12 17:23:56,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [280.0, 12.0, 417.0, 613.0, 306.0, 357.0, 345.0, 13.0, 294.0, 406.0]
2025-09-12 17:23:56,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 14 minutes, 7 seconds)
2025-09-12 17:36:26,196 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:36:26,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:38:00,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 951.51917 ± 1162.684
2025-09-12 17:38:00,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [2803.4194, 2279.787, 277.5411, 14.037029, 396.59827, 716.6858, 9.079721, 45.714535, 11.726063, 2960.602]
2025-09-12 17:38:00,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [934.0, 697.0, 116.0, 17.0, 154.0, 249.0, 20.0, 48.0, 16.0, 887.0]
2025-09-12 17:38:00,606 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1251 [DEBUG]: Training session finished
